2018-02-06 13:40:05 +00:00

# d3-spring-model
This module implements three force-directed layout algorithms to visualize high-dimensional data in 2D space.
1. Basic spring model algorithm. In this model, every pair of data points (nodes) is connected by a spring that pushes or pulls depending on the difference between their 2D and high-dimensional distances. This is a modified version of [D3's force link](https://github.com/d3/d3-force#forceLink) with some functionality removed to improve performance and lower memory usage.
1. Neighbour and Sampling algorithm. It uses stochastic sampling to find the best neighbours for each high-dimensional data point and creates the layout in 2D.
1. Hybrid layout algorithm. It runs the Neighbour and Sampling algorithm on a subset of the data before interpolating the rest onto the 2D space. The Neighbour and Sampling algorithm may also be run over the full dataset at the end to refine placement.
   During the interpolation, each node has to find a parent: the closest node that has already been plotted on the 2D space. Two methods of finding the parent have been implemented:
   1. Brute-force searching. This method takes more time but guarantees that the parent found is the best one.
   1. Pivot-based searching. This method introduces a one-off pre-processing cost but makes finding each node's parent faster. The parent found may not be the best one, but it should be near enough to provide good results.
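
To make the difference between the two methods concrete, the brute-force method amounts to a linear scan over every node that has already been placed, as in the minimal sketch below. The function name `findParentBruteForce` and the `value` field are hypothetical; the module's own implementation may differ.

```js
// Hypothetical sketch, not the module's code: brute-force parent search.
// `placed` holds nodes already positioned in 2D; `distance(a, b)` returns
// the desired (high-dimensional) distance between two nodes.
function findParentBruteForce(node, placed, distance) {
  let best = null;
  let bestDist = Infinity;
  for (const candidate of placed) {
    const d = distance(node, candidate);
    if (d < bestDist) {
      bestDist = d;
      best = candidate;
    }
  }
  return best; // null only when `placed` is empty
}

// Usage with a hypothetical one-dimensional `value` field.
const placed = [{ value: 0 }, { value: 4 }, { value: 9 }];
const dist = (a, b) => Math.abs(a.value - b.value);
const parent = findParentBruteForce({ value: 5 }, placed, dist);
console.log(parent.value); // 4
```

Each lookup costs one distance evaluation per placed node, which is the cost the pivot-based method trades for a one-off pre-processing step.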
These algorithms are useful for producing visualizations that show relationships between the data. For instance:
![Iris data set](img/IrisLink.png)
![Part of Poker Hands data set](img/Poker3000Link.png)
### Authors
Pitchaya Boonsarngsuk
Based on [d3-neighbour-sampling](https://github.com/sReeper/d3-neighbour-sampling) by Remigijus Bartasius and Matthew Chalmers under MIT license.
Based on [d3-force](https://github.com/d3/d3-force) by Mike Bostock under BSD 3-Clause license.
### Reference
- Chalmers, M. ["A linear iteration time layout algorithm for visualising high-dimensional data."](http://dl.acm.org/citation.cfm?id=245035) Proceedings of the 7th conference on Visualization'96. IEEE Computer Society Press, 1996.
- Morrison, A., Ross, G. & Chalmers, M. ["A Hybrid Layout Algorithm for Sub-Quadratic Multidimensional Scaling."](https://dl.acm.org/citation.cfm?id=857191.857738) INFOVIS '02 Proceedings of the IEEE Symposium on Information Visualization, 2002.
- Morrison, A. & Chalmers, M. ["Improving hybrid MDS with pivot-based searching."](https://dl.acm.org/citation.cfm?id=1947387) INFOVIS '03 Proceedings of the Ninth annual IEEE conference on Information visualization, 2003.
## Usage
Download the [latest release](https://git.win32exe.tech/brian/d3-spring-model/releases) and load either the full or the minified version alongside [D3 4.0](https://github.com/d3/d3).
```html
<script src="https://d3js.org/d3.v4.min.js"></script>
<script src="d3-spring-model.min.js"></script>
<script>
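// `nodes` is the array of data objects to lay out; `value` is a hypothetical field.
var nodes = [{value: 1}, {value: 4}, {value: 9}];
var simulation = d3.forceSimulation(nodes)
  .force("neighbourSampling", d3.forceNeighbourSampling()
    .distance(function (a, b) { return Math.abs(a.value - b.value); }));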
</script>
```
## File structure
- [index.js](index.js) Export list of the module
- [src/](src) Source code of the module
- [package.json](package.json) Node.js module descriptor with build scripts
- [img](img) Images for this readme file
- [examples](examples) An example page running all the algorithms implemented
## Building
```bash
npm run build # Clean build folder and build the module into a single js file.
npm run minify # Minify the built js file.
npm run zip # Zip built files and documents for release.
```
See [package.json](package.json) for more details.
## API Reference
### Spring Model
The model connects every node to every other node with a "spring": a link force that pushes linked nodes apart or pulls them together according to the desired distance. The strength of the spring force is proportional to the difference between the linked nodes' current distance and the target distance.
The implementation is based on [d3.forceLink()](https://github.com/d3/d3-force#forceLink) with the list of springs fixed so that every node is connected to every other node. Compared with supplying every pair as an explicit link, this greatly reduces memory usage and initialization time.
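
The update below is a minimal sketch of one application of such a fully connected spring force, modelled on the per-link update in d3.forceLink: each pair's correction is proportional to the gap between its current and target distance. The function name `applySprings` and the constant `strength` are assumptions for illustration. The even split of the correction between both nodes is what d3.forceLink's degree-based bias reduces to when every node has the same number of links. This is not the module's exact code.

```js
// Hypothetical sketch, not the module's code: one pass of a fully connected
// spring force. Every pair (i, j) is treated as a spring whose target length
// is distance(nodes[i], nodes[j]).
function applySprings(nodes, distance, alpha, strength = 1) {
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i];
      const b = nodes[j];
      // Vector from a to b, using positions projected by current velocity.
      let dx = b.x + b.vx - a.x - a.vx;
      let dy = b.y + b.vy - a.y - a.vy;
      let len = Math.sqrt(dx * dx + dy * dy);
      if (len === 0) { dx = 1e-6; len = 1e-6; } // avoid dividing by zero
      // k > 0 when the spring is stretched (pull together),
      // k < 0 when it is compressed (push apart).
      const k = ((len - distance(a, b)) / len) * alpha * strength;
      dx *= k;
      dy *= k;
      // Split the correction equally between the two nodes.
      b.vx -= dx * 0.5;
      b.vy -= dy * 0.5;
      a.vx += dx * 0.5;
      a.vy += dy * 0.5;
    }
  }
}

// Usage: two nodes 10 apart with a target distance of 4 and alpha = 1.
const nodes = [
  { x: 0, y: 0, vx: 0, vy: 0 },
  { x: 10, y: 0, vx: 0, vy: 0 },
];
applySprings(nodes, () => 4, 1);
console.log(nodes[0].vx, nodes[1].vx); // 3 -3
```

After the velocities are applied, the two nodes in the usage example would sit at x = 3 and x = 7, exactly 4 apart.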
<a name="forceLinkFullyConnected" href="#forceLinkFullyConnected">#</a> d3.**forceLinkFullyConnected**() [<>](src/link.js "Source")
Creates a new link force with default parameters.
<a name="springLink_distance" href="#springLink_distance">#</a> *springLink*.**distance**([<i>distance</i>])
If *distance* is specified, sets the distance accessor to the specified number or function, re-evaluates the distance accessor for each link, and returns this force. If *distance* is not specified, returns the current distance accessor, which defaults to:
```js
function distance() {
  return 30;
}
```
The distance accessor is invoked for each pair of nodes. If it is a function, the two nodes are passed as its two arguments, as follows:
```js
// Return the desired distance as a number; `value` is a hypothetical numeric field.
function distance(nodeA, nodeB) { return Math.abs(nodeA.value - nodeB.value); }
```
The resulting number is then stored internally, such that the distance of each link is only recomputed when the force is initialized or when this method is called with a new *distance*, and not on every application of the force.
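
A minimal sketch of that caching follows, assuming one cached number per unordered pair stored in a flat triangular array. The names `cacheDistances` and `pairIndex` are hypothetical and not part of the module's API.

```js
// Hypothetical sketch, not the module's code: evaluate the distance accessor
// once per unordered pair and store the results in a flat upper-triangular
// array of n * (n - 1) / 2 entries. `distance` may be a number or a function.
function cacheDistances(nodes, distance) {
  const n = nodes.length;
  const cache = new Float64Array((n * (n - 1)) / 2);
  let k = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      cache[k++] =
        typeof distance === "function" ? distance(nodes[i], nodes[j]) : distance;
    }
  }
  return cache;
}

// Position of pair (i, j), with i < j, in the flat array built above.
function pairIndex(i, j, n) {
  return i * n - (i * (i + 1)) / 2 + (j - i - 1);
}

// Usage with a hypothetical `value` field.
const nodes = [{ value: 0 }, { value: 3 }, { value: 7 }];
const cache = cacheDistances(nodes, (a, b) => Math.abs(a.value - b.value));
console.log(Array.from(cache)); // [ 3, 7, 4 ]
console.log(cache[pairIndex(1, 2, nodes.length)]); // 4
```

Because the distance is symmetric, storing each pair once halves the memory of a full n × n matrix.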
<a name="springLink_iterations" href="#springLink_iterations">#</a> *springLink*.**iterations**([*iterations*])
If *iterations* is specified, sets the number of iterations per application to the specified number and returns this force. If *iterations* is not specified, returns the current iteration count which defaults to 1. Increasing the number of iterations greatly increases the rigidity of the constraint, but also increases the runtime cost to evaluate the force.
### Neighbour and Sampling
The Neighbour and Sampling algorithm simplifies the model by calculating the spring force of each node only against several nearby and random nodes, at the cost of a less accurate layout. For it to work properly, a distance function should be specified.
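
The per-node bookkeeping behind this can be sketched as follows, after Chalmers (1996). Each node keeps a small set of its closest nodes found so far and draws a fresh random sample every iteration. It then swaps in any sampled node that is closer than its current furthest neighbour. This is a hypothetical illustration: the function `updateNeighbours` and its arguments are not part of the module's API, and the force application itself is omitted. The `neighbourSize` and `sampleSize` arguments play the role of the options of the same name described below.

```js
// Hypothetical sketch of neighbour-set bookkeeping in the spirit of Chalmers
// (1996); not the module's code. neighbours[i] holds up to `neighbourSize`
// indices of the closest nodes found so far for node i (neighbourSize >= 1).
function updateNeighbours(i, nodes, neighbours, distance, neighbourSize, sampleSize) {
  const own = neighbours[i];
  const d = (j) => distance(nodes[i], nodes[j]);
  // Draw distinct random samples that are neither node i nor its neighbours.
  const available = nodes.length - 1 - own.length;
  const samples = [];
  while (samples.length < Math.min(sampleSize, available)) {
    const j = Math.floor(Math.random() * nodes.length);
    if (j !== i && !own.includes(j) && !samples.includes(j)) samples.push(j);
  }
  // A sample replaces the furthest neighbour if it is closer in high-dimensional space.
  for (const j of samples) {
    if (own.length < neighbourSize) {
      own.push(j);
      continue;
    }
    let worst = 0;
    for (let k = 1; k < own.length; k++) {
      if (d(own[k]) > d(own[worst])) worst = k;
    }
    if (d(j) < d(own[worst])) own[worst] = j;
  }
  // A force step would then apply springs against `own` and `samples`.
  return samples;
}

// Usage: 20 nodes on a line, with a hypothetical one-dimensional `v` field.
const points = Array.from({ length: 20 }, (_, v) => ({ v }));
const neighbours = points.map(() => []);
const dist = (a, b) => Math.abs(a.v - b.v);
for (let t = 0; t < 300; t++) updateNeighbours(0, points, neighbours, dist, 3, 3);
console.log(neighbours[0].slice().sort((a, b) => a - b));
```

Because a neighbour is only ever replaced by a closer node, node 0's neighbour set should settle on nodes 1, 2 and 3 after enough iterations.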
<a name="forceNeighbourSampling" href="#forceNeighbourSampling">#</a> d3.**forceNeighbourSampling**() [<>](src/neighbourSampling.js "Source")
Initializes the Neighbour and Sampling force with default parameters.
<a name="neighbourSampling_distance" href="#neighbourSampling_distance">#</a> *neighbourSampling*.**distance**([*distance*]) [<>](https://github.com/sReeper/d3-neighbour-sampling/blob/master/src/neighbourSampling.js#L230 "Source")
If *distance* is specified, sets the distance accessor to the specified number or function, re-evaluates the distance accessor for each link, and returns this force. If *distance* is not specified, returns the current distance accessor, which defaults to:
```js
function distance() {
  return 300;
}
```
The distance accessor is invoked for each pair of nodes. If it is a function, the two nodes are passed as its two arguments, as follows:
```js
// Return the desired distance as a number; `value` is a hypothetical numeric field.
function distance(nodeA, nodeB) { return Math.abs(nodeA.value - nodeB.value); }
```
<a name="neighbourSampling_neighbourSize" href="#neighbourSampling_neighbourSize">#</a> *neighbourSampling*.**neighbourSize**([*neighbourSize*])
If *neighbourSize* is specified, sets the neighbour set size to the specified number and returns this force. If *neighbourSize* is not specified, returns the current value, which defaults to 10.
<a name="neighbourSampling_sampleSize" href="#neighbourSampling_sampleSize">#</a> *neighbourSampling*.**sampleSize**([*sampleSize*])
If *sampleSize* is specified, sets the sample set size to the specified number and returns this force. If *sampleSize* is not specified, returns the current value, which defaults to 10.
<a name="neighbourSampling_latestAccel" href="#neighbourSampling_latestAccel">#</a> *neighbourSampling*.**latestAccel**()
Returns the average velocity change of the latest iteration.
<a name="neighbourSampling_stableVelocity" href="#neighbourSampling_stableVelocity">#</a> *neighbourSampling*.**stableVelocity**([*threshold*])
If *threshold* is specified, sets the threshold and returns this force. When the average velocity change of the system goes below the threshold, the [onStableVelo handler](#neighbourSampling_onStableVelo) is called. Set the threshold to a number less than 0, or remove the [handler](#neighbourSampling_onStableVelo), to disable the check. If *threshold* is not specified, returns the current value, which defaults to 0.
<a name="neighbourSampling_onStableVelo" href="#neighbourSampling_onStableVelo">#</a> *neighbourSampling*.**onStableVelo**([*handler*])
If *handler* is specified, sets the handler function, which is called at the end of each iteration if the average velocity change of the system goes below the [threshold](#neighbourSampling_stableVelocity). To remove the handler, set it to null. If *handler* is not specified, returns the current handler, which defaults to null.
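
The threshold check can be sketched like this. The function `checkStable` and the `before` snapshot of velocities are assumptions made for illustration. Measuring each node's velocity change as the Euclidean length of (dvx, dvy) is also an assumption; the module may measure the change differently.

```js
// Hypothetical sketch, not the module's code. `before` holds each node's
// velocity at the start of the iteration; the "velocity change" is taken here
// as the Euclidean length of (dvx, dvy), which is an assumption.
function checkStable(before, nodes, threshold, handler) {
  let total = 0;
  nodes.forEach((node, i) => {
    const dvx = node.vx - before[i].vx;
    const dvy = node.vy - before[i].vy;
    total += Math.sqrt(dvx * dvx + dvy * dvy);
  });
  const average = nodes.length > 0 ? total / nodes.length : 0;
  // A null handler or a negative threshold disables the check.
  if (handler && threshold >= 0 && average < threshold) handler(average);
  return average;
}

// Usage: one node's velocity changed by (3, 4), the other's not at all.
const before = [{ vx: 0, vy: 0 }, { vx: 0, vy: 0 }];
const after = [{ vx: 3, vy: 4 }, { vx: 0, vy: 0 }];
checkStable(before, after, 3, (avg) => console.log("stable, average", avg)); // logs: stable, average 2.5
```

With a threshold of 0, the average (which is never negative) can never drop below it, so the handler never fires.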
### Hybrid Layout Simulation
The hybrid layout algorithm reduces computational cost even further by running the Neighbour and Sampling algorithm on a random subset of only $\sqrt{n}$ samples and interpolating the remaining nodes. The Neighbour and Sampling algorithm may also be run again over the full dataset after the interpolation to refine the layout. This algorithm is recommended only for visualizing larger datasets.
<a name="hybrid" href="#hybrid">#</a> d3.**hybridSimulation**(*simulation*, *forceSample*, [*forceFull*]) [<>](src/hybridSimulation.js "Source")
Creates a new hybrid layout simulation with default parameters. The hybrid simulation takes over control of the [d3.forceSimulation](https://github.com/d3/d3-force#forceSimulation) passed as *simulation*. *forceSample* and *forceFull* are pre-configured [d3.forceNeighbourSampling](#forceNeighbourSampling) forces to be run over the $\sqrt{n}$ samples and the full dataset, respectively. Although unsupported, other D3 forces such as [d3.forceLinkFullyConnected](#forceLinkFullyConnected) may also work.
*forceSample* may have [stableVelocity](#neighbourSampling_stableVelocity) configured to end the simulation and begin the interpolation phase early, but any [handler](#neighbourSampling_onStableVelo) function will be replaced by hybridSimulation's own internal function.
*forceFull* may be absent, null, or undefined to skip the final refinement.
*simulation* should already be loaded with nodes. If the list of nodes changes, the simulation has to be set again using the [.simulation](#hybrid_simulation) method.
<a name="hybrid_simulation" href="#hybrid_simulation">#</a> *hybrid*.**simulation**([*simulation*])
If *simulation* is specified, sets the [d3.forceSimulation](https://github.com/d3/d3-force#forceSimulation) to the given object, refreshes the node list, and returns this layout simulation. If *simulation* is not specified, returns the current simulation.
<a name="hybrid_subSet" href="#hybrid_subSet">#</a> *hybrid*.**subSet**()
Returns the list of nodes in the $\sqrt{n}$ sample set. The set is selected randomly on initialization and whenever the node list is refreshed by the [.simulation](#hybrid_simulation) method. These nodes are placed in the 2D space from the beginning.
<a name="hybrid_nonSubSet" href="#hybrid_nonSubSet">#</a> *hybrid*.**nonSubSet**()
Returns the list of nodes outside the $\sqrt{n}$ sample set, i.e. the nodes left over by the random selection made on initialization and whenever the node list is refreshed by the [.simulation](#hybrid_simulation) method. These nodes are interpolated onto the 2D space later on.
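
The random split into these two lists can be sketched as a partial Fisher-Yates shuffle. The function `splitSample` is hypothetical, and rounding $\sqrt{n}$ down is an assumption. Only the resulting `subSet`/`nonSubSet` partition mirrors the methods above.

```js
// Hypothetical sketch, not the module's code: pick floor(sqrt(n)) distinct nodes
// uniformly at random with a partial Fisher-Yates shuffle. Rounding down is an
// assumption; the module may round differently.
function splitSample(nodes) {
  const shuffled = nodes.slice();
  const size = Math.floor(Math.sqrt(shuffled.length));
  for (let i = 0; i < size; i++) {
    // Swap a random element from the not-yet-chosen tail into position i.
    const j = i + Math.floor(Math.random() * (shuffled.length - i));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return { subSet: shuffled.slice(0, size), nonSubSet: shuffled.slice(size) };
}

// Usage: 100 nodes give a 10-node sample set and 90 nodes to interpolate.
const { subSet, nonSubSet } = splitSample(Array.from({ length: 100 }, (_, id) => ({ id })));
console.log(subSet.length, nonSubSet.length); // 10 90
```

Only the first $\sqrt{n}$ positions are shuffled, so the selection costs $O(\sqrt{n})$ swaps on top of copying the list.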
<a name="hybrid_forceSample" href="#hybrid_forceSample">#</a> *hybrid*.**forceSample**([*force*])
If *force* is specified, sets the neighbour and sampling force to run on the $\sqrt{n}$ samples before interpolation and returns this layout simulation. The same limitation applies: [stableVelocity](#neighbourSampling_stableVelocity) may be configured to end the simulation and begin the interpolation phase early, but any [handler](#neighbourSampling_onStableVelo) function will be replaced by hybridSimulation's own internal function. If *force* is not specified, returns the current force object.
<a name="hybrid_forceFull" href="#hybrid_forceFull">#</a> *hybrid*.**forceFull**([*force*])
If *force* is specified, sets the neighbour and sampling force to run on the whole dataset after interpolation and returns this layout simulation. If set to null, the process will be skipped. If *force* is not specified, returns the current force object.
<a name="hybrid_sampleIterations" href="#hybrid_sampleIterations">#</a> *hybrid*.**sampleIterations**([*iterations*])
If *iterations* is specified, sets the number of iterations to run neighbour and sampling on the $\sqrt{n}$ samples before interpolation and returns this layout simulation. If *iterations* is not specified, returns the current value, which defaults to 300.
<a name="hybrid_fullIterations" href="#hybrid_fullIterations">#</a> *hybrid*.**fullIterations**([*iterations*])
If *iterations* is specified, sets the number of iterations to run neighbour and sampling on the whole dataset after interpolation and returns this layout simulation. If set to a number less than 1, the process will be skipped. If *iterations* is not specified, returns the current value, which defaults to 20.
<a name="hybrid_numPivots" href="#hybrid_numPivots">#</a> *hybrid*.**numPivots**([*number*])
If *number* is specified, sets the number of pivots used to find parents during the interpolation process and returns this layout simulation. If *number* is less than 1, the brute-force method is used instead. If *number* is not specified, returns the current value, which defaults to 0 (the brute-force method).
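
The idea behind pivot-based searching can be illustrated with the simplified sketch below, loosely following Morrison & Chalmers (2003). In a one-off step, each pivot sorts the sample nodes into distance buckets. A node then compares itself only against the samples that share its bucket. The bucket layout, the choice of pivots, and the names `buildPivotIndex`, `bucketOf`, `findParentWithPivots` and `numBuckets` are assumptions. The module itself exposes only the number of pivots.

```js
// Hypothetical, simplified sketch; not the module's code.
// Pre-processing (once): for each pivot, bucket every sample node by its
// distance to that pivot. Assumes `samples` is non-empty.
function buildPivotIndex(samples, pivots, distance, numBuckets) {
  return pivots.map((pivot) => {
    const dists = samples.map((s) => distance(pivot, s));
    const maxDist = Math.max(...dists) || 1; // avoid dividing by zero
    const buckets = Array.from({ length: numBuckets }, () => []);
    samples.forEach((s, i) => {
      buckets[bucketOf(dists[i], maxDist, numBuckets)].push(s);
    });
    return { pivot, maxDist, buckets };
  });
}

function bucketOf(d, maxDist, numBuckets) {
  return Math.min(numBuckets - 1, Math.floor((d / maxDist) * numBuckets));
}

// Query (per node): only the node's own bucket for each pivot is searched, so
// this is cheaper than a full scan but may miss the true nearest sample, and
// it returns null if every examined bucket is empty.
function findParentWithPivots(node, index, distance, numBuckets) {
  let best = null;
  let bestDist = Infinity;
  for (const { pivot, maxDist, buckets } of index) {
    const b = bucketOf(distance(pivot, node), maxDist, numBuckets);
    for (const candidate of buckets[b]) {
      const d = distance(node, candidate);
      if (d < bestDist) {
        bestDist = d;
        best = candidate;
      }
    }
  }
  return best;
}

// Usage: ten samples on a line, with the first one as the only pivot.
const samples = Array.from({ length: 10 }, (_, v) => ({ v }));
const dist = (a, b) => Math.abs(a.v - b.v);
const index = buildPivotIndex(samples, [samples[0]], dist, 5);
console.log(findParentWithPivots({ v: 4.2 }, index, dist, 5).v); // 4
```

The pre-processing costs one distance evaluation per sample per pivot. After that, each query examines only a few buckets instead of every sample.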
<a name="hybrid_interpDistanceFn" href="#hybrid_interpDistanceFn">#</a> *hybrid*.**interpDistanceFn**([*distance*])
If *distance* is specified, sets the distance accessor used during the interpolation process to the specified number or function and returns this layout simulation. If *distance* is not specified, returns the current distance accessor, which defaults to the one provided by the force for the full dataset or, failing that, to:
```js
function distance() {
  return 300;
}
```
If *distance* is a function, the two nodes are passed as its two arguments, as follows:
```js
// Return the desired distance as a number; `value` is a hypothetical numeric field.
function distance(nodeA, nodeB) { return Math.abs(nodeA.value - nodeB.value); }
```
<a name="hybrid_interpFindTuneIts" href="#hybrid_interpFindTuneIts">#</a> *hybrid*.**interpFindTuneIts**([*number*])
During the interpolation, each node finds a "parent": a nearby sample node whose 2D location is already known. The parent is used to find an initial location for the node. After that, spring forces against $\sqrt{\sqrt{n}}$ sample nodes are applied to the node for *number* iterations to fine-tune its location. This is not to be confused with the Neighbour and Sampling refinement that runs after the entire interpolation process is completed.
If *number* is specified, sets the number of fine-tuning iterations during the interpolation process and returns this layout simulation. If *number* is not specified, returns the current value, which defaults to 20.
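
A simplified sketch of this per-node step follows. The following are all assumptions: starting on a circle around the parent at a random angle, moving by the averaged spring displacement with a fixed `step`, and the names `placeNode`, `refs` and `rand`. The hybrid layout paper cited above may choose the starting point more carefully.

```js
// Hypothetical, simplified sketch; not the module's code. Only the new node
// moves; `refs` are already-placed sample nodes (about n^(1/4) of them).
function placeNode(node, parent, refs, distance, iterations, step = 0.5, rand = Math.random) {
  // Start on a circle around the parent, at the desired distance from it.
  const r = distance(node, parent);
  const angle = rand() * 2 * Math.PI;
  node.x = parent.x + r * Math.cos(angle);
  node.y = parent.y + r * Math.sin(angle);
  for (let it = 0; it < iterations && refs.length > 0; it++) {
    let fx = 0;
    let fy = 0;
    for (const ref of refs) {
      const dx = ref.x - node.x;
      const dy = ref.y - node.y;
      const len = Math.sqrt(dx * dx + dy * dy) || 1e-6;
      // k > 0: too far from ref, move towards it; k < 0: too close, move away.
      const k = (len - distance(node, ref)) / len;
      fx += dx * k;
      fy += dy * k;
    }
    // Move by the averaged spring displacement, scaled by `step`.
    node.x += (step * fx) / refs.length;
    node.y += (step * fy) / refs.length;
  }
  return node;
}

// Usage: desired distances are computed from hypothetical "high-dimensional"
// coordinates tx/ty, chosen so that the ideal 2D position is (3, 4).
const refs = [
  { x: 0, y: 0, tx: 0, ty: 0 },
  { x: 10, y: 0, tx: 10, ty: 0 },
  { x: 0, y: 10, tx: 0, ty: 10 },
];
const dist = (a, b) => Math.hypot(a.tx - b.tx, a.ty - b.ty);
const node = placeNode({ tx: 3, ty: 4 }, refs[0], refs, dist, 300, 0.5, () => 0);
console.log(node.x.toFixed(3), node.y.toFixed(3));
```

Passing `() => 0` as `rand` makes the starting angle deterministic. In this example the node should end up very close to (3, 4).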
<a name="hybrid_on" href="#hybrid_on">#</a> <i>hybrid</i>.<b>on</b>(*typenames*, [*listener*])
If *listener* is specified, sets the event *listener* for the specified *typenames* and returns this layout simulation. If an event listener was already registered for the same type and name, the existing listener is removed before the new listener is added. If *listener* is null, removes the current event listeners for the specified *typenames*, if any. If *listener* is not specified, returns the first currently-assigned listener matching the specified *typenames*, if any. When a specified event is dispatched, each *listener* will be invoked with the `this` context as the simulation.
The *typenames* is a string containing one or more *typename*s separated by whitespace. Each *typename* is a *type*, optionally followed by a period (`.`) and a *name*, such as `sampleTick.foo` and `sampleTick.bar`; the name allows multiple listeners to be registered for the same *type*. The *type* must be one of the following:
* `sampleTick` - after each update of the simulation on the $\sqrt{n}$ subset.
* `fullTick` - after each update of the simulation on the full dataset.
* `startInterp` - just before the interpolation process.
* `end` - after the hybrid simulation ends.
Note that *tick* events are not dispatched when [*simulation*.tick](https://github.com/d3/d3-force#simulation_tick) is called manually; events are only dispatched by the internal timer and are intended for interactive rendering of the simulation. To affect the simulation, register [forces](https://github.com/d3/d3-force#simulation_force) instead of modifying nodes' positions or velocities inside a tick event listener.
See [*dispatch*.on](https://github.com/d3/d3-dispatch#dispatch_on) for details.
<a name="hybrid_restart" href="#hybrid_restart">#</a> *hybrid*.**restart**()
Starts the simulation, or continues it from where it left off, and returns this layout simulation.
<a name="hybrid_stop" href="#hybrid_stop">#</a> *hybrid*.**stop**()
Stops the simulation, if it is running, and returns this layout simulation. If it has already stopped, this method does nothing.
### Miscellaneous
<a name="calculateStress" href="#calculateStress">#</a> d3.**calculateStress**(*nodes*, *distance*) [<>](src/stress.js "Source")
Calculates the stress of the whole system, based on the sum of squared errors of inter-object distances. *nodes* is the array of all nodes in the system and *distance* is the function that calculates the desired distance between two node objects. *distance* is expected to have the same signature as the one in [springLink](#springLink_distance).
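
For reference, the sum-of-squared-errors idea can be sketched as below. The name `sumSquaredStress` is hypothetical. Whether `d3.calculateStress` normalises the sum is not stated here, so treat the exact formula as an assumption.

```js
// Hypothetical sketch, not necessarily the formula in src/stress.js: the raw
// sum over all unordered pairs of the squared difference between the current
// 2D distance and the desired distance. Some stress definitions additionally
// normalise this sum, e.g. by the sum of squared desired distances.
function sumSquaredStress(nodes, distance) {
  let sum = 0;
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const dx = nodes[i].x - nodes[j].x;
      const dy = nodes[i].y - nodes[j].y;
      const diff = Math.sqrt(dx * dx + dy * dy) - distance(nodes[i], nodes[j]);
      sum += diff * diff;
    }
  }
  return sum;
}

// Usage: two nodes 5 units apart in 2D whose desired distance is 7.
console.log(sumSquaredStress([{ x: 0, y: 0 }, { x: 3, y: 4 }], () => 7)); // 4
```

Lower values mean the 2D layout preserves the desired distances more faithfully; a layout that matches every distance exactly has a stress of 0.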