Added more writing

2018-03-07 20:18:17 +00:00
parent c1f1f0ba30
commit 85f944d4d8
6 changed files with 267 additions and 70 deletions

Binary image files (not shown): three images added, including graphs/neighbourCache.png (33 KiB), and one image replaced (44 KiB before, 32 KiB after).


@@ -61,3 +61,43 @@ year={2017}}
  address = {Washington, DC, USA},
  keywords = {MDS, force directed placement, hybrid algorithms, multidimensional scaling, near-neighbour search, pivots, spring models},
}
@misc{PapaParse,
  title = {Papa Parse},
  url = {https://www.papaparse.com/}
}
@book{CleanCode,
  title = {Clean code: a handbook of agile software craftsmanship},
  publisher = {Prentice Hall},
  author = {Martin, Robert C. and Coplien, James O. and Wampler, Kevin and Grenning, James W. and Schuchert, Brett L. and Langr, Jeff and Ottinger, Timothy R. and Feathers, Michael C.},
  year = {2016}
}
@misc{MDNWebWorker,
  title = {Using Web Workers},
  url = {https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers},
  journal = {Mozilla Developer Network}
}
@misc{WebCL,
  title = {WebCL - Heterogeneous parallel computing in HTML5 web browsers},
  url = {https://www.khronos.org/webcl/},
  journal = {The Khronos Group},
  year = {2011},
  month = {Jul}
}
@misc{asmjsSpeed,
  title = {asm.js Speedups Everywhere},
  url = {https://hacks.mozilla.org/2015/03/asm-speedups-everywhere/},
  journal = {Mozilla Hacks - the Web developer blog},
  author = {Zakai, Alon and Wagner, Luke},
  year = {2015},
  month = {Mar}
}
@misc{asmjs,
  title = {Big Web App? Compile It!},
  url = {http://kripken.github.io/mloc_emscripten_talk},
  publisher = {Mozilla},
  author = {Zakai, Alon}
}
@misc{gpujs,
  title = {gpujs/gpu.js},
  url = {https://github.com/gpujs/gpu.js},
  journal = {GitHub},
  year = {2016},
  month = {Jan}
}
@misc{WebGL2,
  title = {WebGL 2.0 Arrives},
  url = {https://www.khronos.org/blog/webgl-2.0-arrives},
  journal = {Khronos Group},
  year = {2017},
  month = {Feb}
}
@misc{WebAssembly,
  title = {WebAssembly},
  url = {http://webassembly.org/},
  journal = {WebAssembly}
}
@ARTICLE{LSH,
author = {{Andoni}, A. and {Razenshteyn}, I.},
title = "{Optimal Data-Dependent Hashing for Approximate Near Neighbors}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1501.01062},
primaryClass = "cs.DS",
keywords = {Computer Science - Data Structures and Algorithms},
year = 2015,
month = jan,
adsurl = {http://adsabs.harvard.edu/abs/2015arXiv150101062A},
}


@@ -1,6 +1,8 @@
\documentclass{l4proj}
\usepackage{url}
\usepackage{natbib}
\usepackage{hyperref}
\usepackage{fancyvrb}
\usepackage[final]{pdfpages}
\usepackage{algpseudocode}
@@ -9,6 +11,8 @@
\usepackage{subcaption}
\usepackage{listings}
\usepackage{color}
\usepackage{multicol}
\usepackage[super]{nth}
\renewcommand{\lstlistingname}{Code}% Listing -> Algorithm
@@ -82,6 +86,17 @@ Many approach, some map many features to 2 D, this is based on distance. the coo
\section{Project Description}
\section{Outline}
The remainder of the report will discuss the following:
\begin{itemize}
\item \textbf{Background} This chapter discusses approaches to visualise high-dimensional data and introduces the theory behind each algorithm implemented and evaluated.
\item \textbf{Design} This chapter discusses the choice of technologies.
\item \textbf{Implementation} This chapter briefly shows the decisions and justifications made during the implementation, along with several code snippets.
\item \textbf{Evaluation} This chapter details the process used to compare the performance of each algorithm, from the experiment design to the final results.
\item \textbf{Conclusion} This chapter gives a brief summary of the project, reflects on the process in general, and discusses possible future improvements.
\end{itemize}
%==============================================================================
%%%%%%%%%%%%%%%%
@@ -92,6 +107,10 @@ Many approach, some map many features to 2 D, this is based on distance. the coo
\chapter{Background}
\label{ch:bg}
History of data vis, MDS, spring model, some other methods including parameters mapping like radar chart
Before, there were linear combination methods; weakness?
For small numbers of dimensions, a radar chart or multi-bar chart can be used.
t-SNE; its weakness is that it is slow.
Linear combination methods.
\section{Link force}
\label{sec:linkbg}
@@ -119,6 +138,7 @@ The total number of spring calculations per iteration reduces from $N(N-1)$ to $
Previous evaluations indicated that the quality of the produced layout improves as $Neighbours_{size}$ and $Samples_{size}$ grow larger. For larger datasets, setting these values too small could cause the algorithm to miss some details. However, favorable results can be obtained from numbers as low as 5 and 10 for $Neighbours_{size}$ and $Samples_{size}$\cite{Algo2002}.
\section{Hybrid Layout for Multidimensional Scaling}
\label{sec:bg_hybrid}
In 2002, Alistair Morrison, Greg Ross, and Matthew Chalmers introduced a multi-phase algorithm, based on Chalmers' 1996 algorithm, to reduce the run time down to $O(N\sqrt{N})$. This is achieved by calculating the spring forces over a subset of the data and interpolating the rest onto the 2D space\cite{Algo2002}.
%TODO Maybe history of hybrid layout, 3rd section on original paper
@@ -142,12 +162,13 @@ Finally, the Chalmers' spring model is applied to the full data set for a consta
Previous evaluations show that this method is faster than the Chalmers' 1996 algorithm alone, and can create a layout with lower stress, thanks to the more accurate positioning in the interpolation process.
\section{Hybrid MDS with Pivot-Based Searching algorithm}
\label{sec:bg_hybridPivot}
\begin{wrapfigure}{rh}{0.3\textwidth}
\centering
\includegraphics[width=0.3\textwidth]{images/pivotBucketsIllust.png}
\caption{Diagram of a pivot (dark shaded point) with six buckets, illustrated as discs between dotted circles. Each of the other points is classified into a bucket by its distance to the pivot.}
\label{fig:bg_pivotBuckets}
\end{wrapfigure}
The bottleneck of the hybrid model is the nearest-neighbor searching process during the interpolation. The previous brute-force method results in time complexity of $O(N\sqrt{N})$. This improvement introduces pivot-based searching to approximate a near-neighbor instead. This reduces the time complexity to $O(N^\frac{5}{4})$\cite{Algo2003}.
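As a concrete illustration of the idea (a sketch only, not taken from the original paper; the function and parameter names are assumptions), points can be assigned to a pivot's buckets by discretising their distance to the pivot:
\begin{lstlisting}[language=JavaScript,caption={Illustrative sketch of classifying points into a pivot's distance buckets.}]
// Assign each point to a ring ("bucket") according to its distance to the pivot.
// numBuckets and maxDist are assumed to be chosen in advance.
function buildBuckets(pivot, points, numBuckets, maxDist, distance) {
  const buckets = Array.from({length: numBuckets}, () => []);
  for (const point of points) {
    const ring = Math.min(numBuckets - 1,
        Math.floor(distance(pivot, point) / maxDist * numBuckets));
    // Points in the same ring as a query are near-neighbour candidates.
    buckets[ring].push(point);
  }
  return buckets;
}
\end{lstlisting}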
@@ -181,16 +202,19 @@ With this method, the parent found is not guaranteed to be the closest point. Pr
\section{Performance Metrics}
\label{sec:bg_metrics}
To compare different algorithms, they have to be tested against the same set of performance metrics. During the development, a number of metrics were used to objectively judge the resulting graph and computation requirements. The evaluation process in chapter \ref{ch:eval} will focus on the following metrics.
\begin{itemize}
\item \textbf{Execution time} is a broadly used metric for any algorithm requiring significant computational power. Some applications aim to be interactive and the algorithm has to finish its calculations within the time constraints for the program to stay responsive. This project, however, focuses on large data sets with minimal user interaction. Hence, the execution time in this project is a measure of the time an algorithm takes to produce its "final" result. The criteria for this will be discussed in detail in chapter \ref{ch:eval}.
\item \textbf{Stress} is one of the most popular metrics for spring-based layout algorithms, modeled on the mechanical stress of a spring system. It is based on the sum of squared errors of inter-object distances\cite{Algo1996}. The function is defined as follows. $$Stress = \frac{\sum_{i<j} (d_{ij}-g_{ij})^2}{\sum_{i<j} g^2_{ij}}$$ $d_{ij}$ denotes the desired high-dimensional distance between objects $i$ and $j$ while $g_{ij}$ denotes the low-dimensional distance.
While Stress is a good metric to evaluate a layout, its calculation is an expensive operation ($O(n^2)$). At the same time, it is not part of the operation of any algorithm. Thus, by adding this optional measurement between iterations, every algorithm takes a lot longer to complete, invalidating the measured execution time of the run.
\item \textbf{Memory usage} With growing interest in machine learning, the number of data points in a data set is getting bigger. It is common to encounter data sets with tens or hundreds of thousands of instances, each with possibly hundreds of attributes. Therefore, memory usage shows how an algorithm scales to larger data sets and how many data points a computer system can handle.
\end{itemize}
\section{Summary}
In this chapter, different techniques for multidimensional scaling have been explored. As the focus of the project is on three spring model algorithms, the theory behind each of these methods has been discussed. Finally, in order to measure the performance of each algorithm, different metrics were introduced, which will be used in the evaluation process.
%==============================================================================
%%%%%%%%%%%%%%%%
% %
@@ -215,8 +239,6 @@ something
\section{Input Data and Parameters}
\section{Graphical User Interface for Evaluation}
%==============================================================================
%%%%%%%%%%%%%%%%
@@ -228,6 +250,16 @@ something
\label{ch:imp}
\section{Outline}
The D3-force module provides a simplified Simulation object to control the various calculations. Each Simulation contains data point nodes and Force objects. Interfaces are defined, allowing each Force to access the node list. To keep track of positions, each node is assigned values representing its current location and velocity vector. These values can then be used by the application to draw a graph. In each constant unit time step (iteration), the Simulation triggers a function in each Force, allowing it to add values to each particle's velocity vector, which are then added to the particle's position.
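To make the Force interface concrete, the following is a minimal sketch of a custom force written against this pattern; the name \texttt{forceToCentre} and the strength parameter are illustrative and not part of the project code.
\begin{lstlisting}[language=JavaScript,caption={Minimal sketch of a custom Force that nudges node velocities each iteration.}]
function forceToCentre(strength) {
  let nodes;
  // Called by the Simulation once per iteration ("tick").
  function force(alpha) {
    for (const node of nodes) {
      node.vx -= node.x * strength * alpha; // adjust velocity, not position
      node.vy -= node.y * strength * alpha;
    }
  }
  // The Simulation hands its node array to the force through initialize().
  force.initialize = function(n) { nodes = n; };
  return force;
}
// Usage: d3.forceSimulation(nodes).force("centre", forceToCentre(0.1));
\end{lstlisting}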
Because D3-force is a library intended to be built into other web applications, the algorithms implemented cannot be used on their own. Fortunately, as part of Bartasius' Level 4 project in 2017, a web application for testing and evaluation has already been created, with a graphical user interface designed to allow the user to easily select an algorithm, data set, and parameter values. Various distance functions, including one specifically created to handle the Poker Hands data set\cite{UCL_Data} which will be used for evaluation (section \ref{sec:EvalDataSet}), are also in place and fully functional.
The csv-formatted data file can be loaded locally. Next, it is parsed using the Papa Parse JavaScript library\cite{PapaParse} and then put on the simulation.
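A minimal sketch of this loading step is shown below; the handler name and parsing options are assumptions rather than the exact configuration used in the application.
\begin{lstlisting}[language=JavaScript,caption={Sketch of local CSV loading with Papa Parse.}]
import Papa from "papaparse";

function loadCsv(file, onLoaded) {
  // `file` comes from an <input type="file"> element.
  Papa.parse(file, {
    header: true,          // use the first row as column names
    dynamicTyping: true,   // convert numeric strings to numbers
    skipEmptyLines: true,
    complete: results => onLoaded(results.data) // array of row objects for the simulation
  });
}
\end{lstlisting}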
Depending on the distance function, per-dimension mean, variance, and other attributes may also be calculated. These values are used in the general distance functions to scale the values of each feature properly. The D3 simulation layout is shown on an SVG canvas with zoom functionality to allow graph investigation. The distance function scaling was tweaked to only affect rendering and not the force calculation.
Several values used for evaluation, such as execution time, total force applied per iteration, and stress, are also computed. However, these values are printed to the JavaScript console instead.
Due to the growing number of algorithms and variables, the main JavaScript code has been refactored. Functions for controlling each algorithm have been extracted to their own files, and unused code has been removed.
\section{Algorithms}
This section discusses implementation decisions for each algorithm, some of which are already implemented in D3 force module and the d3-neighbour-sampling plugin. Adjustments made to third-party implemented algorithms are also discussed.
@@ -236,7 +268,9 @@ This section discusses implementation decisions for each algorithm, some of whic
\label{sec:imp_linkForce}
The D3-force module has implemented an algorithm to produce a force-directed layout. The main idea is to change the velocity vector of each pair connected via a link at every time step, simulating force application. For example, if two nodes are further apart than the desired distance, a force is applied to both nodes to pull them together. The implementation also supports incomplete graphs, thus the links have to be specified. The force is also, by default, scaled on each node depending on how many springs it is attached to, in order to balance the force applied to heavily and lightly connected nodes, improving stability. Without such scaling, the graph would expand in every direction.
In the early stages of the project, when assessing the library, it was observed that many of the available features are unused for multidimensional scaling. In order to reduce the computation time and memory usage, I created a modified version of Force Link as part of the plug-in. The following are the improved aspects.
Firstly, to accommodate an incomplete graph, the force scaling has to be calculated for each node and each link. The calculated values are then cached in a similar manner to the distances ($bias$ and $strengths$ in code \ref{lst:impl_LinkD3}). In a fully-connected graph, these values are the same for every link and node. To save on memory and startup time, the arrays were replaced by a single number value.
\begin{lstlisting}[language=JavaScript,caption={Force calculation function of Force Link as implemented in D3.},label={lst:impl_LinkD3}]
function force(alpha) {
@@ -265,19 +299,16 @@ Secondly, D3's Force Link require the user to specify and array of links to desc
  ...
  for (var k = 0, source, target, i, j, x, y, l; k < iterations; ++k) {
    for (i = 1; i < n; i++) for (j = 0; j < i; j++) { // For each link
      // jiggle so x, y and l won't be zero and cause a divide-by-zero error later on
      source = nodes[i]; target = nodes[j];
      x = target.x + target.vx - source.x - source.vx || jiggle();
      y = target.y + target.vy - source.y - source.vy || jiggle();
      l = Math.sqrt(x * x + y * y);
      //dataSizeFactor = 0.5/(nodes.length-1), pre-calculated only once
      l = (l - distances[i*(i-1)/2+j]) / l * dataSizeFactor * alpha;
      x *= l, y *= l;
      target.vx -= x; target.vy -= y;
      source.vx += x; source.vy += y;
    }
  }
  ...
@@ -286,31 +317,134 @@ Secondly, D3's Force Link require the user to specify and array of links to desc
After optimisation, the execution time decreases marginally while memory consumption decreases by a seventh, raising the data size limit from 3,200 data points\cite{LastYear} to over 10,000 in the process. Details on the evaluation procedure and the data size limitation will be discussed in section \ref{ssec:eval_ram}.
\begin{figure}[h]
\centering
\includegraphics[height=5cm]{graphs/linkOptimize.png}
\caption{A comparison in memory usage and execution time between versions of Force Link at 3,000 data points from the Poker Hands data set for 300 iterations.}
\label{fig:imp_linkComparison}
\end{figure}
Finally, a feature is added to track the average force applied to the system in each iteration. A threshold value can be set so that once the average force falls below the threshold, a user-defined function is called. In this case, a handler is added to Bartasius' application to stop the simulation. This feature will be heavily used in the evaluation process (section \ref{ssec:eval_termCriteria}).
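The tracker itself can be sketched as follows; the variable and function names here are illustrative rather than the project's exact API.
\begin{lstlisting}[language=JavaScript,caption={Illustrative sketch of the applied-force tracker and threshold callback.}]
let totalForce = 0, threshold = 0.5, onBelowThreshold = null;

// Accumulate the magnitude of every force applied during the iteration.
function applyForce(node, fx, fy) {
  node.vx += fx; node.vy += fy;
  totalForce += Math.hypot(fx, fy);
}

// At the end of each iteration, compare the average against the threshold.
function endOfIteration(nodeCount) {
  const averageForce = totalForce / nodeCount;
  if (onBelowThreshold && averageForce < threshold) {
    onBelowThreshold(averageForce); // e.g. a handler that calls simulation.stop()
  }
  totalForce = 0; // reset for the next iteration
}
\end{lstlisting}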
%============================
\subsection{Chalmers' 1996}
\label{sec:imp_neigbour}
Bartasius' d3-neighbour-sampling plug-in has its main focus on Chalmers' 1996 algorithm. The idea is to use the exact same force calculation function as D3 Force Link to allow for a fair comparison. The algorithm was also implemented as a Force, to be used by a Simulation. As part of the project, I refactored the code base to ease the development process and improved a shortcoming.
Aside from formatting the code, Bartasius' implementation does not have spring force scaling, making the graph explode in every direction. Originally, the example implementation used a decaying $\alpha$, a variable controlled by the Simulation used for artificially scaling down the force applied to the system over time, causing the system to contract back. A constant dataSizeFactor, similar to that in the custom Link Force, has been added to mitigate the need for a decaying alpha.
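One iteration of the neighbour-sampling force, with the constant scaling in place, can be sketched as below; the structure, helper logic, and the exact value of the scaling factor are assumptions for illustration, not Bartasius' code verbatim.
\begin{lstlisting}[language=JavaScript,caption={Sketch of one Chalmers' 1996 iteration over the node set.}]
function chalmersTick(nodes, neighbours, distance, alpha,
                      neighbourSize = 5, sampleSize = 10) {
  // Assumed constant scaling, analogous to dataSizeFactor in the custom Link Force.
  const dataSizeFactor = 0.5 / (neighbourSize + sampleSize);
  nodes.forEach((node, i) => {
    // Draw fresh random samples every iteration (excluding the node itself).
    const samples = [];
    while (samples.length < sampleSize) {
      const pick = nodes[Math.floor(Math.random() * nodes.length)];
      if (pick !== node) samples.push(pick);
    }
    for (const other of neighbours[i].concat(samples)) {
      // Spring force towards the desired high-dimensional distance.
      const x = other.x - node.x || 1e-6, y = other.y - node.y || 1e-6;
      const l = Math.hypot(x, y);
      const f = (l - distance(node, other)) / l * dataSizeFactor * alpha;
      node.vx += x * f; node.vy += y * f;
      other.vx -= x * f; other.vy -= y * f;
    }
    // Keep the closest points seen so far as the neighbour set (duplicates ignored for brevity).
    neighbours[i] = neighbours[i].concat(samples)
      .sort((a, b) => distance(node, a) - distance(node, b))
      .slice(0, neighbourSize);
  });
}
\end{lstlisting}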
\begin{figure}[h]
\centering
\includegraphics[height=5cm]{graphs/neighbourCache.png}
\caption{A comparison in execution time between force and distance calculation, with and without caching, on 5,000 data points of the Poker Hands data set for 300 iterations.}
\label{fig:imp_chalmersCache}
\end{figure}
Next, after seeing the memory footprint of the optimized Link Force, an idea occurred to also cache all the distances between every pair of nodes. After the implementation, an experiment was run to compare the performance. However, with a data set of moderate size and more iterations than typically required, the time spent on caching is higher than the time saved, resulting in a longer total execution time (figure \ref{fig:imp_chalmersCache}). The JavaScript heap usage also rises to 128 MB with a manually-invoked garbage collector (discussed in section \ref{ssec:eval_ram}), whereas originally it never used more than 50 MB. With all these drawbacks, this patch was withdrawn.
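The withdrawn cache used the same flat triangular layout as the custom Link Force; the sketch below illustrates the indexing, matching the \texttt{distances[i*(i-1)/2+j]} expression in code \ref{lst:impl_LinkD3} (function names are illustrative).
\begin{lstlisting}[language=JavaScript,caption={Triangular indexing for a flat pairwise-distance cache.}]
// One slot per unordered pair (i, j) with j < i, stored in a flat typed array.
function pairIndex(i, j) {
  return i * (i - 1) / 2 + j;
}

function buildDistanceCache(nodes, distance) {
  const cache = new Float64Array(nodes.length * (nodes.length - 1) / 2);
  for (let i = 1; i < nodes.length; i++) {
    for (let j = 0; j < i; j++) {
      cache[pairIndex(i, j)] = distance(nodes[i], nodes[j]);
    }
  }
  return cache;
}
\end{lstlisting}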
Lastly, the average-force tracker, similar to the one added to Force Link, is also added for the evaluation process. It should be noted that, unlike Force Link, the system will not stabilise to a freezing point. Because the $Samples$ set keeps changing randomly, there is no single state where all the spring forces cancel each other out completely. This is also reflected when the animation is drawn, where the nodes keep wiggling about but the overall layout remains constant.
%============================
\subsection{Hybrid Layout}
Because Hybrid Layout is a multi-phase use of the Chalmers' algorithm, it does not fit well with the limited interfaces designed for Force objects. The approach taken is to implement the Hybrid algorithm as a new JavaScript object that takes control of the Simulation instead. To make it fit in with the other D3 APIs, their design had to first be studied.
\subsection{Hybrid Layout with Pivot}
The D3 API makes extensive use of the Method Chaining design pattern. The main idea is that by having each method return the object itself rather than void, the method calls on the same object can be chained together in a single statement\cite{CleanCode}. In addition, code readability is also improved to a certain degree. With this in mind, the same convention is followed for the Hybrid object.
\begin{lstlisting}[language=JavaScript,caption={Simplified example of using the API.},label={lst:impl_HybridUsage}]
let simulation = d3.forceSimulation() // Configured D3 Simulation object
.nodes(allNodes);
let firstPhaseForce = d3.forceNeighbourSampling() // Configured Chalmers force object
.neighbourSize(NEIGHBOUR_SIZE)
.sampleSize(SAMPLE_SIZE)
.distance(distanceCalculator);
let thirdPhaseForce = d3.forceNeighbourSampling() // Configured Chalmers force object
.setParameter(Value); // Similar to above
let hybridSimulation = d3.hybridSimulation(simulation)
.forceSample(firstPhaseForce)
.forceFull(thirdPhaseForce)
.numPivots(PIVOTS ? NUM_PIVOTS:0) // brute-force is used when < 1 pivot is specified
.on("startInterp", function () {setNodesToDisplay(allNodes);})
...;
let firstPhaseSamples = hybridSimulation.subSet();
setNodesToDisplay(firstPhaseSamples);
hybridSimulation.restart(); // Start the hybrid simulation
\end{lstlisting}
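The chaining style used above can be supported simply by having every setter return the object itself. A minimal sketch of that pattern, not the exact project code, is shown below.
\begin{lstlisting}[language=JavaScript,caption={Sketch of method chaining for the Hybrid object.}]
function hybridSimulation(simulation) {
  let numPivots = 0, forceSample = null, forceFull = null;
  const hybrid = {};

  // Each setter stores its value and returns `hybrid`, so calls can be chained.
  hybrid.numPivots   = function(k) { numPivots = k; return hybrid; };
  hybrid.forceSample = function(f) { forceSample = f; return hybrid; };
  hybrid.forceFull   = function(f) { forceFull = f; return hybrid; };
  hybrid.restart     = function() { /* run the three phases using `simulation` */ return hybrid; };

  return hybrid;
}
\end{lstlisting}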
As shown in code \ref{lst:impl_HybridUsage}, the parameters for each Chalmers' force object are set in advance by the user. This potentially allows other force calculation objects to be used in place of the current one without having to modify the Hybrid object. To terminate the Chalmers' force in the first and last phase, the Hybrid object has an internal iteration counter to stop the force calculations after a predefined number of iterations. In addition, the applied-force threshold events are also supported as an alternative termination criterion.
For interpolation, two separate functions were created, one for each parent-finding method. After the parent is found, both functions call the same third function to handle the rest of the process (steps 2 to 8 in section \ref{sec:bg_hybrid}). The two variants are outlined in the pseudocode below, and a sketch of the brute-force variant follows it.
\begin{multicols}{2}
\begin{algorithmic} % BRUTEFORCE
\item Interpolation with brute-force parent finding:
\item Select random samples $s$ from $S$. $s\subset{S}$.
\ForAll{node $n$ to be interpolated}
\State Create distance cache array, equal to size of $s$
\ForAll{node $i$ in $S$}
\State Perform distance calculation.
\If{$i\in{s}$} cache distance \EndIf
\EndFor
\State \Call{Place n}{$n$, closest $i$, $s$, distance cache}
\EndFor
\end{algorithmic}
\columnbreak
\begin{algorithmic} % FOR INTERPOLATION
\item Interpolation with pivot-based parent finding:
\item Preprocess pivots buckets
\item Select random samples $s$ from $S$. $s\subset{S}$.
\ForAll{node $n$ to be interpolated}
\State Create distance cache array, equal to size of $s$
\ForAll{pivot $p$ in $k$}
\State Perform distance calculation.
\If{$p\in{s}$} cache distance \EndIf
\ForAll{node $i$ in bucket}
\State Perform distance calculation.
\If{$i\in{s}$} cache distance \EndIf
\EndFor
\EndFor
\State Fill in the rest of the cache
\State \Call{Place n}{$n$, closest $k$, $s$, distance cache}
\EndFor
\end{algorithmic}
\end{multicols}
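The brute-force variant outlined above can be sketched in JavaScript as follows; the function and variable names are illustrative, not the project's exact code, and the placement itself (steps 2 to 8) is left to a separate routine.
\begin{lstlisting}[language=JavaScript,caption={Sketch of brute-force parent finding with the distance cache.}]
// Find the parent of `node` by brute force over the sample set S, filling the
// distance cache for s along the way (s is a subset of S).
function findParentBruteForce(node, S, s, distance) {
  const cache = new Array(s.length);     // distances from `node` to each member of s
  let parent = null, best = Infinity;
  for (const candidate of S) {
    const d = distance(node, candidate); // one high-dimensional distance per member of S
    const k = s.indexOf(candidate);
    if (k >= 0) cache[k] = d;            // cache fills for free because s is a subset of S
    if (d < best) { best = d; parent = candidate; }
  }
  // Steps 2 to 8 (circle placement, binary search, refinement) use `parent` and `cache`.
  return { parent, cache };
}
\end{lstlisting}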
Since the original paper did not specify the method of placing a node $n$ on the circle around its parent (steps 3 to 4), Matthew Chalmers, the project advisor who also took part in developing the algorithm, was contacted for clarification. Unfortunately, the knowledge was lost. Instead, the sum of distance errors between $n$ and every member of $s$ was proposed as an alternative. Preliminary testing shows that it works well, and it is used for this implementation.
With that decision, the high-dimensional distances between $n$ and each member of $s$ are used multiple times for the binary search and placement refinement (steps 7 and 8). To reduce the time complexity, a distance cache has been created. For brute-force parent finding, the cache can be filled while the parent is being selected, as $s\subset{S}$. On the other hand, pivot-based searching might not cover every member of $s$. Thus, the missing cache entries are filled in after the parent search.
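One way the sum-of-distance-errors criterion could be realised is to evaluate a fixed set of candidate angles on the circle around the parent; this is a sketch under that assumption, not necessarily the exact placement routine used.
\begin{lstlisting}[language=JavaScript,caption={Sketch of choosing a placement angle by minimising the sum of distance errors.}]
function bestAngle(parent, radius, s, cache, candidates = 16) {
  let bestTheta = 0, bestError = Infinity;
  for (let c = 0; c < candidates; c++) {
    const theta = (2 * Math.PI * c) / candidates;
    const x = parent.x + radius * Math.cos(theta);
    const y = parent.y + radius * Math.sin(theta);
    let error = 0;
    s.forEach((sample, k) => {
      // cache[k] holds the high-dimensional distance from the node to this sample.
      error += Math.abs(Math.hypot(sample.x - x, sample.y - y) - cache[k]);
    });
    if (error < bestError) { bestError = error; bestTheta = theta; }
  }
  return bestTheta;
}
\end{lstlisting}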
%============================
\section{Metrics}
Many different metrics were introduced in section \ref{sec:bg_metrics}, some of which require extra code to be written. While memory usage measurement requires an external profiler, execution time can be calculated by the application itself. For JavaScript, the recommended way is to take a high-resolution time-stamp before and after code execution. The method provides accuracy up to 5 microseconds. It is important to note that, at this level of precision, the measured value will vary from run to run due to many factors, both from software, such as the OS' process scheduler, and hardware, such as Intel\textsuperscript{\textregistered} Turbo Boost or cache prefetching.
\begin{lstlisting}[language=JavaScript,caption={Execution time measurement.},label={lst:impl_Time}]
p1 = performance.now();
// Execute algorithm
p2 = performance.now();
console.log("Execution time", p2-p1);
\end{lstlisting}
Stress calculation is done as defined by the formula in section \ref{sec:bg_metrics}. The calculation is independent of the algorithm. In fact, it does not depend on D3 at all; only an array of node objects and a distance function are required. Due to its very long calculation time, this function is only called on demand when the value has to be recorded. The exact implementation is shown in code \ref{lst:impl_Stress}.
\begin{lstlisting}[language=JavaScript,caption={Stress calculation function.},label={lst:impl_Stress}]
export function getStress(nodes, distance) {
  let sumDiffSq = 0; let sumLowDDistSq = 0;
  // Iterate over every unordered pair of nodes.
  for (let j = nodes.length-1; j >= 1; j--) {
    for (let i = 0; i < j; i++) {
      let source = nodes[i], target = nodes[j];
      // 2D (layout) distance and high-dimensional (desired) distance for the pair.
      let lowDDist = Math.hypot(target.x - source.x, target.y - source.y);
      let highDDist = distance(source, target);
      sumDiffSq += Math.pow(highDDist - lowDDist, 2);
      sumLowDDistSq += lowDDist * lowDDist;
    }
  }
  return Math.sqrt(sumDiffSq / sumLowDDistSq);
}
\end{lstlisting}
%==============================================================================
%%%%%%%%%%%%%%%%
@@ -320,9 +454,7 @@ Tried caching
%%%%%%%%%%%%%%%%
\chapter{Evaluation}
\label{ch:eval}
This chapter presents the comparison between each of the implemented algorithms. First, the data sets that were used are described. The experimental setup is then introduced, along with the decisions behind the test design. Lastly, the results are shown and briefly interpreted.
%TODO SOMETHING HERE
% Link is golden standard, the rest try to get to that but cut corners
\section{Data Sets}
\label{sec:EvalDataSet}
@@ -342,26 +474,28 @@ The Antarctic data set contain 2,202 measurements by remote sensing probes over
\section{Experimental Setup}
Hardware and the web browser can greatly impact JavaScript performance. In addition to the code and dataset, these variables have to be controlled as well.
The computers used are all the same model of Dell All-in-One desktop computer with an Intel\textsuperscript{\textregistered} Core\texttrademark{} i5-3470S and 8GB of DDR3 memory, running CentOS 7 with Linux 3.10-x86-64.
As for the web browser, the official 64-bit build of Google Chrome 61.0.3163.79 is used to both run the experiments and analyse CPU and memory usage with its performance profiling tool.
Other unrelated parameters have to also be controlled as much as possible. The simulation's velocity decay is set at the default of $0.4$, mimicking air friction, and the starting position of all nodes is locked at $(0,0)$. Although starting every node at the exact same position may seem to cause a very high initial spring force, the force scaling and the way D3 takes each node's velocity as part of the spring force calculation prevent the system from spreading out too far. In practice, the graphs have to continue to expand for several more iterations before the overall layout reaches the correct size. Alpha, a decaying value used for artificially slowing down and freezing the system over time, is also kept at 1 to keep the springs' forces in full effect.
The web page is also refreshed after every run to make sure that everything, including uncontrollable aspects such as the JavaScript heap, ahead-of-time compilation, and the behavior of the browser's garbage collector, has been properly reset.
\subsection{Termination criteria}
\label{ssec:eval_termCriteria}
Both Link force and the Chalmers' 1996 algorithm create a layout that stabilises over time. In D3, calculations are performed for a predefined number of iterations. This has the drawback of having to select an appropriate value. Choosing the number too high means that execution time is wasted calculating minute details with no visible change to the layout, while the opposite can result in a bad layout.
Determining the constant number can be problematic, considering that each algorithm may stabilise after a different number of iterations, especially when the interpolation result can vary greatly from run to run.
An alternative method is to stop when a condition is met. One such condition proposed is the difference in velocity ($\Delta{v}$) of the system between iterations\cite{Algo2002}. In other words, once the amount of force applied in that iteration is lower than a scalar threshold, the calculation may stop. Taking note of the stress and average force applied over multiple iterations, as illustrated in figure \ref{fig:eval_stressVeloOverTime}, it is clear that Link Force converges to complete stillness while the Chalmers algorithm's average force reaches and fluctuates around a constant, as stated in section \ref{sec:imp_neigbour}. It can also be seen that the stress of each layout converges to a minimal value as the average force converges to a constant, indicating that the best layout from each algorithm can be obtained once the system stabilizes.
\begin{figure}
\centering
\includegraphics[height=5cm]{graphs/stressVeloOverTime.png}
\caption{A log-scaled graph showing decreasing stress and force applied per iteration over a constant number of iterations when running the different algorithms over 10,000 data points of the Poker Hands data set. Stress is calculated every \nth{10} iteration.}
\label{fig:eval_stressVeloOverTime}
\end{figure}
Since stress takes too long to calculate every iteration, the termination criterion selected is the average force applied per node. This criterion is used for all 3 algorithms for consistency. The cut-off constant is then manually selected for each algorithm for each subset used. Link force's threshold is a value low enough that there are no visible changes and stress has reached near its minimum. The Chalmers' threshold is the lowest possible value that will be reached most of the time. It is interesting to note that with bigger subsets of the Poker Hands data set, the threshold rises to 0.66 from 3,000 data points onward.
By selecting this termination condition, the goal of the last phase of the Hybrid Layout algorithm is flipped. Rather than performing the Chalmers' algorithm over the whole dataset to correct interpolation errors, the interpolation phase's role is to help the final phase reach stability quicker. Thus, the parameters of the interpolation phase can not be evaluated on their own. Taking more time to produce a better interpolation result may or may not affect the number of iterations in the final phase, creating the need to balance the time spent and saved by interpolation.
@@ -369,7 +503,7 @@ By selecting this termination condition, the goal of the last phase of the Hybri
\subsection{Selecting Parameters}
\label{ssec:eval_selectParams}
Some of the algorithms have variables that are predefined constant numbers. Care has to be taken in choosing these values, as bad choices could cause the algorithm to produce bad results or take unnecessarily long computation time. To compare each algorithm fairly, a good set of parameters has to be chosen for each.
The Chalmers' algorithm has two adjustable parameters: $Neighbours_{size}$ and $Samples_{size}$.
According to previous evaluations\cite{LastYear}\cite{Algo2002}, a favorable layout could be achieved with values as low as $10$ for both variables. Preliminary testing seems to confirm the findings, and these values are selected for the experiments. On the other hand, Link force has no adjustable parameters whatsoever, so no attention is required.
@@ -378,7 +512,7 @@ According to previous evaluations\cite{LastYear}\cite{Algo2002}, favorable layou
\centering
\includegraphics[height=5cm]{graphs/hitrate_graph1.png}
\includegraphics[height=5cm]{graphs/hitrate_graph2.png}
\caption{Graphs showing the accuracy of pivot-based searching for $k = $ 1, 3, 6, and 10. The left box-plot graph shows the percentage across 5 different runs (higher and more consistent is better). The right shows the high-dimensional distance ratio between the parent chosen by brute-force and pivot-based searching when they are not the same (closer to 1 is better). For instance, if the parent found by brute-force searching is 1 unit away from the querying node, a ratio of 1.3 means that the parent found by pivot-based searching is 1.3 units away.}
\label{fig:eval_pivotHits}
\end{figure}
@@ -420,7 +554,7 @@ Finally, the last step of interpolation is to refine the placement for a constan
%============================
\subsection{Performance metrics}
As discussed in section \ref{sec:bg_metrics}, there are three main criteria to evaluate each algorithm: execution time, memory consumption, and the produced layout. Although stress is a good metric to judge the quality of a layout, it does not necessarily mean that layouts with the same stress are equally good for data exploration. Thus, the look of the product itself has to also be compared. Since Bartasius found that Link Force provides a layout with the least stress in all cases, its layout will be used as a baseline for comparison (figure \ref{fig:eval_idealSample}).
It should also be noted that, for ease of comparison, the visualisations may be uniformly scaled and rotated. This manipulation should not affect the evaluation as the only concern of a spring model is the relative distance between data points.
%============================
@@ -429,7 +563,8 @@ It should also be noted that for ease of comparison, the visualisations may be u
\subsection{Memory usage}
\label{ssec:eval_ram}
Google Chrome comes with performance profiling tools, allowing users to measure JavaScript heap usage. While it is straightforward to measure the usage of Link Force, with the 1996 algorithm the garbage collector gets in the way of obtaining an accurate value. Because the $Samples$ sets and, to a certain degree, the $Neighbours$ sets are reconstructed at every iteration, a lot of new memory space is allocated and the old spaces are left unreachable, waiting to be reclaimed. As a result, the JS heap usage keeps increasing until the GC runs, even though the actual usage is theoretically constant across multiple iterations (figure \ref{fig:eval_neighbourRam}). Even though the GC is designed to only be run automatically by the JavaScript engine, Google Chrome allows it to be manually invoked in the profiling tool. For this experiment, the GC will be manually invoked periodically during part of the run. The usage immediately after garbage collection is then recorded and used for comparison. The peak before the GC automatically gets invoked is also noted.
\begin{figure}
\centering
@@ -450,13 +585,16 @@ The hybrid layout has multiple phases, each with different theoretical memory co
The comparison has been made between the 3 algorithms, with the hybrid layout running 10 pivots to represent the worst case scenario for interpolation. Rendering is also turned off to minimize the impact of DOM element manipulation\cite{LastYear}. The results are displayed in figure \ref{fig:eval_ram}. The modified Link Force, which uses less memory compared to D3's implementation (section \ref{sec:imp_linkForce}), scales badly compared to all the others, even with automatic garbage collection. The difference in the base memory usage between the 1996 algorithm and the final stage of the Hybrid layout is also within the margin of error, confirming that they both have the same memory requirement. If the final phase of the Hybrid layout is skipped, the memory requirement will grow at a slightly lower rate.
Although none of the original researchers, Chalmers, Morrison, or Ross, have explored the memory aspect before, Bartasius experimented with the maximum data size the application can handle before an Out of Memory exception occurs\cite{LastYear}. A similar test is re-performed to find out if there have been any changes.
Due to JavaScript limitations, Link Force crashes the browser tab at 50,000 data points before any spring force is calculated, failing the test entirely. Similar behavior can also be observed with D3's implementation. In contrast, the Chalmers' and hybrid algorithms can process as much as 470,000 data points. Interestingly, while the Chalmers' algorithm can also handle 600,000 data points with rendering, the 8GB of memory is all used up, causing heavy thrashing and slowing down the entire machine. Considering that paging does not occur when Link Force crashes the browser tab, memory requirement may not be the only limiting factor in play.
All in all, since a desirable result can not be obtained from the Hybrid algorithm if the final stage is skipped, there is no benefit in terms of memory usage from using the Hybrid layout compared to the Chalmers' algorithm. Both of them have a much smaller memory footprint compared to Link Force and can work with far more data points under the same hardware constraints.
%============================
\subsection{Different Parameters for the Hybrid Layout}
In section \ref{ssec:eval_termCriteria}, it has been concluded that the values of the parameters can not be evaluated on their own. Based on findings discussed in section \ref{ssec:eval_selectParams}, 10 different combinations of interpolation parameters were chosen: brute force and 1, 3, 6, and 10 pivots, each with and without refinement at the end. Due to possible variations from the sample set $S$, each experiment is also performed 5 times. The data sets used are Poker Hands with 10,000 data points, which is the highest amount where stress can be calculated before crashing the web page, and 100,000 data points to highlight the widening difference in interpolation time.
It should also be noted that while the original researchers had a similar experiment\cite{Algo2003}, it only explored the difference in execution time between random, brute-force, and pivot-based parent finders. The different values of each parameter were not taken into consideration and the produced results were assumed to be equal across multiple different runs.
\begin{figure}
\centering
@@ -476,8 +614,12 @@ Surprisingly, despite lower time complexity, selecting higher number of pivots o
\centering
\includegraphics[height=5cm]{graphs/hybridParams_2ndTime_100k.png}
~
\includegraphics[height=5cm]{graphs/hybridParams_totalIts_100k.png}
\includegraphics[height=5cm]{graphs/hybridParams_2ndTime_100k_blank.png}
~
\includegraphics[height=5cm]{graphs/hybridParams_totalTime_100k.png}
\caption{Comparison of different interpolation parameters of the hybrid layout at 100,000 data points. The two Box and Whisker plots are aligned for ease of comparison.}
\label{fig:eval_hybridParams100k} \label{fig:eval_hybridParams100k}
\end{figure} \end{figure}
In summary, to obtain a quality layout, the refining step of the interpolation phase cannot be ignored. Pivot-based searching only provides a significant benefit with a very large data set and/or a slow distance function. Otherwise, the brute-force method can yield a better layout in consistently less time.
%============================
\subsection{Comparison between Algorithms}
Figure \ref{sfig:eval_multiAlgoTime} shows the execution time and the stress of the produced layout for each algorithm with various data sets. The results reveal that the Hybrid algorithm is superior to the other algorithms across the board. The difference compared to Chalmers' algorithm is so large that the time difference between parameter settings seems insignificant. It should be noted that with smaller data sets the processing time of each iteration can be shorter than 17 milliseconds, the time between frames on a typical monitor running at 60 frames per second. In D3-force, processing is put on idle until the next screen refresh. As a result, the total execution time is determined by the number of iterations rather than by the cost of each one.
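As a rough illustration, assuming one tick per animation frame and d3-force's default cooling schedule of approximately 300 ticks (an assumption based on the library's documented default alpha decay; the exact count depends on configuration), the wall-clock time is bounded from below regardless of how cheap each tick is:
\[
T_{\mathrm{total}} \geq N_{\mathrm{ticks}} \times 16.7\,\mbox{ms} \approx 300 \times 16.7\,\mbox{ms} \approx 5\,\mbox{s}.
\]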
\begin{figure}[h] % GRAPH
    \centering
    \begin{subfigure}{0.45\textwidth}
        \includegraphics[height=4cm]{graphs/multiAlgoTime.png}
        \caption{Execution time for up to 100,000 data points of the Poker Hands data set}
        \label{sfig:eval_multiAlgoTime}
    \end{subfigure}
    ~ %add desired spacing between images, if blank, line break
    \begin{subfigure}{0.45\textwidth}
        \includegraphics[height=4cm]{graphs/multiAlgoStress.png}
        \caption{Relative stress of each finished layout compared to Link Force for different data sets}
        \label{sfig:eval_multiAlgoStress}
    \end{subfigure}
    \caption{Comparison between different algorithms}
\end{figure}

\begin{figure}[h]
    \centering
    % layout images for Link Force, Chalmers', and the Hybrid algorithm (elided in this excerpt)
    \caption{Layouts of 10,000 data points of the Poker Hands data set produced by each algorithm}
    \label{fig:eval_Poker10k}
\end{figure}
As for the stress, a relative value is used for comparison. Figure \ref{sfig:eval_multiAlgoStress} shows that the Hybrid algorithm results in a layout of lower stress overall. The trend also implies that the more data points are available, the better the Chalmers' and Hybrid algorithms perform. In all cases, however, Link Force always has the lowest stress.
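For reference, the comparison uses the usual normalised stress between the high-dimensional distances $d_{ij}$ and the low-dimensional distances $\|p_i - p_j\|$, with the relative value plotted in figure \ref{sfig:eval_multiAlgoStress} being the ratio to Link Force's stress on the same data (a sketch of the definitions; the exact normalisation used in the implementation may differ):
\[
\mathrm{Stress} = \frac{\sum_{i<j} (d_{ij} - \| p_i - p_j \|)^2}{\sum_{i<j} d_{ij}^2},
\qquad
\mathrm{RelStress}_{A} = \frac{\mathrm{Stress}_{A}}{\mathrm{Stress}_{\mathrm{LinkForce}}}.
\]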
Comparing the produced layouts at 10,000 data points (figure \ref{fig:eval_Poker10k}), the Hybrid algorithm better reproduces the space between large clusters seen in the Link Force layout. For example, "Unrecognised" (blue) and "One pair" (orange) have a clearer gap; "Two pairs" (green) and "Three of a kind" (red) overlap less; "Three of a kind" and "Straight" (brown) mix together in the Chalmers' layout but are more separated in the Hybrid layout. However, for the other classes with fewer data points (coloured brown, purple, pink, ...), the Hybrid layout fails to form clusters, causing them to spread out even more. The same phenomenon can be observed at 100,000 data points (figure \ref{fig:eval_Poker100k}).
%============================
\section{Summary}
Each algorithm demonstrates its own strengths and weaknesses in the different tests. For smaller data sets of a few thousand data points, Link Force works well and performs consistently. Most information visualisations on a web page will not hit any limitation of the algorithm. In addition, it allows real-time object interaction and produces smooth animations, which might be more important to most users. However, for a fully-connected spring model with more than 1,000 data points, the startup time spent on distance caching starts to become noticeable and each iteration can take longer than the 17ms frame budget, dropping the animation below 60fps and causing visible lag and slowdown. Its memory-hungry nature also limits its ability to run on the lower-end computers that a significant share of Internet users possess.
When bigger data sets are loaded and interactivity is not a concern, performing the Hybrid layout's interpolation strategy before running the 1996 algorithm results in a better layout in a shorter amount of time. It should be noted that this method does not work consistently with smaller data sets, making Link Force the better option there. As for interpolation, the simple brute-force method is the better choice in general. Pivot-based searching does not significantly decrease the computation time, even with a relatively large data set, and its result is less predictable.
Looking back at the older Java implementation from 2002 running on an Intel\textsuperscript{\textregistered} Pentium III\cite{Algo2002}, a 3-dimensional data set with 30,000 data points used to require over 10 minutes with Chalmers' algorithm and approximately 3 minutes with the Hybrid algorithm using brute-force and pivot-based parent finding\cite{Algo2003}. Compared to now, where 30,000 data points of Poker Hands, even with parameters stored as a text-keyed dictionary rather than an index-based array, can be visualised in 1.5 minutes with Chalmers' algorithm and 14 seconds with the Hybrid algorithm, it is clear that the performance of general consumer devices has improved greatly.
Overall, these algorithms are all valuable tools. It is up to the developer to choose the right tool for the application.

%============================
\chapter{Conclusion}
\label{ch:conc}

\section{Summary}
In total, the project produced and evaluated the following:
\begin{itemize}
\item \textbf{D3 Link Force}
\item \textbf{d3-neighbour-sampling plug-in}
\item \textbf{Interpolation algorithms for the hybrid layout}
\item \textbf{Hybrid simulation controller object for D3}
\item \textbf{Evaluation of the impact of each interpolation parameter} (both independently and as a whole)
\item \textbf{Evaluation of memory usage, execution time, and final layout quality}
\end{itemize}
\section{Learning Experience}
The project had many challenges that helped me learn about both software engineering and research practices. Working with older research papers, I encountered a lot of ambiguity in otherwise thorough-looking descriptions. In terms of software engineering, neither D3-force nor d3-neighbour-sampling has documentation for the interfaces between their components. A lot of time was spent figuring out how the objects interact with each other and what the flow of the system is. At the same time, the free and open-source licence of D3 allowed me to easily access the source code to learn from and to customise components such as Link Force. This project also helped me expand my knowledge of client-side web application technologies and their fast pace of development.

Evaluating this project has given me a better understanding of designing and conducting experiments on software performance. Furthermore, I also gained valuable knowledge of JavaScript behaviour in different browsers and of the limitations of each performance profiling tool.
\section{Future Work}
There are several areas of the project that were not thoroughly explored or that could be improved. This section outlines several directions that could enhance the application.
\begin{itemize}
\item \textbf{Incorporating Chalmers' 1996 and Hybrid interpolation algorithms into the D3 framework} Currently, all the implementations are published on a publicly-accessible self-hosted Git server as a D3 plug-in. While the hybrid model seems to make more sense as a user application implementation, the improved Chalmers' algorithm and the interpolation functions could be integrated into the core functionality of the D3 library.
\item \textbf{Data Exploration Test} The project focuses on the overall layouts produced by each algorithm and a single stress metric. One of the goals of MDS is to explore data, which has not been assessed. A good tool and layout should help users identify patterns and meanings behind small clusters with less effort. The project could be extended to include data investigation tools.
\item \textbf{Data Sets} The evaluation focuses on only one data set. It is possible that the algorithms could behave differently on data sets with different dimensionality, data types, and distance functions. Hence, the findings in chapter \ref{ch:eval} may not apply to all of them.
\item \textbf{Optimal parameters generalisation} So far, good combinations of parameters were only determined for a specific data set. These values may not be universally optimal and can vary from data set to data set. Even the threshold value used to stop Chalmers' algorithm varies with the size of the subset of the same Poker Hands data set. Future research could be conducted to find the relation between these parameters and other properties of the data set.
\item \textbf{GPU computation} The use of GPUs for general-purpose computing (GPGPU) is gaining popularity because a GPU can perform simple calculations in parallel much faster than a CPU. The Khronos group introduced WebCL\cite{WebCL}, a binding of OpenCL for web browsers; however, it never gained popularity and was not adopted by any major browser. Other efforts such as gpu.js\cite{gpujs} turn to the OpenGL Shading Language (GLSL) on WebGL instead. While the latest WebGL 2.0 does not support compute shaders due to the limited feature set of OpenGL ES 3.0\cite{WebGL2}, all of the mathematical operations used in these algorithms are supported. Following that approach, Chalmers' and the interpolation algorithms could be ported to GLSL in the future.
\item \textbf{asm.js and WebAssembly} Most implementations of JavaScript are relatively slow. asm.js gains extra performance by using only a restricted subset of JavaScript and is intended to be a compilation target for other languages such as C/C++ rather than a language to code in\cite{asmjs}. Existing JavaScript engines benefit from asm.js' restrictions, such as a preallocated heap that reduces the load on the garbage collector, while engines that recognise asm.js can also compile it ahead-of-time (AOT), eliminating the need to run the code through an interpreter entirely. It is now supported by most modern browsers and has been shown to provide a speed increase\cite{asmjsSpeed}. At the moment, the D3-force library still uses standard JavaScript, so a significant chunk of the library would have to be ported in order to compare the different algorithms fairly.
WebAssembly (wasm), on the other hand, is a binary format designed to run alongside JavaScript in the same sandbox and is even faster than JavaScript\cite{WebAssembly}. Only released in March 2017, support was not widespread and learning resources were hard to find, so WebAssembly was not considered at the start of this project. However, as the project comes to an end, WebAssembly has gained popularity and is now supported by many major web browsers such as Firefox, Chromium, Safari, and Edge.
\item \textbf{More-efficient hashing algorithms for parent finding} Over the past decade, the fields of machine learning and data mining have attracted a lot of interest. Many improvements have been made to related problems, including high-dimensional near-neighbour searching. Newer algorithms, such as data-dependent Locality-Sensitive Hashing\cite{LSH}, could provide better execution times or more accurate results. Future research could incorporate these newer algorithms into the interpolation process of the Hybrid layout and evaluate any difference they make.
\item \textbf{Multi-threading with HTML5 Web Workers} By nature, JavaScript is designed to be single-threaded. HTML5 allows new worker processes to be created and run concurrently. These workers have isolated memory spaces and are not attached to the HTML document; the only way for them to communicate is message passing. Objects passed are serialised by the sender and de-serialised on the other end, creating even more overhead. Due to the size of the objects the program has to work with, it was estimated that the overhead would outweigh the benefit, and support was not implemented. A minimal sketch of this message-passing pattern is shown after this list.
\end{itemize}
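To make the Web Worker point above concrete, the following minimal sketch shows the message-passing pattern; \texttt{computeLayout()} and \texttt{drawLayout()} are hypothetical placeholders for the layout code and the renderer. Every \texttt{postMessage()} call copies its whole payload between the two sides rather than sharing memory, which is the overhead discussed above.
\begin{verbatim}
// main.js -- spawn a worker and hand it the node array
var worker = new Worker('layout-worker.js');
worker.onmessage = function (event) {
    // event.data is a copy of what the worker sent back
    drawLayout(event.data.positions);      // drawLayout() is hypothetical
};
worker.postMessage({nodes: nodes});        // the nodes array is copied, not shared

// layout-worker.js -- receive the copy, compute, send the positions back
onmessage = function (event) {
    var positions = computeLayout(event.data.nodes);  // computeLayout() is hypothetical
    postMessage({positions: positions});
};
\end{verbatim}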
%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%
\begin{appendices}
\chapter{Running the Evaluation Application}
The web application can be run locally by loading a single HTML file, located at
\begin{verbatim}
d3-spring-model/examples/example-papaparsing.html
\end{verbatim}
The data sets used can also be found at
\begin{verbatim}
d3-spring-model/examples/data
\end{verbatim}
Please note that a modern browser is required to run the application. Firefox 57 and Chrome 61 were tested, but some older versions might also work.
The API reference and instructions for building the plug-in are available in the README.md file. Please note that the build scripts are written for a Linux development environment and may have to be adapted for other operating systems. A built JavaScript file for the plug-in is already included with the submission, hence re-building is unnecessary.
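For orientation, a minimal page script using the plug-in might look roughly like the sketch below. The Papa Parse and D3 calls are standard APIs; \texttt{d3.forceSpringModel()} and \texttt{redraw()} are only placeholders for whatever the plug-in and the example page actually export (see README.md for the real names).
\begin{verbatim}
// Assumes d3, Papa Parse, and the built plug-in are already loaded on the page.
Papa.parse('data/poker-hand-10k.csv', {     // file name is illustrative
    download: true,                         // fetch the CSV over HTTP
    header: true,                           // first row holds the attribute names
    complete: function (results) {
        var nodes = results.data;
        d3.forceSimulation(nodes)
            // 'forceSpringModel' stands in for the force the plug-in exports
            .force('spring', d3.forceSpringModel())
            .on('tick', redraw);            // redraw() renders the nodes (not shown)
    }
});
\end{verbatim}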
\end{appendices}
%%%%%%%%%%%%%%%%%%%%
% BIBLIOGRAPHY %
%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{plainnat}
\bibliography{l4proj}

\end{document}