Update in-progress eval 2
BIN  graphs/hybridParams_2ndTime_100k.png   (new file, 31 KiB)
BIN  graphs/hybridParams_2ndTime_10k.png    (new file, 31 KiB)
BIN  graphs/hybridParams_stress_10k.png     (new file, 33 KiB)
BIN  graphs/hybridParams_totalTime_100k.png (new file, 44 KiB)
BIN  graphs/hybridParams_totalTime_10k.png  (new file, 55 KiB)
BIN  graphs/neighbourRam.png                (new file, 17 KiB)
BIN  graphs/ram.png                         (new file, 41 KiB)
l4proj.tex (108 changes)
@@ -64,7 +64,7 @@ D3 library, which will be described in section \ref{ssec:d3design}, have a sever
\end{figure}
The link force algorithm is inefficient. In each time step (iteration), a calculation has to be performed for each pair of nodes connected by a link. For our use case of a fully-connected graph (where every node is connected to every other node) of $N$ nodes, the algorithm therefore performs $N(N-1)$ force calculations per iteration, essentially $O(N^2)$. The number of iterations required to produce a good layout is also believed to be proportional to the size of the data set, hence a total time complexity of $O(N^3)$.
The model also caches the desired distance of each link in memory to improve speed across many iterations. While this greatly reduces the number of calls to the distance-calculating function, the memory complexity increases to $O(N^2)$. Because the JavaScript memory heap is limited, it runs out of memory when trying to process a fully-connected graph of more than ten thousand data points.
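The quadratic behaviour per iteration can be seen in a minimal sketch of such a tick (an illustrative simplification, not the actual D3 \texttt{forceLink} implementation; \texttt{highDimDistance} and the node fields are assumed):
\begin{verbatim}
// One iteration over a fully-connected graph: every pair of nodes is a
// link, so O(N^2) pairs are visited per tick.
const desired = new Map();            // cached link distances, O(N^2) entries
function tickLinkForce(nodes, alpha) {
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const key = i * nodes.length + j;
      if (!desired.has(key)) desired.set(key, highDimDistance(nodes[i], nodes[j]));
      const rest = desired.get(key);
      const dx = nodes[j].x - nodes[i].x, dy = nodes[j].y - nodes[i].y;
      const dist = Math.hypot(dx, dy) || 1e-6;
      const f = alpha * (dist - rest) / dist;  // spring toward the desired length
      nodes[i].vx += dx * f / 2;  nodes[i].vy += dy * f / 2;
      nodes[j].vx -= dx * f / 2;  nodes[j].vy -= dy * f / 2;
    }
  }
}
\end{verbatim}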
\section{Chalmers' 1996 algorithm}
In 1996, Matthew Chalmers proposed a technique to reduce the time complexity down to $O(N^2)$, a massive improvement over link force's $O(N^3)$ as described in section \ref{sec:linkbg}, at the cost of accuracy. This is done by reducing the number of spring force calculations per iteration using random samples\cite{Algo1996}.
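The idea can be sketched as follows (an illustrative simplification of the published algorithm; \texttt{updateNeighbours} and \texttt{applySpringForce} are assumed helpers):
\begin{verbatim}
// One 1996-style iteration: each node interacts only with a small Neighbours
// set plus a freshly drawn random Samples set, giving
// O(N * (neighboursSize + samplesSize)) work per iteration instead of O(N^2).
function tick1996(nodes, neighboursSize, samplesSize) {
  for (const node of nodes) {
    const samples = [];
    while (samples.length < samplesSize) {
      const candidate = nodes[Math.floor(Math.random() * nodes.length)];
      if (candidate !== node && !samples.includes(candidate)) samples.push(candidate);
    }
    updateNeighbours(node, samples, neighboursSize); // keep the closest points seen so far
    for (const other of node.neighbours.concat(samples)) {
      applySpringForce(node, other);                 // same spring rule as link force
    }
  }
}
\end{verbatim}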
@@ -139,6 +139,7 @@ The complexity of the preprocessing stage is $O(\sqrt{N}k)$. For query, the aver
With this method, the parent found is not guaranteed to be the closest point. Prior evaluations have concluded that the accuracy is high enough to produce good results.
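As an illustration only (the exact bucketing scheme used in the implementation may differ), a pivot-based lookup of this kind can be sketched as:
\begin{verbatim}
// Preprocessing: assign every member of the sample set S to the bucket of its
// nearest pivot. Query: only the bucket of the query's nearest pivot is
// scanned, which is why the returned parent may not be the true closest point.
function nearestPivot(point, pivots, distance) {
  let best = 0;
  for (let p = 1; p < pivots.length; p++) {
    if (distance(point, pivots[p]) < distance(point, pivots[best])) best = p;
  }
  return best;
}
function buildBuckets(S, pivots, distance) {
  const buckets = pivots.map(() => []);
  for (const s of S) buckets[nearestPivot(s, pivots, distance)].push(s);
  return buckets;
}
function findParent(query, pivots, buckets, distance) {
  let parent = null, parentDist = Infinity;
  for (const s of buckets[nearestPivot(query, pivots, distance)]) {
    const d = distance(query, s);
    if (d < parentDist) { parent = s; parentDist = d; }
  }
  return parent;
}
\end{verbatim}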
\section{Performance Metrics}
\label{sec:bg_metrics}
To compare different algorithms, they have to be tested against the same set of performance metrics. During development, a number of metrics were used to objectively judge the resulting graph and the computation required. The evaluation process in chapter \ref{ch:eval} will focus on the following metrics.

\begin{itemize}
@@ -189,8 +190,12 @@ something
\section{Algorithms}
\subsection{Link force}
\label{sec:imp_linkForce}
Reduced memory usage; see Section \ref{ssec:eval_ram} for details.
%============================
\subsection{Chalmers' 1996}
Force scaling
Tried caching
%============================
\subsection{Hybrid Layout}
%============================
@@ -227,7 +232,7 @@ The Poker Hands is another classification dataset containing possible hands of 5
\begin{figure}
\centering
\includegraphics[height=6cm]{layout/Link10000Stable_crop.png}
\caption{A subset of 10,000 data points of the Poker Hands data set, visualised by Link force, which should produce the most accurate layout.}
\label{fig:eval_idealSample}
\end{figure}
@@ -242,14 +247,15 @@ As for web browser, the official 64-bit build of Google Chrome 61.0.3163.79 is u
Other unrelated parameters also have to be controlled as much as possible. The starting positions of all nodes are locked at $(0,0)$ and the simulation's velocity decay is set at the default of $0.4$, mimicking air friction. Alpha, a decaying value used for artificially slowing down or freezing the system over time, is also kept at 1 to keep the spring forces in full effect. The web page is also refreshed after every run to make sure that everything, including uncontrollable aspects such as the JavaScript heap and the behaviour of the browser's garbage collector, has been properly reset.
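For reference, the controlled settings roughly correspond to a d3-force configuration along these lines (a sketch only; the project's custom forces and measurement hooks are omitted, and \texttt{maxIterations} is an assumed constant):
\begin{verbatim}
// Controlled simulation settings: nodes start at (0,0), velocity decay at the
// default 0.4, and alpha is prevented from decaying so the forces stay in
// full effect; ticks are driven manually instead of by the animation timer.
const simulation = d3.forceSimulation(data.map(d => ({...d, x: 0, y: 0})))
    .velocityDecay(0.4)
    .alpha(1)
    .alphaDecay(0)
    .stop();
for (let i = 0; i < maxIterations; i++) simulation.tick();
\end{verbatim}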
\subsection{Termination criteria}
\label{ssec:eval_termCriteria}
Both Link force and the 1996 algorithm create a layout that stabilises over time. In D3, calculations are performed for a predefined number of iterations. This has the drawback of having to select an appropriate number: choosing a number that is too high means execution time is wasted calculating minute details with no visible change to the layout, while choosing one that is too low can result in a bad layout.
Determining this constant can be problematic, considering that each algorithm may stabilise after a different number of iterations, especially when the interpolation result can vary greatly from run to run.
An alternative method is to stop when a condition is met. One such condition proposed is the change in velocity of the system between iterations\cite{Algo2002}; in other words, once the amount of force applied in an iteration is lower than a scalar threshold, the calculation may stop. Taking note of the stress and the average force applied over multiple iterations, as illustrated in figure \ref{fig:eval_stressVeloOverTime}, it is clear that Link force converges to zero while the 1996 algorithm reaches and fluctuates around a constant. Because the $Samples$ set keeps changing, the system never reaches a state where the spring forces cancel each other out almost completely. This is also reflected in the animation, where every node keeps wiggling but the overall layout remains constant. It can also be seen that the stress of each layout converges to a minimal value as the average force converges to a constant, indicating that the best layout from each algorithm can be obtained once the system stabilises.

Since stress takes too long to calculate every iteration, the termination criterion is settled on the average force applied. This criterion is used for all 3 algorithms. The cut-off constant is then manually selected for each algorithm for every subset used. Link force's threshold is a value low enough that the stress has stabilised and no further visible changes occur, while 1996's is the lowest value that is reached most of the time.
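A sketch of this stopping rule (illustrative only; the actual averaging and threshold values used in the experiments were hand-picked per algorithm and subset):
\begin{verbatim}
// Stop once the average force applied in a tick (approximated here by the
// average velocity magnitude) drops below a hand-picked threshold.
function runUntilStable(simulation, nodes, threshold, maxIterations) {
  for (let i = 0; i < maxIterations; i++) {
    simulation.tick();
    let total = 0;
    for (const n of nodes) total += Math.hypot(n.vx, n.vy);
    if (total / nodes.length < threshold) return i + 1;  // iterations used
  }
  return maxIterations;
}
\end{verbatim}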
By selecting this termination condition, the goal of the last phase of the Hybrid Layout algorithm changes. Rather than performing the 1996 algorithm over the whole dataset to correct interpolation errors, the interpolation phase's role is to help the final phase reach stability quicker. Thus, the parameters of the interpolation phase cannot be evaluated on their own: taking more time to produce a better interpolation result may or may not affect the number of iterations in the final phase, creating the need to balance the time spent on interpolation against the time it saves.

\begin{figure}
\centering
@@ -260,11 +266,21 @@ By selecting this termination condition, the goal of the last phase of the Hybri
%============================

\subsection{Selecting Parameters}
\label{ssec:eval_selectParams}
Some of the algorithms have variables that are predefined constant numbers. Choosing the wrong values could lead an algorithm to produce bad results or to take unnecessarily long computation time and memory. To compare each algorithm fairly, an optimal set of parameters has to be chosen for each.
The 1996 algorithm has two adjustable parameters: $Neighbours_{size}$ and $Samples_{size}$.
According to previous evaluations\cite{LastYear}\cite{Algo2002}, a favourable layout could be achieved with values as low as $10$ for both variables. Preliminary testing seems to confirm these findings and this value is selected for the experiments. Link force, on the other hand, has no adjustable parameter whatsoever, so no attention is required.
\begin{figure}
\centering
\includegraphics[height=5cm]{graphs/hitrate_graph1.png}
\includegraphics[height=5cm]{graphs/hitrate_graph2.png}
\caption{Graphs showing the accuracy of pivot-based searching for $k = $ 1, 3, 6, and 10 pivots. The left box plot shows the percentage of queries for which pivot-based searching returns the same parent as brute-force searching, across 5 different runs (higher and more consistent is better). The right shows the high-dimensional distance ratio between the candidate parent chosen by pivot-based searching and the best parent (closer to 1 is better). For instance, if the best parent is 1 unit away from the querying node, a ratio of 1.3 means that the candidate parent is 1.3 units away.}
\label{fig:eval_pivotHits}
\end{figure}
Hybrid layout has multiple parameters during the interpolation phase. For the parent-finding stage, there is a choice of whether to use the brute-force or the pivot-based searching method; in the pivot-based case, the number of pivots ($k$) also has to be chosen. Experiments have been run to find the accuracy of pivot-based searching, starting from $1$ pivot, to determine reasonable numbers to use in subsequent experiments. However, as shown in figure \ref{fig:eval_interpVariations}, the randomly selected $S$ set (the $\sqrt{N}$ samples used in the first stage) can greatly affect the interpolation result, especially with smaller data sets containing many small clusters. Therefore, each test has to be run multiple times to generalise the result. As illustrated in figure \ref{fig:eval_pivotHits}, the more pivots used, the higher the accuracy and consistency, with diminishing returns observed at around 6 to 10 pivots depending on the number of data points. Hence, higher numbers of pivots are not considered for the experiments.
\begin{figure}
\centering
@@ -273,17 +289,7 @@ According to previous evaluations\cite{LastYear}\cite{Algo2002}, favorable layou
\label{fig:eval_interpVariations}
\end{figure}

Finally, the last step of interpolation is to refine the placement a constant number of times. Preliminary testing shows that while this step can clean up interpolation artifacts, as shown in figure \ref{fig:eval_refineCompare}, a desirable layout cannot be obtained no matter how many refinement steps are taken. Hence, the 1996 algorithm has to be run over the entire data set after the interpolation phase. For the rest of the experiments, only two values, 0 and 20, are arbitrarily selected, representing runs without and with interpolation-artifact cleaning respectively.
\begin{figure}
\centering
@@ -294,11 +300,69 @@ Finally, the last step of interpolation is to refine placement for a constant nu
%============================

\subsection{Performance metrics}
As discussed in section \ref{sec:bg_metrics}, there are three main criteria used to evaluate each algorithm: execution time, memory consumption, and the produced layout. While stress is a good metric for judging the quality of a layout, layouts with the same stress are not necessarily equally good for data exploration, so the look of the layout itself also has to be compared. Since both the 1996 and Hybrid algorithms aim to mimic Link force's result while cutting cost as much as possible, the layout of Link force will be used as the baseline for comparison: the closer the layout from the other algorithms is to the baseline, the better.
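For reference, the stress referred to here is assumed to be the usual normalised stress between the high-dimensional distances $d_{ij}$ and the layout distances $\lVert x_i - x_j \rVert$ (the exact normalisation used in the implementation may differ):
\[
  Stress = \frac{\sum_{i<j}\bigl(d_{ij} - \lVert x_i - x_j \rVert\bigr)^2}{\sum_{i<j} d_{ij}^2}
\]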
%============================

\section{Results}
\subsection{Memory usage}
\label{ssec:eval_ram}
Google Chrome comes with performance profiling tools that allow users to measure JavaScript heap usage. While it is straightforward to measure the usage of Link force, the 1996 algorithm causes problems with the garbage collector. Because the $Samples$ set and, to a certain degree, the $Neighbours$ set are reconstructed at every iteration, a lot of new memory is allocated and the old allocations are left unreachable, waiting to be reclaimed. As a result, the JS heap usage keeps increasing until the GC runs (figure \ref{fig:eval_neighbourRam}), even though the actual usage is theoretically constant across iterations. Although the GC is designed to be run automatically by the JavaScript engine, Google Chrome allows it to be invoked manually in the profiling tool. For this experiment, the GC is invoked manually and periodically during part of the run; the usage immediately after GC is recorded and used for comparison, and the peak reached before the GC is automatically invoked is also noted.
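Heap readings of this kind can also be taken programmatically with Chrome's non-standard \texttt{performance.memory} API (shown only as a sketch; the figures reported below were taken with the DevTools profiler):
\begin{verbatim}
// Chrome-only, non-standard API: read the current JS heap size in MiB.
function heapMiB() {
  return performance.memory ? performance.memory.usedJSHeapSize / 1048576 : NaN;
}
console.log("heap after GC:", heapMiB().toFixed(1), "MiB");
\end{verbatim}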
|
\begin{figure}
\centering
\includegraphics[height=3cm]{graphs/neighbourRam.png}
\caption{Fluctuating JavaScript heap usage due to the frequent memory allocations of the 1996 algorithm. GC is invoked manually every second in the first half of the trace and automatically in the latter half.}
\label{fig:eval_neighbourRam}
\end{figure}

The hybrid layout has multiple phases, each with a different theoretical memory complexity. With the final phase essentially being the 1996 algorithm, its memory requirement is expected to be equal or higher. As far as the interpolation phase is concerned, the bucket storage for pivot-based finding requires the most memory at $k(S_{Size})=O(\sqrt{N})$. Compared to the $N(Neighbours_{size}+Samples_{size}) = O(N)$ of the 1996 algorithm, the final stage should therefore define the overall memory requirement of the layout.
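As a rough illustration (assuming $k=10$ and $Neighbours_{size}=Samples_{size}=10$), at $N=100{,}000$ the pivot buckets hold about $10\sqrt{100{,}000} \approx 3{,}200$ entries, whereas the 1996 bookkeeping holds $100{,}000 \times (10+10) = 2{,}000{,}000$ entries.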
|
\begin{figure}
\centering
\includegraphics[height=6cm]{graphs/ram.png}
\caption{Comparison of the memory usage of each algorithm}
\label{fig:eval_ram}
\end{figure}

The comparison has been made between the 3 algorithms, with the hybrid layout running 10 pivots to represent the worst-case scenario for interpolation. Rendering is turned off to minimise the impact of DOM element manipulation\cite{LastYear}. Figure \ref{fig:eval_ram} verifies the hypothesis above. The modified Link force, which uses less memory than D3's implementation (section \ref{sec:imp_linkForce}), performs a lot worse than the 1996 algorithm, even with just automatic garbage collection. The difference in base memory usage between the 1996 algorithm and the final stage of the Hybrid layout is also within the margin of error, confirming that they both have the same memory requirement.
Due to JavaScript limitations, Link force crashes the browser tab at 50,000 data points before any spring force is calculated, failing the test entirely. In contrast, the 1996 algorithm (and the hybrid) can process as many as 470,000 data points. 600,000 points is also possible, but the 8GB of memory available on the evaluation computers was used up, causing thrashing and slowing down the entire machine. Interestingly, unlike Link force, the tab does not crash despite heavy paging, suggesting that the memory requirement is not the only limiting factor in play.

All in all, since a desirable result cannot be obtained from the Hybrid algorithm if the final stage is skipped, there is no benefit in terms of memory usage from using the Hybrid layout compared to the 1996 algorithm. Both have a much smaller memory footprint than Link force and can work on far more data points under the same hardware constraints.
%============================

\subsection{Different Parameters of the Hybrid Layout}
In section \ref{ssec:eval_termCriteria}, it was concluded that the value of each parameter cannot be evaluated on its own. Based on the findings discussed in section \ref{ssec:eval_selectParams}, 10 different combinations of interpolation parameters were chosen: brute force and 1, 3, 6, and 10 pivots, each with and without refinement at the end. Due to possible variations from the sample set $S$, each experiment is also performed 5 times.
\begin{figure}
\centering
\includegraphics[height=5cm]{graphs/hybridParams_2ndTime_10k.png}
\includegraphics[height=5cm]{graphs/hybridParams_stress_10k.png}
\includegraphics[height=5cm]{graphs/hybridParams_totalTime_10k.png}
\caption{Comparison of different interpolation parameters of the hybrid layout at 10,000 data points.}
\label{fig:eval_hybridParams10k}
\end{figure}

\begin{figure}
\centering
\includegraphics[height=5cm]{graphs/hybridParams_2ndTime_100k.png}
\includegraphics[height=5cm]{graphs/hybridParams_totalTime_100k.png}
\caption{Comparison of different interpolation parameters of the hybrid layout at 100,000 data points.}
\label{fig:eval_hybridParams100k}
\end{figure}
Time of Phase 2: too many pivots may be slower than brute force. Refinement does not take much longer. Overall time depends only on the number of iterations in the last phase.

Different arguments: look at the effect on the number of iterations in the last phase.

Looks: while 1 pivot is fast, the layout is bad.

Conclusion
%============================
\subsection{Comparison between the 3 algorithms with 10,000 data points}
%============================

%==============================================================================
%%%%%%%%%%%%%%%%
@@ -320,7 +384,7 @@ Finally, the last step of interpolation is to refine placement for a constant nu
\item \textbf{Data Sets} The evaluation focuses on only 1 data set. It is possible that the algorithms could behave differently on a different dataset with different dimensionality, data types, and distance functions. Hence, the findings in chapter \ref{ch:eval} may not apply to all.
\item \textbf{Optimal parameters generalisation}
\item \textbf{GPU rendering}
\item \textbf{asm.js and wasm} Most implementations of JavaScript are relatively slow. asm.js gains extra performance by using only a restricted subset of JavaScript and is intended to be a compilation target for other languages such as C/C++ rather than a language to code in. Existing JavaScript engines can gain performance from asm.js' restrictions, such as the preallocated heap, which reduces the load on the garbage collector, while engines that recognise asm.js can also compile it to assembly ahead-of-time (AOT), eliminating the need to run the code through an interpreter entirely. At the moment, the D3-force library still uses standard JavaScript, so a significant chunk of the library would have to be ported in order to compare the different algorithms fairly.
WebAssembly (wasm), on the other hand, is a binary format designed to run with JavaScript in the same sandbox and is even faster than JavaScript. Many major web browsers such as Firefox, Chromium, Safari, and Edge support WebAssembly. Having only been released in March 2017, support was not widespread and learning resources were hard to find, so WebAssembly was not considered at the start of this project. %REF ME DADDY
\item \textbf{Locality-Sensitive Hashing}
\item \textbf{Multi-threading with HTML5 Web Workers} By nature, JavaScript is designed to be single-threaded. HTML5 allows new worker processes to be created and run concurrently. These workers have isolated memory spaces and are not attached to the HTML document; the only way to communicate with each other is message passing. JSON objects passed are serialised by the sender and de-serialised on the other end, creating even more overhead. Due to the size of the objects the program has to work with, it is estimated that the overhead would outweigh the benefit, so support was not implemented. %REF ME DADDY