Hierarchical Agglomerative Clustering (HAC) is a clustering algorithm. At the start, every document is its own cluster. At each iteration, the two closest clusters are merged into one. This repeats until only one cluster, containing all the documents, remains.
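The merge loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not this program's code. It assumes Euclidean distance and the single-link variant, and the function names are made up for the example. Because it compares every pair of clusters at each step, it is only practical for small inputs.

```python
import math
from itertools import combinations


def euclidean(a, b):
    # Straight-line distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_link(c1, c2, vectors):
    # Assumed variant: distance of the closest pair of members (single link)
    return min(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)


def hac(vectors):
    # Hypothetical sketch: every document (vector index) starts as its own cluster
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest distance
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: single_link(clusters[p[0]], clusters[p[1]], vectors),
        )
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        # combinations yields a < b, so delete b first to keep index a valid
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges
```

Each entry of the returned list records which two clusters were joined, so n documents always produce n - 1 merges.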
How does this program work?
1. Insert vectors
- Manually
- From file
  - txt
  - xml

Algorithm variants:
- Single link: maximum similarity
- Complete link: minimum similarity
- Centroid: average inter-similarity
- Average: average of all similarities

Distance methods:
- Euclidean distance
- Manhattan distance
Manually: fill in the fields with vectors. At least three vectors are required.
From file: only txt and xml files are accepted. For convenience, you can download an example file in either format to see its template.
A txt file must contain all the vectors, one vector per line.
An xml file must contain not only all the vectors but also the number of execution iterations, the distance method (Euclidean or Manhattan) and the algorithm variant (Single link, Complete link, Centroid or Average). If the variant is Centroid, the calculation mode must also be given (with or without calculation of centroids).
Each vector corresponds to a unique page; for example, you can enter '1, 4, 5, 7'. Duplicates are removed!
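A minimal sketch of reading vectors in the txt format described above. It assumes, beyond what this guide states, that each line holds comma-separated numbers such as '1, 4, 5, 7', and that "duplicates" means repeated vectors, which are dropped while the first occurrence is kept. `parse_vectors` is a hypothetical name, not part of the program.

```python
def parse_vectors(text):
    """Parse one comma-separated vector per line (assumed txt layout)."""
    vectors = []
    seen = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # float() tolerates the spaces around each comma-separated value
        vector = tuple(float(v) for v in line.split(","))
        if vector in seen:
            continue  # assumption: drop repeated vectors, keep the first one
        seen.add(vector)
        vectors.append(vector)
    # The guide requires at least three vectors
    if len(vectors) < 3:
        raise ValueError("at least three vectors are required")
    return vectors
```

The minimum of three is checked after duplicates are removed, since only distinct vectors end up being clustered.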
When you have filled in everything, click the '2. Insert parameters' tab.
2. Insert parameters
This step is skipped if you upload an xml file, because the file already contains the parameters!
At this point, give all the algorithm parameters except the documents, which were entered in the previous step.
Field 'Execution iterations' sets how many iterations the algorithm executes; it ranges from 1 to 10.
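Assuming that one execution iteration corresponds to one merge of the two closest clusters, as in the description at the top of this guide, a limited number of iterations leaves a partial clustering. The hypothetical `run_iterations` below sketches that behaviour. Single-link Euclidean distance is assumed, and the loop stops early once only one cluster is left; the program itself may handle these cases differently.

```python
import math
from itertools import combinations


def run_iterations(vectors, iterations):
    # Range stated for the 'Execution iterations' field
    if not 1 <= iterations <= 10:
        raise ValueError("iterations must be between 1 and 10")
    clusters = [[i] for i in range(len(vectors))]

    def dist(c1, c2):
        # Assumed: single-link distance using Euclidean math.dist (Python 3.8+)
        return min(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)

    # Assumed: one iteration = one merge; never merge past a single cluster
    for _ in range(min(iterations, len(vectors) - 1)):
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so removing b first keeps index a valid
        del clusters[a]
        clusters.append(merged)
    return clusters
```

For example, with four vectors, one iteration leaves three clusters and any value from 3 to 10 leaves a single cluster.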
Field 'Algorithm variant' selects how the "closest clusters" are defined:
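Single link: $\mathrm{sim}(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} \mathrm{sim}(x, y)$
Complete link: $\mathrm{sim}(c_i, c_j) = \min_{x \in c_i,\, y \in c_j} \mathrm{sim}(x, y)$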
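The four variants can be written as distance functions between two clusters, each given as a list of vectors. This sketch assumes the program works with distances, so "maximum similarity" becomes the smallest distance and "minimum similarity" the largest. For Centroid it assumes the distance between the clusters' mean vectors. For Average it assumes the mean distance over all pairs in the merged cluster. The program may define these two differently, and the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def single_link(A, B):
    # Maximum similarity -> smallest distance between any cross-cluster pair
    return min(math.dist(a, b) for a in A for b in B)


def complete_link(A, B):
    # Minimum similarity -> largest distance between any cross-cluster pair
    return max(math.dist(a, b) for a in A for b in B)


def centroid_link(A, B):
    # Assumed: distance between the mean vectors (centroids) of the clusters
    ca = [mean(col) for col in zip(*A)]
    cb = [mean(col) for col in zip(*B)]
    return math.dist(ca, cb)


def average_link(A, B):
    # Assumed: mean distance over all pairs of distinct members of A and B combined
    merged = list(A) + list(B)
    return mean(math.dist(a, b) for a, b in combinations(merged, 2))
```

Whichever function is chosen, the pair of clusters with the smallest value is merged at each iteration.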
Field 'Distance method' selects how the distance between two vectors is computed.
For A(x1, y1, z1) and B(x2, y2, z2), the Euclidean distance is:
$\mathrm{Dist}(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$
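The same formula extends to vectors with any number of components, such as the four-component '1, 4, 5, 7' example from step 1. A minimal Python sketch (the name `euclidean` is hypothetical):

```python
import math


def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))
```

For A(1, 2, 3) and B(5, 6, 7), `euclidean` returns the square root of 48, about 6.93.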
For A(x1, y1, z1) and B(x2, y2, z2), the Manhattan distance is:
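$\mathrm{Dist}(A, B) = |x_2 - x_1| + |y_2 - y_1| + |z_2 - z_1|$
For example, for A(1, 2, 3) and B(5, 6, 7): Dist(A, B) = |5 - 1| + |6 - 2| + |7 - 3| = 12.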
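A matching sketch for the Manhattan distance, again for any number of components. The arithmetic |5-1| + |6-2| + |7-3| corresponds to A(1, 2, 3) and B(5, 6, 7), which the call below reproduces. The name `manhattan` is hypothetical.

```python
def manhattan(a, b):
    # Sum of absolute coordinate differences
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(q - p) for p, q in zip(a, b))


print(manhattan((1, 2, 3), (5, 6, 7)))  # 12
```

Unlike the Euclidean distance, no square root is involved, so integer inputs give an integer result.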
When you have filled in everything, click the '3. Results' tab.
3. Results
This tab shows the results of the clustering.