HAC

Hierarchical Agglomerative Clustering (HAC) is a clustering algorithm. At the start, every document is its own cluster. At each iteration, the two closest clusters are merged into one. This repeats until only one cluster remains, containing all documents.
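
The merge loop described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual code: the names `hac`, `single_link` and `iterations` are hypothetical, and the sketch assumes Euclidean distance with the single-link rule (closest pair of points) as the cluster distance.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_link(a, b):
    """Cluster distance = distance between the closest pair of points."""
    return min(dist(x, y) for x in a for y in b)


def hac(vectors, iterations=None, cluster_distance=single_link):
    """Repeatedly merge the two closest clusters.

    Returns the list of clusters after each merge. With iterations=None
    the loop runs until a single cluster remains.
    """
    clusters = [[tuple(v)] for v in vectors]  # every vector starts alone
    history = []
    max_merges = len(clusters) - 1
    steps = max_merges if iterations is None else min(iterations, max_merges)
    for _ in range(steps):
        # Pick the pair of cluster indices with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append([list(c) for c in clusters])
    return history


steps = hac([(1, 1), (1, 2), (5, 5), (6, 5)])
for number, state in enumerate(steps, start=1):
    print(number, state)
```

Four vectors need three merges to end up in one cluster; passing a smaller `iterations` value stops the loop early and leaves several clusters.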



How does this program work?

1. Insert vectors

  • Manually
  • Fill in the fields with vectors. At least three vectors are required.

  • From file
  • Only txt and xml files are accepted. For convenience, you can download an example file to use as a template.

    • txt
    • The file must contain all the vectors, one per line.

    • xml
    • The file must contain not only all the vectors but also the number of execution iterations, the distance method (Euclidean or Manhattan), and the algorithm variant (Single link, Complete link, Centroid, or Average). If the variant is Centroid, the calculation mode must also be given (with or without calculation of centroids).

Each vector corresponds to a unique page. For example, you can enter '1, 4, 5, 7'. Duplicates are removed!
After you have filled in everything, click on the tab '2. Insert parameters'.
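
As a rough illustration of the txt format, the sketch below reads one vector per line and drops duplicates. It rests on several assumptions that the program may not share: components are separated by commas (as in '1, 4, 5, 7'), "duplicates" means repeated vectors, and the minimum of three vectors is checked after duplicates are removed. The function name `parse_vectors` is hypothetical.

```python
def parse_vectors(text):
    """Parse one comma-separated vector per line, skipping repeated vectors."""
    vectors = []
    seen = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        vector = tuple(float(part) for part in line.split(","))
        if vector not in seen:  # assumption: duplicates are whole vectors
            seen.add(vector)
            vectors.append(vector)
    if len(vectors) < 3:
        # assumption: the three-vector minimum also applies to files
        raise ValueError("at least three distinct vectors are required")
    return vectors


sample = "1, 4, 5, 7\n2, 3, 9, 1\n1, 4, 5, 7\n0, 0, 1, 2\n"
vectors = parse_vectors(sample)
print(vectors)
```

The third line of `sample` repeats the first, so only three vectors remain.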



2. Insert parameters

This step is skipped if you uploaded an xml file!

At this point, you must provide all the parameters of the algorithm. The documents themselves were already given in the previous step.

The 'Execution iterations' field sets how many iterations the algorithm executes. It ranges from 1 to 10.

The 'Algorithm variant' field selects which definition of "closest clusters" is used.

  • Single link: maximum similarity
  • sim(ci, cj) = max { sim(x, y) : x ∈ ci, y ∈ cj }

  • Complete link: minimum similarity
  • sim(ci, cj) = min { sim(x, y) : x ∈ ci, y ∈ cj }

  • Centroid: average inter-similarity

  • Average: average of all similarities
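
The single-link and complete-link rules can be sketched as follows. The list above is phrased in terms of similarity, while this sketch works with distances, so the roles of max and min swap: maximum similarity corresponds to minimum distance. The function names are hypothetical and Euclidean distance is assumed; Centroid and Average are left out because their exact definitions in the program are not given here.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_link_distance(ci, cj):
    """Single link: the most similar pair, i.e. the smallest cross-cluster distance."""
    return min(dist(x, y) for x in ci for y in cj)


def complete_link_distance(ci, cj):
    """Complete link: the least similar pair, i.e. the largest cross-cluster distance."""
    return max(dist(x, y) for x in ci for y in cj)


ci = [(0, 0), (0, 1)]
cj = [(3, 0), (4, 1)]
single = single_link_distance(ci, cj)      # closest pair: (0, 0) and (3, 0)
complete = complete_link_distance(ci, cj)  # farthest pair: (0, 0) and (4, 1)
print(single, complete)
```

For the same two clusters, the complete-link distance is never smaller than the single-link distance, which is why the two variants can choose different pairs to merge.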

The 'Distance method' field selects the method used to compute the distance between vectors.

  • Euclidean distance
  • If A(x1,y1,z1) and B(x2,y2,z2), then the Euclidean distance is:
    Dist(A, B) = √((x2-x1)² + (y2-y1)² + (z2-z1)²)

  • Manhattan distance
  • If A(x1,y1,z1) and B(x2,y2,z2), then the Manhattan distance is:
    Dist(A, B) = |x2-x1| + |y2-y1| + |z2-z1|
    For example, for A(1,2,3) and B(5,6,7): Dist(A, B) = |5-1| + |6-2| + |7-3| = 12
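
Both distance methods fit in a couple of lines of Python. The sketch below is illustrative only; the function names `euclidean` and `manhattan` are hypothetical, and the sample points A(1,2,3) and B(5,6,7) are chosen to match the numbers |5-1|, |6-2| and |7-3|.

```python
from math import sqrt


def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")


def euclidean(a, b):
    """Square root of the sum of squared component differences."""
    _check_same_length(a, b)
    return sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute component differences."""
    _check_same_length(a, b)
    return sum(abs(q - p) for p, q in zip(a, b))


A = (1, 2, 3)
B = (5, 6, 7)
print(manhattan(A, B))  # |5-1| + |6-2| + |7-3| = 12
print(euclidean(A, B))  # sqrt(16 + 16 + 16) = sqrt(48)
```

The functions work for vectors of any length, not only three components.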

After you have filled in everything, click on the tab '3. Results'.



3. Results

This tab shows the results of the clustering.