
Measures of Clustering Quality: A Working Set of Axioms for Clustering
Ackerman M., Ben-David S.

Introduction
This is an interesting paper that offers a counterpoint to Kleinberg's impossibility theorem, which I discussed in my previous blog. The authors argue that the impossibility result is not an inherent feature of clustering. Instead, it is to a large extent an artifact of the formalism Kleinberg used. Rather than placing axioms on clustering functions, they axiomatize clustering quality measures: functions that score how good a given clustering of a dataset is. After analyzing such measures, they introduce a set of axioms for them that retains the principles expressed by Kleinberg's axioms while remaining consistent, meaning that some quality measures satisfy all of the axioms at once. In this way, quality measures can be used to assess the output of different clustering techniques without running into the trade-offs that Kleinberg's framework forces.

Analysis and Inferences
Clustering quality measures are functions used to assess the quality of the clusterings produced by different clustering techniques. A measure takes a dataset (with its distance function) and a clustering of that dataset as arguments, and returns a number that represents how good the clustering is. Such a function is needed because there are many clustering techniques, and their outputs can differ widely in relevance and accuracy. A user's main objective in clustering data is to obtain meaningful groups, yet from the raw outputs of different techniques it can be hard to judge which result is better. A clustering quality measure fills this gap: given the data and a clustering, it reports the quality of that clustering. Model selection can then be based on the value of the measure, which makes it easier to choose among the available clustering methods. The applied statistics literature has discussed this under the name of cluster validity, where quality measures are used to compare the results of many clustering techniques. This paper proposes a theoretical basis for evaluating clustering quality by introducing a set of axioms that quality measures should satisfy. The consistency of the axioms is established by showing that concrete quality measures satisfy all of them. Before turning to these measures, the authors develop the framework in the following steps.
• Kleinberg's axioms (scale invariance, richness and consistency) were taken as the starting point and reformulated as properties of clustering quality measures rather than of clustering functions. The first step is translating scale invariance: multiplying all distances by a positive constant should leave the quality score unchanged.
• Next, a measure called the relative point margin is introduced. It is computed from distances to cluster centers: it is the ratio of the distance between a point and its closest center to the distance between the same point and its second-closest center.
• The first three axioms (richness, consistency and scale invariance) were then considered, and the clustering quality measure was shown to satisfy them. To raise the bar further, the authors also discuss the soundness and completeness of the set of axioms.
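• As a final step, the authors add isomorphism invariance: the quality of a clustering should depend only on the distance structure, so relabeling the points in a distance-preserving way must not change the score. With this axiom included, they show that all the axioms can be satisfied together, thereby countering Kleinberg's impossibility claim.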
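The relative point margin from the steps above is easy to compute directly. Below is a minimal Python sketch, assuming Euclidean points and a precomputed distance matrix. The point margin follows the definition given in the bullets. How it is turned into a score for a whole clustering (average over the non-center points, then minimize over every choice of one representative point per cluster) is my reading of the paper's "relative margin" and should be checked against the original. The function names `pairwise_distances`, `relative_point_margin` and `relative_margin` are hypothetical, not taken from the paper. The last lines check the scale-invariance axiom numerically.

```python
import itertools
import math


def pairwise_distances(points):
    """Euclidean distance matrix; any symmetric distance matrix would also work."""
    return [[math.dist(p, q) for q in points] for p in points]


def relative_point_margin(i, centers, D):
    """d(point i, closest center) / d(point i, second-closest center).

    `centers` are point indices; at least two are required, and points are
    assumed distinct so the denominator is never zero.
    """
    ds = sorted(D[i][c] for c in centers)
    return ds[0] / ds[1]


def relative_margin(labels, D):
    """Brute-force relative margin of a clustering (lower = better separated).

    ASSUMPTION: my reading of the paper's aggregation. Try every representative
    set K (exactly one point per cluster), average the relative point margin
    over the points outside K, and keep the smallest average.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2 or len(labels) <= len(clusters):
        raise ValueError("need >= 2 clusters and at least one non-center point")
    best = math.inf
    # Exponential in the number of clusters: fine only for tiny examples.
    for reps in itertools.product(*clusters.values()):
        rest = [i for i in range(len(labels)) if i not in reps]
        avg = sum(relative_point_margin(i, reps, D) for i in rest) / len(rest)
        best = min(best, avg)
    return best


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
D = pairwise_distances(points)
rm = relative_margin(labels, D)

# Scale invariance: multiplying every distance by 3 leaves each ratio, and
# therefore the score, unchanged (up to floating-point rounding).
D_scaled = [[3.0 * v for v in row] for row in D]
print(rm, math.isclose(rm, relative_margin(labels, D_scaled)))
```

Because every term is a ratio of two distances, a common scaling factor cancels out, which is why a ratio-based measure of this kind is scale invariant. The search over representative sets grows with the product of the cluster sizes, so this brute-force version is only meant for tiny illustrative inputs.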

Examples of clustering quality measures
The authors then present further quality measures, each suited to a different family of clustering techniques.
• Weakest link – a quality measure for linkage-based clustering, which compares how weakly points inside a cluster are connected with how close points in different clusters are.
• Additive margin – another quality measure for center-based clustering. Unlike the relative margin, it evaluates differences of distances instead of ratios.
• Computational complexity – to be useful in practice, a clustering quality measure must be quick to compute. The authors show that the measures above can be computed efficiently, in polynomial time.
• Variants of quality measures – the authors also consider how to construct variants of these quality measures so that different kinds of clusterings can be evaluated properly.
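To make the two extra measures concrete, here is a hedged Python sketch. The weakest-link formula used below (the worst within-cluster bottleneck path distance divided by the smallest between-cluster distance) is my assumption about the paper's definition, chosen because it captures the linkage-based idea that a cluster should be internally well connected. The additive point margin simply replaces the ratio of the relative point margin with a difference. All function names are hypothetical. The weakest link is computed with a minimax variant of Floyd-Warshall, which runs in cubic time and so fits the polynomial-time computability mentioned above.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def weakest_link(labels, D):
    """Weakest-link score of a clustering (lower = better).

    ASSUMED definition: the largest bottleneck (minimax) path distance between
    two points of the same cluster, using only paths inside that cluster,
    divided by the smallest distance between points of different clusters.
    Requires at least two clusters.
    """
    n = len(labels)
    within = 0.0
    for lab in set(labels):
        idx = [i for i in range(n) if labels[i] == lab]
        # Minimax variant of Floyd-Warshall on this cluster only: O(|cluster|^3).
        W = {(i, j): D[i][j] for i in idx for j in idx}
        for k in idx:
            for i in idx:
                for j in idx:
                    W[i, j] = min(W[i, j], max(W[i, k], W[k, j]))
        within = max([within] + list(W.values()))
    between = min(D[i][j] for i in range(n) for j in range(n)
                  if labels[i] != labels[j])
    return within / between


def additive_point_margin(i, centers, D):
    """d(point i, second-closest center) - d(point i, closest center)."""
    ds = sorted(D[i][c] for c in centers)
    return ds[1] - ds[0]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
D = pairwise_distances(points)
good = [0, 0, 0, 1, 1, 1]
bad = [0, 0, 0, 0, 1, 1]  # (10, 10) wrongly grouped with the first blob
print(weakest_link(good, D), weakest_link(bad, D))

# Scaling every distance by 3 leaves the ratio-based weakest link unchanged,
# but triples the raw additive margin of point 1 w.r.t. centers 0 and 3.
D3 = [[3.0 * v for v in row] for row in D]
print(weakest_link(good, D3), additive_point_margin(1, [0, 3], D3))
```

The natural two-group clustering scores far lower (better) than the one that moves (10, 10) into the first group. The final check illustrates why a difference-based margin needs some normalization before it can satisfy scale invariance: scaling all distances by 3 triples the raw difference, whereas a ratio stays the same. How the paper normalizes the additive margin is not reproduced here.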

Strengths
• The counter to the impossibility theorem was effectively backed up with exact proofs and evaluations.
• The axioms considered here were well chosen, and each had a definite purpose.
• The inclusion of additional properties further strengthened the paper's claim and its challenge to Kleinberg's theorem.
• The statements put forward had simple and effective proofs.

Weaknesses
• Real-world complexities of clustering techniques were not taken into account; the paper was framed entirely around Kleinberg's claims.
• Two models that produce different clusters cannot be compared with each other or evaluated for whether they satisfy the axioms.
• Only a few clustering techniques were taken into account.
• The paper's only purpose appeared to be contradicting Kleinberg's model.

Critique: Analysis and Quality
The paper had a clear and concise thought process, and the ideas it expressed were simple and soundly reasoned. The results could have been strengthened by considering more clustering techniques, and visual illustrations of the inferences would have improved the paper. The scope for future research is limited, as the paper mainly sets out to contradict Kleinberg's findings.

References
[1] Arrow, K. J., "A Difficulty in the Concept of Social Welfare", Journal of Political Economy, 1950.
[2] Kleinberg, J., "An Impossibility Theorem for Clustering", Advances in Neural Information Processing Systems (NIPS), 2002.
[3] Ackerman, M., Ben-David, S., "Measures of Clustering Quality: A Working Set of Axioms for Clustering", Advances in Neural Information Processing Systems (NIPS), 2008.

For comments, leave a message below and I'll get back to you.

Thanks for visiting my blog.