Hi Stanislaw, I see. Thank you for the reference.
Kind regards,
Hanjoyo

On Tue, Dec 4, 2012 at 12:37 AM, Stanislaw Osinski <stanis...@osinski.name> wrote:

> > I mean measuring the similarity between the documents in each cluster.
> > Also, the difference between documents in one cluster and another cluster.
> >
> > I saw the sample code ClusteringQualityBencmark.java
> > However, I do not know how to make use of it for assessing my Solr
> > Clustering performance.
>
> You'd need to write your own code for this. Here are the most common
> clustering quality measures you mentioned:
>
> http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
>
> These are meant for the general case (numeric attributes). To apply them to
> texts, you'd need to use the vector representation of the documents.
>
> On a more general note, synthetic measures test only the document-cluster
> assignments, but none take the quality of labels into account (this is
> really hard to measure objectively).
>
> Staszek
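
For reference, here is a rough sketch of the vector-representation idea from the quoted reply: each document becomes a bag-of-words term-frequency vector, and cosine similarity is averaged within a cluster and between two clusters. It does not use the Solr or Carrot2 APIs. The function names, the whitespace tokenizer, and the sample clusters are all made up for illustration, and a real setup would probably use TF-IDF weights and the clusters returned by Solr.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (naive whitespace tokenizer, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def intra_cluster_similarity(vecs):
    """Mean pairwise cosine similarity of the documents inside one cluster."""
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 1.0  # a single-document cluster is trivially cohesive
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def inter_cluster_similarity(vecs_a, vecs_b):
    """Mean cosine similarity between documents of two different clusters."""
    pairs = [(a, b) for a in vecs_a for b in vecs_b]
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical clusters; in practice these would come from the clustering output.
clusters = {
    "search": ["solr search index", "solr index query"],
    "cooking": ["pasta tomato sauce", "tomato sauce recipe"],
}
vectors = {label: [tf_vector(d) for d in docs] for label, docs in clusters.items()}

for label, vecs in vectors.items():
    print(label, "intra-cluster similarity:", round(intra_cluster_similarity(vecs), 3))
print("inter-cluster similarity:",
      round(inter_cluster_similarity(vectors["search"], vectors["cooking"]), 3))
```

High intra-cluster and low inter-cluster similarity suggest well-separated clusters. As noted in the quoted reply, this says nothing about the quality of the cluster labels.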