Re: News clustering

Iwan Hanjoyo Mon, 03 Dec 2012 17:18:39 -0800

Hi Stanislaw,

I see. Thank you for the reference.


Kind regards,

Hanjoyo

On Tue, Dec 4, 2012 at 12:37 AM, Stanislaw Osinski
<stanis...@osinski.name> wrote:

> > I mean measuring the similarity between the documents within each cluster,
> > and also the difference between the documents in one cluster and those in another.
> >
> > I saw the sample code ClusteringQualityBenchmark.java.
> > However, I do not know how to use it to assess my Solr
> > clustering performance.
> >
>
> You'd need to write your own code for this. Here are the most common
> clustering quality measures of the kind you mentioned:
>
> http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
>
> These are meant for the general case (numeric attributes). To apply them to
> texts, you'd need to use the vector representation of the documents.
>
> On a more general note, synthetic measures test only the document-cluster
> assignments; none take the quality of the labels into account (this is
> really hard to measure objectively).
>
> Staszek
>
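
For readers following the thread, here is a minimal sketch of the two measures discussed above: average similarity between documents inside a cluster, and average similarity between documents of two different clusters. It is not part of Carrot2 or Solr. It assumes the cluster assignments have already been exported as plain lists of document texts, it uses raw term-frequency vectors with cosine similarity (TF-IDF weighting would be a common refinement), and every function name and sample document below is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document (assumed whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    """Arithmetic mean, 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def intra_cluster_similarity(cluster):
    """Average pairwise similarity of the documents inside one cluster (higher is better)."""
    return mean(cosine(a, b) for a, b in combinations(cluster, 2))


def inter_cluster_similarity(first, second):
    """Average similarity between documents of two different clusters (lower is better)."""
    return mean(cosine(a, b) for a, b in product(first, second))


# Hypothetical cluster assignments; in practice these would come from the clustering output.
clusters = {
    "sports": ["team wins football match", "football team loses match"],
    "finance": ["stock market falls sharply", "stock market rises"],
}
vectors = {label: [tf_vector(doc) for doc in docs] for label, docs in clusters.items()}

for label, docs in vectors.items():
    print(label, "intra:", round(intra_cluster_similarity(docs), 3))
print("sports vs finance inter:",
      round(inter_cluster_similarity(vectors["sports"], vectors["finance"]), 3))
```

A clustering looks reasonable under these measures when the intra-cluster values are clearly higher than the inter-cluster ones. As noted in the reply above, this says nothing about the quality of the cluster labels.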
