> Hmm, I saw the comment in ClusteringDocumentList.java of Carrot2:
>
> /*
>  * If you know what query generated the documents you're about to cluster, pass
>  * the query to the algorithm, which will usually increase clustering quality.
>  */
> attributes.put(AttributeNames.QUERY, "data mining");
>
> So I'm worried about clustering quality when Carrot2 got string
> "MatchAllDocsQuery".

The query is just a hint. Even without it, you should still be able to get decent clusters (at least for English; we haven't tested Carrot2 much with Japanese).

Cheers,
Staszek