Re: questions about Clustering

Koji Sekiguchi Sat, 23 May 2009 06:23:22 -0700

Grant Ingersoll wrote:


On May 22, 2009, at 11:41 PM, Koji Sekiguchi wrote:

I'm thinking of using the clustering function (SOLR-769) for my project.

I have a couple of questions:

1. If q=*:* is requested, Carrot2 will receive "MatchAllDocsQuery"
via attributes. Is that OK?

Yes, it only clusters on the Doc List, not the Doc Set (in other words, it's your rows that matter).

Hmm, I saw the comment in ClusteringDocumentList.java of Carrot2:

/*
 * If you know what query generated the documents you're about to cluster,
 * pass the query to the algorithm, which will usually increase clustering
 * quality.
 */
attributes.put(AttributeNames.QUERY, "data mining");

So I'm worried about clustering quality when Carrot2 gets the string "MatchAllDocsQuery".
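One possible way to address this worry is to skip the query hint whenever the request is a match-all query. The sketch below uses only the JDK. The class name ClusteringQueryHint and its method are hypothetical and are not part of Solr or Carrot2, and the two match-all spellings it checks for are assumptions based on the strings mentioned in this thread.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Solr or Carrot2.
final class ClusteringQueryHint {
    private ClusteringQueryHint() {}

    /**
     * Returns the query text worth passing to the clustering algorithm as a
     * hint, or an empty Optional when the text carries no topical information.
     */
    static Optional<String> of(String queryText) {
        if (queryText == null) {
            return Optional.empty();
        }
        String trimmed = queryText.trim();
        // Assumption: a match-all request shows up either as the raw "*:*"
        // syntax or as a string beginning with "MatchAllDocsQuery".
        if (trimmed.isEmpty()
                || trimmed.equals("*:*")
                || trimmed.startsWith("MatchAllDocsQuery")) {
            return Optional.empty();
        }
        return Optional.of(trimmed);
    }
}
```

A caller could then put AttributeNames.QUERY into the attributes map only when the returned Optional is present. Whether Carrot2 behaves well without a query attribute is not confirmed in this thread, so that would need to be checked.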

2. I'd like to use it in an environment other than English, e.g. Japanese.

I've implemented Carrot2JapaneseAnalyzer (w/ Payload/ITokenType)
for this purpose.
It worked well with the ClusteringDocumentList example, but didn't
work with CarrotClusteringEngine.

What I did was insert the following lines (marked +) into
CarrotClusteringEngine:

attributes.put(AttributeNames.QUERY, query.toString());
+ attributes.put(AttributeUtils.getKey(Tokenizer.class, "analyzer"),
+ Carrot2JapaneseAnalyzer.class);

There are no runtime errors, but Carrot2 didn't use my analyzer;
it just ignored it and used ExtendedWhitespaceAnalyzer (confirmed via
debugger).
Is it a classloader problem? I placed my jar in ${solr.solr.home}/lib .
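A rough way to test the classloader theory is to check which loader defined a class and whether another loader can see it. The sketch below uses only standard JDK calls. ClassVisibility and its method names are hypothetical, and it does not reproduce how Solr or Carrot2 actually resolve the analyzer class.

```java
// Hypothetical diagnostic helper for illustration only.
final class ClassVisibility {
    private ClassVisibility() {}

    /** Returns true if the given loader can resolve the named class. */
    static boolean isVisibleFrom(ClassLoader loader, String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Describes which loader defined the class (null means the bootstrap loader). */
    static String describeLoader(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return type.getName() + " loaded by "
                + (loader == null ? "bootstrap" : loader.toString());
    }
}
```

For example, calling isVisibleFrom with the loader of a Carrot2 class and the fully qualified name of Carrot2JapaneseAnalyzer would show whether Carrot2's own loader can see a jar placed in ${solr.solr.home}/lib.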

Hmmm, I'm not sure if the Carrot guys are on this list (they are on dev). Can you share a simple example on the JIRA issue and we can discuss there?

Thank you for your advice. I'll post this part on SOLR-769.

Koji
