Grant Ingersoll wrote:

On May 22, 2009, at 11:41 PM, Koji Sekiguchi wrote:

I'm thinking of using the clustering function (SOLR-769) for my project.

I have a couple of questions:

1. If q=*:* is requested, Carrot2 will receive "MatchAllDocsQuery"
via attributes. Is that OK?

Yes, it only clusters on the Doc List, not the Doc Set (in other words, it's the rows you request that matter).

Hmm, I saw this comment in Carrot2's ClusteringDocumentList.java:

/*
 * If you know what query generated the documents you're about to cluster, pass
 * the query to the algorithm, which will usually increase clustering quality.
 */
attributes.put(AttributeNames.QUERY, "data mining");

So I'm worried about clustering quality when Carrot2 receives the string "MatchAllDocsQuery".
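One possible workaround is to pass the query hint only when it carries real terms and to leave it unset for a match-all request. The sketch below is hypothetical and uses only the JDK: `QUERY_KEY = "query"` is an assumed stand-in for Carrot2's `AttributeNames.QUERY`, and it works on the raw `q` parameter string rather than Lucene's `Query.toString()`, which is an assumption about where the engine could get the text.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryHint {
    // Hypothetical stand-in for Carrot2's AttributeNames.QUERY key;
    // the real constant's value is an assumption here.
    static final String QUERY_KEY = "query";

    /**
     * Puts the user's query into the attribute map only when it carries
     * meaningful terms. A match-all request (q=*:*) has no terms that could
     * help label clusters, so the hint is omitted instead of passing the
     * string "MatchAllDocsQuery".
     */
    static void putQueryHint(Map<String, Object> attributes, String userQuery) {
        if (userQuery == null) {
            return;
        }
        String q = userQuery.trim();
        if (q.isEmpty() || q.equals("*:*")) {
            return; // nothing useful to pass to the clustering algorithm
        }
        attributes.put(QUERY_KEY, q);
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new HashMap<>();
        putQueryHint(attrs, "*:*");
        System.out.println(attrs.containsKey(QUERY_KEY)); // false
        putQueryHint(attrs, "data mining");
        System.out.println(attrs.get(QUERY_KEY)); // data mining
    }
}
```

With this approach a q=*:* request would presumably just cluster without a query hint, rather than with the literal string "MatchAllDocsQuery" as if it were user input.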


2. I'd like to use it with a language other than English, e.g. Japanese.
I've implemented Carrot2JapaneseAnalyzer (w/ Payload/ITokenType)
for this purpose.
It worked well with the ClusteringDocumentList example, but didn't
work with CarrotClusteringEngine.

What I did was insert the following lines (marked +) into
CarrotClusteringEngine:

attributes.put(AttributeNames.QUERY, query.toString());
+ attributes.put(AttributeUtils.getKey(Tokenizer.class, "analyzer"),
+ Carrot2JapaneseAnalyzer.class);

There were no runtime errors, but Carrot2 didn't use my analyzer;
it just ignored it and used ExtendedWhitespaceAnalyzer (confirmed via
the debugger).
Is it a classloader problem? I placed my jar in ${solr.solr.home}/lib.
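One way to test the classloader hypothesis is to check whether the analyzer's class name resolves to the same `Class` object through the loader that loaded Carrot2's classes and through the one that sees ${solr.solr.home}/lib. The sketch below is a hypothetical, JDK-only diagnostic; it uses no Solr or Carrot2 API, and which two loaders to compare inside Solr is an assumption.

```java
public class LoaderCheck {
    /** Returns true if className can be loaded through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * Returns true only if className resolves to the very same Class object
     * through both loaders. Classes with the same name defined by different
     * loaders are distinct types, so a type check made against one of them
     * fails for the other.
     */
    static boolean sameClass(String className, ClassLoader a, ClassLoader b) {
        try {
            return Class.forName(className, false, a) == Class.forName(className, false, b);
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass the analyzer's fully qualified name as the
        // first argument; inside Solr you would compare the loader that loaded
        // Carrot2 with the one that reads ${solr.solr.home}/lib.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader own = LoaderCheck.class.getClassLoader();
        System.out.println("visible via context loader: " + isVisible(name, context));
        System.out.println("visible via own loader: " + isVisible(name, own));
        System.out.println("same Class object: " + sameClass(name, context, own));
    }
}
```

If the analyzer is visible through one loader but not the other, or resolves to two different `Class` objects, that would point to a classloader issue rather than to the attribute key itself.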


Hmmm, I'm not sure if the Carrot guys are on this list (they are on the dev list). Can you share a simple example on the JIRA issue and we can discuss there?

Thank you for your advice. I'll post this part on SOLR-769.

Koji

