Grant Ingersoll wrote:
On May 22, 2009, at 11:41 PM, Koji Sekiguchi wrote:
I'm thinking of using the clustering (SOLR-769) function for my project.
I have a couple of questions:
1. If q=*:* is requested, Carrot2 will receive "MatchAllDocsQuery"
via attributes. Is that OK?
Yes, it only clusters on the Doc List, not the Doc Set (in other
words, it's your rows that matter)
Hmm, I saw the comment in ClusteringDocumentList.java of Carrot2:
/*
 * If you know what query generated the documents you're about to cluster, pass
 * the query to the algorithm, which will usually increase clustering quality.
 */
attributes.put(AttributeNames.QUERY, "data mining");
So I'm worried about clustering quality when Carrot2 gets the string
"MatchAllDocsQuery" as the query.
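To illustrate the concern, one possible workaround would be to pass the query hint only when the user query carries real terms, and to skip it for q=*:*. The following is only a sketch under stated assumptions: it is plain Java, a Map stands in for the Carrot2 attribute map, and QueryHint, putQueryHint and QUERY_KEY are hypothetical names that exist in neither Solr nor Carrot2.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryHint {
    // Hypothetical stand-in for Carrot2's AttributeNames.QUERY; the literal
    // key value is an assumption made for this sketch.
    static final String QUERY_KEY = "query";

    /** Adds the query hint only when the query string carries real terms. */
    static void putQueryHint(Map<String, Object> attributes, String userQuery) {
        if (userQuery == null) {
            return;
        }
        String q = userQuery.trim();
        // Skip empty queries and the match-all query: neither contains
        // terms that could guide the clustering algorithm.
        if (q.isEmpty() || q.equals("*:*")) {
            return;
        }
        attributes.put(QUERY_KEY, q);
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new HashMap<>();
        putQueryHint(attributes, "*:*");
        System.out.println(attributes.containsKey(QUERY_KEY)); // false
        putQueryHint(attributes, "data mining");
        System.out.println(attributes.get(QUERY_KEY)); // data mining
    }
}
```

The check is made on the raw q parameter rather than on query.toString(), since the latter is what produces "MatchAllDocsQuery" in the first place.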
2. I'd like to use it in an environment other than English, e.g.
Japanese.
I've implemented Carrot2JapaneseAnalyzer (w/ Payload/ITokenType)
for this purpose.
It worked well with the ClusteringDocumentList example, but didn't
work with CarrotClusteringEngine.
What I did was insert the following lines (+) into
CarrotClusteringEngine:
attributes.put(AttributeNames.QUERY, query.toString());
+ attributes.put(AttributeUtils.getKey(Tokenizer.class, "analyzer"),
+ Carrot2JapaneseAnalyzer.class);
There are no runtime errors, but Carrot2 didn't use my analyzer;
it just ignored it and used ExtendedWhitespaceAnalyzer (confirmed via
debugger).
Is it a classloader problem? I placed my jar in ${solr.solr.home}/lib .
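One way to narrow down the classloader question would be to log which loader defined a class and whether a given loader can see it by name. This is a minimal sketch using only standard JDK calls; LoaderCheck, describeLoader and visibleFrom are hypothetical helper names, and in practice they would be called with Carrot2JapaneseAnalyzer and the loader of the Carrot2 classes.

```java
public class LoaderCheck {
    /** Describes which class loader defined the given class. */
    static String describeLoader(Class<?> c) {
        ClassLoader cl = c.getClassLoader();
        // JDK core classes report a null loader, meaning the bootstrap loader.
        return c.getName() + " <- " + (cl == null ? "bootstrap" : cl.toString());
    }

    /** Returns true if the given loader can resolve the class by name. */
    static boolean visibleFrom(String className, ClassLoader loader) {
        try {
            // 'false' avoids running static initializers during the check.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader own = LoaderCheck.class.getClassLoader();
        System.out.println(describeLoader(LoaderCheck.class));
        System.out.println(visibleFrom("java.lang.String", own));
        System.out.println(visibleFrom("no.such.Analyzer", own));
    }
}
```

If the analyzer class is visible from the loader that defines the Carrot2 classes, the problem is more likely in how the attribute is bound than in class loading.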
Hmmm, I'm not sure if the Carrot guys are on this list (they are on
dev). Can you share a simple example on the JIRA issue and we can
discuss there?
Thank you for your advice. I'll post this part on SOLR-769.
Koji