Re: questions about Clustering

Stanislaw Osinski Sat, 23 May 2009 06:26:17 -0700

>
>> 1. if q=*:* is requested, Carrot2 will receive "MatchAllDocsQuery"
>> via attributes. Is it OK?
>>
>
> Yes, it only clusters on the Doc List, not the Doc Set (in other words,
> it's your rows that matter)



Just to add to that: Carrot2 should be able to cluster up to ~1000 search
results, but by design it won't be able to process significantly more
documents than that. The reason is that Carrot2 is a search results
clustering engine and performs all processing in-memory.

>> 2. I'd like to use it on an environment other than English, e.g. Japanese.
>> I've implemented Carrot2JapaneseAnalyzer (w/ Payload/ITokenType)
>> for this purpose.
>> It worked well with ClusteringDocumentList example, but didn't
>> work with CarrotClusteringEngine.
>>
>> What I did is that I inserted the following lines(+) to
>> CarrotClusteringEngine:
>>
>> attributes.put(AttributeNames.QUERY, query.toString());
>> + attributes.put(AttributeUtils.getKey(Tokenizer.class, "analyzer"),
>> + Carrot2JapaneseAnalyzer.class);
>>
>> There are no runtime errors, but Carrot2 didn't use my analyzer;
>> it just ignored it and used ExtendedWhitespaceAnalyzer (confirmed via
>> debugger).
>> Is it a classloader problem? I placed my jar in ${solr.solr.home}/lib .
>>
>
>
> Hmmm, I'm not sure if the Carrot guys are on this list (they are on dev).
>  Can you share a simple example on the JIRA issue and we can discuss there?


Yep, we're here too :-)

The catch with the analyzer is that this specific attribute is an
initialization-time attribute, so putting it in the per-request attributes
map has no effect. You need to add it to the initAttributes map in the
init() method of CarrotClusteringEngine instead.
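
To illustrate why the per-request put() is silently ignored, here is a
minimal, self-contained sketch. It does not use the Carrot2 or Solr classes:
SketchController, DefaultAnalyzer, JapaneseAnalyzer and the ANALYZER_KEY
string are all hypothetical stand-ins, and the real key would come from
AttributeUtils.getKey(Tokenizer.class, "analyzer") as in the snippet quoted
above. The only point is that a value bound at init time is not read again
at processing time.

```java
import java.util.HashMap;
import java.util.Map;

public class InitTimeAttributeSketch {
    // Hypothetical key, for illustration only.
    static final String ANALYZER_KEY = "analyzer";

    // Hypothetical stand-ins for the default and the custom analyzer.
    static class DefaultAnalyzer {}
    static class JapaneseAnalyzer {}

    /** Hypothetical component that binds the analyzer once, at init time. */
    static class SketchController {
        private Class<?> analyzer = DefaultAnalyzer.class;

        /** Init-time attributes are read here and nowhere else. */
        void init(Map<String, Object> initAttributes) {
            Object value = initAttributes.get(ANALYZER_KEY);
            if (value instanceof Class) {
                analyzer = (Class<?>) value;
            }
        }

        /** Per-request attributes: the analyzer key is not looked up again. */
        String process(Map<String, Object> requestAttributes) {
            return analyzer.getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Analyzer passed per request only: it is ignored.
        SketchController perRequest = new SketchController();
        perRequest.init(new HashMap<>());
        Map<String, Object> request = new HashMap<>();
        request.put(ANALYZER_KEY, JapaneseAnalyzer.class);
        System.out.println(perRequest.process(request)); // DefaultAnalyzer

        // Analyzer passed in the init-time map: it is picked up.
        SketchController atInit = new SketchController();
        Map<String, Object> initAttributes = new HashMap<>();
        initAttributes.put(ANALYZER_KEY, JapaneseAnalyzer.class);
        atInit.init(initAttributes);
        System.out.println(atInit.process(new HashMap<>())); // JapaneseAnalyzer
    }
}
```

In the real engine the equivalent change would be to move the two "+" lines
from the per-request code path into init(), so that they write to
initAttributes rather than attributes.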

Please let me know if this solves the problem. If not, I'll investigate
further.

Cheers,

Staszek
