On 27-Nov-07, at 8:54 AM, Eswar K wrote:
Is there any specific reason why the CJK analyzers in Solr were chosen
to be n-gram based rather than morphological analyzers, which is
roughly what Google implements, given that the morphological approach
is considered more effective than the n-gram one?
The CJK analyzers are just wrappers around the analyzers already
available in Lucene. I suspect (but am not sure) that the core devs
aren't fluent in the issues surrounding the analysis of Asian text (I
certainly am not). Any improvements in this regard would be greatly
appreciated.
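
For anyone following along who hasn't looked at what "n-gram based"
means here, below is a rough Python sketch of the general idea:
overlapping two-character tokens over each run of CJK characters. It
is an illustration only, not the Lucene code. The names is_cjk and
cjk_bigrams are made up for this example, the Unicode ranges are a
deliberate simplification, and non-CJK text is simply skipped.

```python
def is_cjk(ch):
    """Very rough CJK test (an assumption for this sketch, not Lucene's rule)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x309F   # Hiragana
        or 0x30A0 <= code <= 0x30FF   # Katakana
        or 0xAC00 <= code <= 0xD7AF   # Hangul syllables
    )


def cjk_bigrams(text):
    """Return overlapping two-character tokens for each run of CJK characters.

    A run of length one is emitted as a single-character token.
    Non-CJK characters only act as run boundaries and are dropped.
    """
    tokens = []
    run = []

    def flush():
        if len(run) == 1:
            tokens.append(run[0])
        else:
            for i in range(len(run) - 1):
                tokens.append(run[i] + run[i + 1])
        run.clear()

    for ch in text:
        if is_cjk(ch):
            run.append(ch)
        elif run:
            flush()
    if run:
        flush()
    return tokens


print(cjk_bigrams("東京都"))  # ['東京', '京都']
```

The output shows the usual complaint about this approach: indexing
"東京都" (Tokyo Metropolis) also produces the token "京都" (Kyoto), so a
search for Kyoto can match a document that only mentions Tokyo. A
morphological analyzer would segment on word boundaries instead, at
the cost of needing a dictionary or a trained model.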
-Mike