Hi

I have the following fieldType that processes Korean/Chinese/Japanese text:

<fieldType name="cjk_text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
</fieldType>

When I supply Korean words/phrases in the query, I do get several expected
Korean URLs as search results, and my keywords are correctly highlighted
in the excerpt. But for Chinese and Japanese I almost always draw a blank:
no hits.

I ran sample Chinese/Japanese text through the analysis page
(/search/admin/analysis.jsp), and it does highlight the matches it found for
the query words I supplied. But when I actually search for that text
(/search/admin/form.jsp), I get no hits.
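For reference when reading the analysis page output: Lucene's javadoc describes CJKTokenizer as emitting every two adjacent CJK characters as an overlapping token (a bigram), while lowercasing Latin words. The Python below is only a simplified sketch of that idea, not Lucene's implementation; the helper names (`is_cjk`, `_bigrams`, `cjk_bigrams`) and the Unicode ranges treated as CJK are assumptions for illustration.

```python
# Simplified approximation of Lucene's CJKTokenizer (NOT the real implementation).
# Assumption: only the Unicode ranges below are treated as CJK characters.

def is_cjk(ch: str) -> bool:
    """Return True if ch falls in one of the (assumed) CJK ranges."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana (Japanese)
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs (Chinese, kanji)
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables (Korean)
    )


def _bigrams(run: str) -> list[str]:
    """Overlapping two-character tokens; a lone character is kept as-is."""
    if len(run) == 1:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def cjk_bigrams(text: str) -> list[str]:
    """Split text into lowercased Latin words and overlapping CJK bigrams."""
    tokens: list[str] = []
    cjk_run = ""
    latin_word = ""
    # The appended space flushes whatever run is still open at the end.
    for ch in text + " ":
        if is_cjk(ch):
            if latin_word:
                tokens.append(latin_word.lower())
                latin_word = ""
            cjk_run += ch
            continue
        if cjk_run:
            tokens.extend(_bigrams(cjk_run))
            cjk_run = ""
        if ch.isalnum():
            latin_word += ch
        elif latin_word:
            tokens.append(latin_word.lower())
            latin_word = ""
    return tokens


if __name__ == "__main__":
    print(cjk_bigrams("Java 中文分词"))  # ['java', '中文', '文分', '分词']
    print(cjk_bigrams("한국어"))          # ['한국', '국어']
```

Because the same tokenizer is configured for both the index and query analyzers, a query such as 中文 should produce the same bigram that was indexed, which is the kind of match the analysis page highlights.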

For Chinese text I have also tried:

<fieldType name="cn_text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.ChineseTokenizerFactory"/>
        <filter class="solr.ChineseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.ChineseTokenizerFactory"/>
        <filter class="solr.ChineseFilterFactory"/>
      </analyzer>
</fieldType>

Same behavior.
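Lucene's javadoc contrasts ChineseTokenizer with CJKTokenizer: for Chinese text C1C2C3C4, ChineseTokenizer returns C1, C2, C3, C4 (one token per character), whereas CJKTokenizer returns the bigrams C1C2, C2C3, C3C4. The Python below is a simplified sketch of the single-character behavior, assuming only the basic CJK Unified Ideographs block counts as Chinese; the names `is_han` and `chinese_unigrams` are hypothetical, and the stop-word and short-token filtering done by ChineseFilterFactory is not modeled.

```python
# Simplified approximation of Lucene's ChineseTokenizer (NOT the real implementation).
# Assumption: only CJK Unified Ideographs (U+4E00..U+9FFF) count as Chinese characters.

def is_han(ch: str) -> bool:
    """Return True if ch is in the (assumed) Chinese ideograph range."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def chinese_unigrams(text: str) -> list[str]:
    """Emit each Chinese character as its own token; Latin words are lowercased."""
    tokens: list[str] = []
    latin_word = ""
    # The appended space flushes a trailing Latin word.
    for ch in text + " ":
        if is_han(ch):
            if latin_word:
                tokens.append(latin_word.lower())
                latin_word = ""
            tokens.append(ch)  # one token per Chinese character
        elif ch.isascii() and ch.isalnum():
            latin_word += ch  # ASCII letters/digits accumulate into a word
        elif latin_word:
            tokens.append(latin_word.lower())
            latin_word = ""
    return tokens


if __name__ == "__main__":
    print(chinese_unigrams("Java 中文分词"))  # ['java', '中', '文', '分', '词']
```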

I am using SOLR for several other languages, such as Russian, Spanish,
Italian, French and German (each with its own tokenizer, plus a stemmer where
available), and I do get results that correctly highlight the words I supply
in the query. While I can't judge the quality of the results, I am satisfied
that SOLR is returning documents that contain the query string(s).

I'm not sure what the problem with Chinese and Japanese might be. I have
updated my SOLR distribution to the latest nightly build,
"solr-2009-06-29.zip", just in case, but that has not helped. Thanks for your
help. - ashok
-- 
View this message in context: 
http://www.nabble.com/CJKTokenizerFactory-seems-to-work-for-Korea-but-not-for-China-and-Japan-tp24279927p24279927.html
Sent from the Solr - User mailing list archive at Nabble.com.