Hi, I have the following fieldType that processes Korean/Chinese/Japanese text:
<fieldType name="cjk_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

When I supply Korean words or phrases in the query, I get several of the expected Korean URLs as search results, and my keywords are correctly highlighted in the excerpt. For Chinese and Japanese, however, I almost always draw a blank, i.e. no hits.

When I run sample Chinese/Japanese text through the analysis page (/search/admin/analysis.jsp), it highlights the matches it finds for the query words I supplied. But when I actually search for those words (/search/admin/form.jsp), I get no hits.

For Chinese text I have also tried:

<fieldType name="cn_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.ChineseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.ChineseFilterFactory"/>
  </analyzer>
</fieldType>

The behavior is the same.

I am using SOLR for several other languages, such as Russian, Spanish, Italian, French and German (each with its own tokenizer, plus a stemmer where one is available), and for those I get results that correctly highlight the words I supply in the query. While I can't judge how meaningful the results are, I am satisfied that SOLR is returning documents that contain the query string(s).

I'm not sure what the problem might be with Chinese and Japanese. I have updated my SOLR distribution to the latest nightly, "solr-2009-06-29.zip", just in case. It has not helped, of course.

Thanks for your help.
- ashok

--
View this message in context: http://www.nabble.com/CJKTokenizerFactory-seems-to-work-for-Korea-but-not-for-China-and-Japan-tp24279927p24279927.html
Sent from the Solr - User mailing list archive at Nabble.com.
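For context: a fieldType such as `cjk_text` or `cn_text` only affects fields that reference it by name in schema.xml. The fragment below is a minimal sketch of such declarations; the field names `content_cjk` and `content_cn` are hypothetical and do not come from the post. `stored="true"` is included on the assumption that highlighted excerpts are wanted, since the Solr highlighter builds excerpts from the stored field value.

```xml
<!-- Hypothetical field names; the type attributes point at the cjk_text and cn_text fieldTypes from the post. -->
<fields>
  <!-- indexed: the field is searchable; stored: the original text is kept for highlighting/excerpts -->
  <field name="content_cjk" type="cjk_text" indexed="true" stored="true"/>
  <field name="content_cn"  type="cn_text"  indexed="true" stored="true"/>
</fields>
```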