Hi Tomoko,

Thank you for your reply.

> If you need to perform partial (prefix) match for **only English words**,
> you can create a separate field that keeps only English words (I've never
> tried that, but might be possible by PatternTokenizerFactory or other
> tokenizer/filter chains...) and apply EdgeNGramFilterFactory to the field.

Does this mean it is better to have two separate fields for English and
Chinese words? I'm not quite sure what you mean by that.

Regards,
Edwin

On 25 October 2015 at 11:42, Tomoko Uchida <tomoko.uchida.1...@gmail.com>
wrote:

> > I have rich-text documents that are in both English and Chinese, and
> > currently I have EdgeNGramFilterFactory enabled during indexing, as I
> > need it for partial matching for English words. But this means it will
> > also break up each of the Chinese characters into different tokens.
>
> EdgeNGramFilterFactory creates sub-strings (prefixes) from each token. Its
> behavior is independent of language.
> If you need to perform partial (prefix) match for **only English words**,
> you can create a separate field that keeps only English words (I've never
> tried that, but might be possible by PatternTokenizerFactory or other
> tokenizer/filter chains...) and apply EdgeNGramFilterFactory to the field.
>
> Hope it helps,
> Tomoko
>
> 2015-10-23 13:04 GMT+09:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>
> > Hi,
> >
> > Would like to check, is it good to use EdgeNGramFilterFactory for
> > indexes that contain Chinese characters?
> > Will it affect the accuracy of the search for Chinese words?
> >
> > I have rich-text documents that are in both English and Chinese, and
> > currently I have EdgeNGramFilterFactory enabled during indexing, as I
> > need it for partial matching for English words. But this means it will
> > also break up each of the Chinese characters into different tokens.
> >
> > I'm using the HMMChineseTokenizerFactory for my tokenizer.
> >
> > Thank you.
> >
> > Regards,
> > Edwin
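
To make the "separate field" idea above concrete, here is a minimal sketch of
what it might look like in schema.xml. This is untested and only illustrates
the suggestion: the fieldType/field names, the assumed source field "content",
the regex pattern, and the gram sizes are all placeholder assumptions, not a
recommended configuration. One field keeps only Latin-letter tokens (via
PatternTokenizerFactory) and applies EdgeNGramFilterFactory for prefix
matching; the other keeps the existing HMMChineseTokenizerFactory chain
without n-grams, so Chinese words are not broken into edge n-grams.

    <!-- English-only prefix field: keep runs of Latin letters/digits, lowercase,
         and index edge n-grams so partial (prefix) matches work at query time.
         minGramSize/maxGramSize values are placeholders. -->
    <fieldType name="text_en_prefix" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- group="0" emits each whole regex match as a token; Chinese characters
             do not match and are dropped from this field -->
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[A-Za-z0-9]+" group="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <!-- no n-grams at query time; the whole query term matches an indexed prefix -->
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[A-Za-z0-9]+" group="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Chinese field: HMMChineseTokenizerFactory only, no EdgeNGramFilterFactory,
         so Chinese terms are kept whole. -->
    <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.HMMChineseTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <!-- Hypothetical fields; "content" is assumed to be the existing rich-text field -->
    <field name="content_en" type="text_en_prefix" indexed="true" stored="false"/>
    <field name="content_zh" type="text_zh" indexed="true" stored="false"/>
    <copyField source="content" dest="content_en"/>
    <copyField source="content" dest="content_zh"/>

Queries could then target both fields, for example with edismax and
qf="content_en content_zh", so English terms get prefix matching from
content_en while Chinese terms are matched whole against content_zh.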