EdgeNGramFilterFactory for Chinese characters

Zheng Lin Edwin Yeo Thu, 22 Oct 2015 21:05:27 -0700

Hi,

I would like to check whether it is a good idea to use
EdgeNGramFilterFactory for indexes that contain Chinese characters.
Will it affect the accuracy of searches for Chinese words?


I have rich-text documents that contain both English and Chinese, and
currently I have EdgeNGramFilterFactory enabled at index time, as I need
it for partial matching on English words. But this means the filter will
also break each Chinese token into separate edge n-gram tokens.

I'm using the HMMChineseTokenizerFactory for my tokenizer.
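
To illustrate the concern, here is a minimal plain-Python sketch of what
edge n-gramming does to a token. It is not Solr's implementation: the
function name edge_ngrams, the gram sizes, and the sample token list are
assumptions made for illustration only, and the real output of
HMMChineseTokenizerFactory may segment the text differently.

```python
def edge_ngrams(token, min_gram=1, max_gram=3):
    """Return the leading-edge n-grams (prefixes) of a single token.

    Illustrative only: mimics the general idea of an edge n-gram filter
    with minimum and maximum gram sizes, not Solr's actual code.
    """
    upper = min(max_gram, len(token))
    # Tokens shorter than min_gram yield an empty list.
    return [token[:n] for n in range(min_gram, upper + 1)]


# Hypothetical tokenizer output for a mixed English/Chinese document.
sample_tokens = ["search", "中文", "搜索引擎"]

for tok in sample_tokens:
    # English and Chinese tokens are both reduced to prefixes.
    print(tok, edge_ngrams(tok))
```

With these assumed sizes, a Chinese word such as 搜索引擎 would be indexed
as the prefixes 搜, 搜索 and 搜索引, so a query for a single character
could match documents that only contain the longer word.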

Thank you.

Regards,
Edwin
