Hi Tomoko,

Thank you for your reply.

> If you need to perform partial (prefix) matching for **only English words**,
> you can create a separate field that keeps only the English words (I've never
> tried that, but it might be possible with PatternTokenizerFactory or other
> tokenizer/filter chains...) and apply EdgeNGramFilterFactory to that field.

Does this mean it is better to have two separate fields, one for English words
and one for Chinese words?
I'm not quite sure what you mean by that.
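
To check my understanding, is it something along the lines of the sketch below?
This is just my rough guess at the schema.xml configuration (I haven't tried
it); the field names, the pattern, and the gram sizes are placeholders:

  <!-- main field: English + Chinese content, no edge n-grams -->
  <fieldType name="text_mixed" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.HMMChineseTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- separate field: keep only English words, then apply edge n-grams
       for prefix matching (n-grams at index time only) -->
  <fieldType name="text_en_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <!-- pattern is a guess: emit runs of ASCII letters, dropping Chinese characters -->
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[A-Za-z]+" group="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[A-Za-z]+" group="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="content" type="text_mixed" indexed="true" stored="true"/>
  <field name="content_en" type="text_en_prefix" indexed="true" stored="false"/>
  <copyField source="content" dest="content_en"/>

Then prefix searches for English words would go against content_en, while
Chinese searches would go against content. Is that the intention?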

Regards,
Edwin



On 25 October 2015 at 11:42, Tomoko Uchida <tomoko.uchida.1...@gmail.com>
wrote:

> > I have rich-text documents that are in both English and Chinese, and
> > currently I have EdgeNGramFilterFactory enabled during indexing, as I need
> > it for partial matching for English words. But this means it will also
> > break up each of the Chinese characters into different tokens.
>
> EdgeNGramFilterFactory creates sub-strings (prefixes) from each token. Its
> behavior is independent of language.
> > If you need to perform partial (prefix) matching for **only English words**,
> > you can create a separate field that keeps only the English words (I've never
> > tried that, but it might be possible with PatternTokenizerFactory or other
> > tokenizer/filter chains...) and apply EdgeNGramFilterFactory to that field.
>
> Hope it helps,
> Tomoko
>
> 2015-10-23 13:04 GMT+09:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>
> > Hi,
> >
> > I would like to check: is it good to use EdgeNGramFilterFactory for indexes
> > that contain Chinese characters?
> > Will it affect the accuracy of searches for Chinese words?
> >
> > I have rich-text documents that are in both English and Chinese, and
> > currently I have EdgeNGramFilterFactory enabled during indexing, as I need
> > it for partial matching for English words. But this means it will also
> > break up each of the Chinese characters into different tokens.
> >
> > I'm using HMMChineseTokenizerFactory as my tokenizer.
> >
> > Thank you.
> >
> > Regards,
> > Edwin
> >
>
