I would like to check: is it possible to use JiebaTokenizerFactory to index
multilingual documents in Solr?

I have found that JiebaTokenizerFactory handles Chinese characters better
than HMMChineseTokenizerFactory.

However, for English text, JiebaTokenizerFactory splits words in the wrong
places. For example, it cuts the word "water" as follows:
*w|at|er*

This means that Solr will search for the three separate tokens "w", "at" and
"er" instead of the whole word "water".
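
To illustrate the setup, the field type in question is along these lines. This is only a minimal sketch: the tokenizer class path and the segMode attribute are placeholders for whatever the third-party Jieba/Solr integration in use actually provides.

  <!-- Minimal sketch; the tokenizer class path and segMode value are
       placeholders for the third-party Jieba integration in use -->
  <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="analyzer.solr.JiebaTokenizerFactory" segMode="SEARCH"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>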

Is there any way to solve this problem, other than using separate fields for
English and Chinese text?
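
For comparison, the separate-field workaround I would prefer to avoid would look roughly like the sketch below: an English field type plus a copyField, so English text is tokenized by StandardTokenizerFactory while the main field keeps Jieba (field and type names here are purely illustrative).

  <!-- Illustrative only: separate English and Chinese analysis chains -->
  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="content"    type="text_zh" indexed="true" stored="true"/>
  <field name="content_en" type="text_en" indexed="true" stored="false"/>
  <copyField source="content" dest="content_en"/>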

Regards,
Edwin
