I would like to check: is it possible to use JiebaTokenizerFactory to index multilingual documents in Solr?
I have found that JiebaTokenizerFactory works better for Chinese text than HMMChineseTokenizerFactory. However, for English text, JiebaTokenizerFactory cuts words in the wrong places. For example, it cuts the word "water" as follows: *w|at|er*. This means Solr will search for three separate tokens, "w", "at" and "er", instead of the whole word "water".

Is there any way to solve this problem, other than using separate fields for English and Chinese text?

Regards,
Edwin
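P.S. For reference, the kind of field type I have in mind looks roughly like the sketch below. The field type name, the tokenizer class name and the segMode attribute are just examples from the third-party Jieba-for-Solr package I am trying, so treat them as placeholders that depend on the package you install:

```xml
<!-- Example only: the tokenizer class name and segMode attribute depend on
     which third-party Jieba package is installed, so treat them as placeholders. -->
<fieldType name="text_multi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Jieba tokenizer: segments Chinese well, but splits English words
         such as "water" into "w", "at", "er" in my tests -->
    <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory" segMode="SEARCH"/>
    <!-- Standard Solr filters -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```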