Re: Indexing documents in Chinese

2015-06-10 Thread Zheng Lin Edwin Yeo
I've tried using solr.HMMChineseTokenizerFactory with the following configuration: The documents can be indexed, but when I search for a word, the query matches many other words as well, not just the one I searched for. Why is this so? For example, the query ht

Re: Indexing documents in Chinese

2015-06-09 Thread Alexandre Rafalovitch
You may find this series of articles on CJK analysis/search helpful: http://discovery-grindstone.blogspot.com.au/ It's a little out of date, but should be a very solid intro. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 10 J

Indexing documents in Chinese

2015-06-09 Thread Zheng Lin Edwin Yeo
Hi, I'm trying to index rich-text documents that are in Chinese. Currently there's no problem with indexing, but there is a problem with searching. Does anyone know which Tokenizer and Filter Factory are best to use? I'm now using solr.StandardTokenizerFactory, which I heard is not