It doesn't make sense to spell-check individual single-character words, but it makes a lot of sense for phrases. Because pinyin input methods (IMs) are so widely used, it's very easy to write phrases that are semantically wrong but "sound" correct. n-grams should work, as long as the analysis doesn't mangle the characters.
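To illustrate what "not mangling the characters" means, here is a minimal Python sketch (not Solr code; the function names `char_ngrams` and `byte_ngrams` are made up for this example). It assumes the text is UTF-8 and compares n-grams built over whole characters with n-grams built over raw bytes. Only the first kind yields fragments that are still readable Chinese.

```python
def char_ngrams(text, n=2):
    """Overlapping n-grams over Unicode code points (whole characters)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def byte_ngrams(text, n=2, encoding="utf-8"):
    """Overlapping n-grams over the encoded bytes; these can split a character."""
    data = text.encode(encoding)
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def is_decodable(chunk, encoding="utf-8"):
    """True if the byte chunk is a complete, valid sequence in the encoding."""
    try:
        chunk.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


phrase = "中文拼写"  # sample phrase chosen for this example

# Character bigrams keep every character intact.
print(char_ngrams(phrase))

# Each of these characters is three bytes in UTF-8, so no two-byte gram
# is a complete character: every byte bigram is "mangled".
mangled = [g for g in byte_ngrams(phrase) if not is_decodable(g)]
print(len(mangled), "of", len(byte_ngrams(phrase)), "byte bigrams are mangled")
```

If the spelling index ends up holding only English terms, one thing worth checking is whether some step in the chain is working on bytes or on a non-UTF-8 encoding instead of on characters.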
On Tue, Apr 12, 2011 at 12:47 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Hi,
>
> Does spellchecking in Chinese actually make sense? I once asked a native
> Chinese speaker about that and the person told me it didn't really make sense.
> Anyhow, with n-grams, I don't think this could technically work even if it
> made sense for Chinese, could it?
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
>> From: alexw <aw...@crossview.com>
>> To: solr-user@lucene.apache.org
>> Sent: Tue, April 12, 2011 3:07:48 PM
>> Subject: Spellchecking in the Chinese Lanugage
>>
>> Hi,
>>
>> I have been trying to get spellcheck to work in the Chinese language. So far
>> I have not had any luck. Can someone shed some light here as a general guide
>> line in terms of what need to happen?
>>
>> I am using the CJKAnalyzer in the text field type and searching works fine,
>> but spelling does not work. Here are the things I have tried:
>>
>> 1. Put CJKAnalyzer in the "textSpell" field type.
>> 2. Set the characterEncoding param to "utf-8" in the spellcheck search
>>    component.
>> 3. Using Luke, I can see the Chinese characters in the "spell" field in the
>>    main index.
>> 4. After building the spelling index, I don't see Chinese characters in the
>>    "spellchecker" index, only terms in English.
>> 5. Tried adding the NGramFilterFactory to the CJKAnalyzer with no luck
>>    either.
>>
>> Thanks!
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html
>> Sent from the Solr - User mailing list archive at Nabble.com.