Re: Does anybody has experience in Chinese soundex(sounds like) of SOLR?

Ken Krugler Thu, 20 Oct 2011 03:53:01 -0700

> Wow, interesting question.  Can soundex even be applied to a language like 
> Chinese, which is tonal and doesn't have individual letters, but whole 
> characters?  I'm no expert, but intuitively speaking it sounds hard or maybe 
> even impossible...


The only two cases I can think of are:

 - Cases where you have two (or more) characters that are variant forms of 
each other. Unicode tried to unify all of these, but some still exist. And in 
GB 18030 there are tons.

 - If you wanted to support phonetic (pinyin or zhuyin) search, then you might 
want to collapse syllables that are commonly confused. But then of course you'd 
have to store the phonetic forms for all of the words.
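
A minimal sketch of the second case, assuming the pinyin syllables are already 
stored alongside each word, with an optional trailing tone digit. The confusion 
pairs (zh/z, ch/c, sh/s, -ng/-n) and the names fuzzy_key and phrase_key are 
illustrative assumptions, not anything Solr ships:

```python
import re

# Assumed confusion pairs, in the spirit of "fuzzy pinyin" input settings.
# Each pair is (form to collapse, form it collapses to).
INITIALS = (("zh", "z"), ("ch", "c"), ("sh", "s"))
FINALS = (("ang", "an"), ("eng", "en"), ("ing", "in"))


def fuzzy_key(syllable):
    """Collapse one pinyin syllable into a key shared by confusable syllables."""
    # Lowercase and drop a trailing tone digit such as the 1 in "zhang1".
    s = re.sub(r"[1-5]$", "", syllable.lower())
    # Collapse retroflex initials onto their plain counterparts.
    for long_form, short_form in INITIALS:
        if s.startswith(long_form):
            s = short_form + s[len(long_form):]
            break
    # Collapse back-nasal finals onto front-nasal ones.
    for long_form, short_form in FINALS:
        if s.endswith(long_form):
            s = s[:-len(long_form)] + short_form
            break
    return s


def phrase_key(syllables):
    """Build a sounds-like key for a whole word from its syllables."""
    return " ".join(fuzzy_key(s) for s in syllables)
```

The same key would have to be computed at index time and at query time, so that 
"zhang1" and "zan4" land on the same term. Dropping tones is itself a design 
choice here; keeping them would make matching stricter.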

-- Ken


>> From: Floyd Wu <floyd...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, October 20, 2011 5:43 AM
>> Subject: Does anybody has experience in Chinese soundex(sounds like) of SOLR?
>> 
>> Hi  there,
>> 
>> There are many English soundex implementations that can be referenced, but I
>> wonder how to do a Chinese soundex (sounds-like) filter (maybe).
>> 
>> any idea?
>> 
>> Floyd
>> 
>> 
>> 

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
