> Wow, interesting question. Can soundex even be applied to a language like
> Chinese, which is tonal and doesn't have individual letters, but whole
> characters? I'm no expert, but intuitively speaking it sounds hard or maybe
> even impossible...

The only two cases I can think of are:

- Cases where you have two (or more) characters that are variant forms.
  Unicode tried to unify all of these, but some still exist. And in
  GB 18030 there are tons.
- If you wanted to support phonetic (pinyin or zhuyin) search, then you
  might want to collapse syllables that are commonly confused. But then of
  course you'd have to be storing the phonetic forms for all of the words.

-- Ken

>> From: Floyd Wu <floyd...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, October 20, 2011 5:43 AM
>> Subject: Does anybody has experience in Chinese soundex(sounds like) of SOLR?
>>
>> Hi there,
>>
>> There are many English soundex implementation can be referenced, but I
>> wonder how to do Chinese soundex(sounds like) filter (maybe).
>>
>> any idea?
>>
>> Floyd

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
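
P.S. For the second case above (phonetic search), here is a rough sketch of what "collapsing commonly confused syllables" could look like. It assumes you already have toneless or tone-numbered pinyin for each word. The confusion table is my own illustrative guess (retroflex vs. dental initials, l vs. n, and -ng vs. -n finals), and the function names are made up for this example; none of this comes from Solr or any existing library.

```python
import re

# Hypothetical confusion table (an assumption for illustration only):
# retroflex initials are merged with their dental counterparts, and l with n.
INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("l", "n")]
# Back nasal finals are merged with front nasal finals.
FINALS = [("ang", "an"), ("eng", "en"), ("ing", "in")]


def strip_tone(syllable):
    """Lowercase the syllable and drop a trailing tone digit, e.g. 'zhang1' -> 'zhang'."""
    return re.sub(r"[1-5]$", "", syllable.lower())


def phonetic_key(syllable):
    """Map one pinyin syllable to a key shared by its commonly confused variants."""
    s = strip_tone(syllable)
    for src, dst in INITIALS:
        if s.startswith(src):
            s = dst + s[len(src):]
            break
    for src, dst in FINALS:
        if s.endswith(src):
            s = s[:-len(src)] + dst
            break
    return s


def word_key(pinyin):
    """Build a key for a whole word given as space-separated pinyin syllables."""
    return " ".join(phonetic_key(s) for s in pinyin.split())


if __name__ == "__main__":
    # Both spellings collapse to the same key, so they would match each other.
    print(word_key("zhang1 san1"))
    print(word_key("zang1 shan1"))
```

You would index the output of `word_key` in its own field next to each word and apply the same function to the query, which is why the phonetic forms have to be stored for everything. How aggressive the table should be depends on which confusions your users actually make.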