I did a little research into this for a client a while ago. The character 
mapping between traditional Chinese (TC) and simplified Chinese (SC) is not 
one-to-one, which complicates things (the two have evolved independently), 
and if you want to do a perfect job you will need a dictionary. However, 
there are tables out there (I can dig one up for you) that allow conversion 
from one to the other. So you would pick either TC or SC as your canonical 
Chinese, and convert all the documents and searches to it.
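
To make that concrete, here is a minimal Python sketch of the idea, folding 
everything to SC. The four-entry table is hand-picked for illustration only 
(a real table has thousands of entries), and the names TC_TO_SC, 
to_canonical and matches are made up for this example. The same mapping has 
to be applied to the documents when they are indexed and to the searches 
when they are run.

```python
# Illustrative subset of a TC -> SC table; a real one is far larger.
TC_TO_SC = {
    "類": "类",
    "發": "发",  # 發 and 髮 both fold to 发, so the mapping is many-to-one
    "髮": "发",
    "後": "后",  # 後 folds into 后, which is also a distinct TC character
}

# str.translate wants a table keyed by code point; maketrans builds it.
_TABLE = str.maketrans(TC_TO_SC)


def to_canonical(text: str) -> str:
    """Fold any TC characters found in the table to their SC form."""
    return text.translate(_TABLE)


def matches(query: str, document: str) -> bool:
    """Naive substring search after folding both sides to canonical SC."""
    return to_canonical(query) in to_canonical(document)


if __name__ == "__main__":
    print(matches("类", "分類法"))  # SC query against a TC document
    print(matches("類", "分类法"))  # TC query against an SC document
```

Note that the fold loses information: once 發 and 髮 are both 发, a search 
for one will also hit the other, which is why a perfect job needs a 
dictionary rather than a character table.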

I will stress that this is very much a brute-force approach: the mapping is 
not perfect, because the two character sets have diverged over time (much 
like UK and US English; I was brought up in the UK and live in the US).

Hope this helps.

Cheers

François

On Mar 7, 2011, at 5:02 PM, Andy wrote:

> I have documents that contain both simplified and traditional Chinese 
> characters. Is there any way to search across them? For example, if someone 
> searches for 类 (simplified Chinese), I'd like to be able to recognize that 
> the equivalent character is 類 in traditional Chinese and search for 类 or 類 in 
> the documents. 
> 
> Is that something that Solr, or any related software, can do? Is there a 
> standard approach in dealing with this problem?
> 
> Thanks.
> 
> 
> 
