CharFilter can normalize (convert) Traditional Chinese to Simplified Chinese, or vice versa, if you define the character mappings in mapping.txt. Here is a sample of Chinese character normalization:

https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG

See SOLR-822 for details:

https://issues.apache.org/jira/browse/SOLR-822
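To illustrate the idea, here is a minimal plain-Java sketch of mapping.txt-style normalization. It assumes rule lines of the form "source" => "target" with '#' comment lines, and it replaces the longest matching source string at each position. This is NOT Solr's MappingCharFilter (it does no offset correction and is not wired into an analyzer). The class name CharMappingSketch and the three character pairs are hypothetical examples, not an excerpt of a real mapping file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of mapping.txt-style character normalization.
 * Not Solr's MappingCharFilter: it only shows the idea of reading
 * "source" => "target" rules and rewriting text with longest-match replacement.
 */
public class CharMappingSketch {
    // Matches one rule line such as: "國" => "国"
    private static final Pattern RULE =
        Pattern.compile("^\\s*\"([^\"]*)\"\\s*=>\\s*\"([^\"]*)\"\\s*$");

    private final Map<String, String> rules = new HashMap<>();
    private int maxKeyLength = 0;

    /** Parses rule lines; blank lines and lines starting with '#' are skipped. */
    public CharMappingSketch(String mappingText) {
        for (String line : mappingText.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            Matcher m = RULE.matcher(trimmed);
            if (m.matches() && !m.group(1).isEmpty()) {
                rules.put(m.group(1), m.group(2));
                maxKeyLength = Math.max(maxKeyLength, m.group(1).length());
            }
        }
    }

    /**
     * Scans left to right, replacing the longest matching source string.
     * Indexes by UTF-16 char, which is fine for the BMP characters used here.
     */
    public String apply(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String replacement = null;
            int matchedLength = 0;
            int limit = Math.min(maxKeyLength, input.length() - i);
            for (int len = limit; len > 0; len--) {
                String candidate = input.substring(i, i + len);
                if (rules.containsKey(candidate)) {
                    replacement = rules.get(candidate);
                    matchedLength = len;
                    break;
                }
            }
            if (replacement != null) {
                out.append(replacement);
                i += matchedLength;
            } else {
                out.append(input.charAt(i)); // no rule: copy the char unchanged
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical excerpt: a few Traditional => Simplified pairs.
        String mapping = String.join("\n",
            "# Traditional => Simplified (illustrative subset only)",
            "\"國\" => \"国\"",
            "\"語\" => \"语\"",
            "\"漢\" => \"汉\"");
        CharMappingSketch sketch = new CharMappingSketch(mapping);
        System.out.println(sketch.apply("漢語 國語")); // prints: 汉语 国语
    }
}
```

Going the other way (Simplified to Traditional) is just a file with the pairs reversed, though real Chinese character conversion is not always one-to-one, so a flat character map can only be an approximation.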

Koji


revathy arun wrote:
Hi,

When I index Chinese content using the Chinese tokenizer and analyzer in Solr
1.3, some of the Chinese text files get indexed but others do not.

Since Chinese has many different language subtypes, such as standard
Chinese, simplified Chinese, etc., which of these does the Chinese tokenizer
support, and is there any way to determine the type of Chinese from
the file?

Rgds

