Hi,

While some characters in simplified and traditional Chinese do differ,
the Chinese tokenizer doesn't care - it simply creates n-gram tokens.
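For illustration, the n-gram behavior works roughly like the sketch below (a simplified Python sketch of overlapping character bigrams, similar in spirit to what Lucene's CJK analysis does - not Lucene's actual code). Note that simplified and traditional text each just yield bigrams of whatever characters are present:

```python
def bigram_tokens(text):
    # Emit overlapping two-character tokens, e.g. "ABC" -> ["AB", "BC"].
    # No distinction is made between simplified and traditional characters;
    # every character is treated as an opaque unit.
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(bigram_tokens("简体字"))  # simplified input
print(bigram_tokens("簡體字"))  # traditional input
```

Both calls produce two bigram tokens; the tokenizer never needs to know which variant it is looking at.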

Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch 

________________________________
From: revathy arun <revas...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 4:30:47 PM
Subject: indexing Chinese language

Hi,

When I index Chinese content using the Chinese tokenizer and analyzer in Solr
1.3, some of the Chinese text files get indexed but others do not.

Since Chinese has several different written variants, such as traditional
Chinese and simplified Chinese, which of these does the Chinese tokenizer
support, and is there any way to detect which variant a given file uses?

Rgds
