Hi,

While some of the characters in simplified and traditional Chinese do differ, the Chinese tokenizer doesn't care: it simply creates n-gram tokens.
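To illustrate (a minimal sketch, not Lucene's actual implementation): an n-gram tokenizer just slices the text into overlapping character pairs, so a simplified character and its traditional counterpart simply yield different tokens, with no language-variant detection involved.

```python
def cjk_bigrams(text):
    # Emit overlapping character bigrams, roughly how a CJK
    # n-gram tokenizer splits a run of Han characters.
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(cjk_bigrams("中华人民"))   # ['中华', '华人', '人民']
print(cjk_bigrams("中国"))       # simplified:  ['中国']
print(cjk_bigrams("中國"))       # traditional: ['中國'] -- a different token
```

The tokenizer indexes whatever characters it sees, so simplified and traditional spellings of the same word end up as distinct terms unless you normalize them first.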
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

________________________________
From: revathy arun <revas...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 4:30:47 PM
Subject: indexing Chinese language

Hi,

When I index Chinese content using the Chinese tokenizer and analyzer in Solr 1.3, some of the Chinese text files are getting indexed but others are not.

Since Chinese has several different variants, such as standard Chinese, simplified Chinese, etc., which of these does the Chinese tokenizer support, and is there any method to detect the variant of Chinese from a file?

Rgds