Hi,

While some characters in simplified and traditional Chinese do differ,
the Chinese tokenizer doesn't care - it simply creates n-gram tokens.
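For illustration, the n-gram behavior works roughly like the sketch below (a simplified Python sketch of overlapping character bigrams, similar in spirit to what Lucene's CJK analysis does - not Lucene's actual code). Note that simplified and traditional text each just yield bigrams of whatever characters are present:

```python
def bigram_tokens(text):
    # Emit overlapping two-character tokens, e.g. "ABC" -> ["AB", "BC"].
    # No distinction is made between simplified and traditional characters;
    # every character is treated as an opaque unit.
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(bigram_tokens("简体字"))  # simplified input
print(bigram_tokens("簡體字"))  # traditional input
```

Both calls produce two bigram tokens; the tokenizer never needs to know which variant it is looking at.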

Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch 

________________________________
From: revathy arun <revas...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 4:30:47 PM
Subject: indexing Chinese language

Hi,

When I index Chinese content using the Chinese tokenizer and analyzer in Solr
1.3, some of the Chinese text files get indexed but others do not.

Since Chinese has several different written variants, such as traditional
Chinese and simplified Chinese, which of these does the Chinese tokenizer
support, and is there any way to detect which variant a given file uses?

Rgds
