What if Chinese is mixed with English? I have user-entered text that could be a mix of Chinese, English, etc.
What's the best way to handle that? Thanks.

--- On Mon, 6/28/10, Ahmet Arslan <iori...@yahoo.com> wrote:

> From: Ahmet Arslan <iori...@yahoo.com>
> Subject: Re: Chinese chars are not indexed ?
> To: solr-user@lucene.apache.org
> Date: Monday, June 28, 2010, 3:44 AM
>
> > oh yes, *...* works. thanks.
> >
> > I saw tokenizer is defined in schema.xml. There are a few
> > places that define the tokenizer. Wondering if it is enough
> > to define one for:
>
> It is better to define a brand new field type specific to
> Chinese.
>
> http://wiki.apache.org/solr/LanguageAnalysis?highlight=%28CJKtokenizer%29#Chinese.2C_Japanese.2C_Korean
>
> Something like:
>
> at index time:
> <tokenizer class="solr.CJKTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
>
> at query time:
> <tokenizer class="solr.CJKTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.PositionFilterFactory" />
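For reference, the tokenizer/filter snippets in the quoted reply would sit inside a field type definition in schema.xml. Here is a minimal sketch, assuming Solr 1.x schema conventions; the names `text_cjk` and `content` are illustrative, not from the thread. This setup may also cover the mixed-language case, since CJKTokenizer emits whole-word tokens for Latin-script runs and overlapping bigrams for CJK runs.

```xml
<!-- Hypothetical field type for mixed Chinese/English text
     (names "text_cjk" and "content" are examples, not from the thread) -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>
</fieldType>

<!-- then point a field at the new type -->
<field name="content" type="text_cjk" indexed="true" stored="true"/>
```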