What if Chinese is mixed with English?

I have text that is entered by users, and it could be a mix of Chinese, 
English, etc.

What's the best way to handle that?

Thanks.

--- On Mon, 6/28/10, Ahmet Arslan <iori...@yahoo.com> wrote:

> From: Ahmet Arslan <iori...@yahoo.com>
> Subject: Re: Chinese chars are not indexed ?
> To: solr-user@lucene.apache.org
> Date: Monday, June 28, 2010, 3:44 AM
> > oh yes, *...* works. thanks.
> > 
> > I saw tokenizer is defined in schema.xml. There are a few
> > places that define the tokenizer. Wondering if it is enough
> > to define one for:
> 
> It is better to define a brand new field type specific to
> Chinese. 
> 
> http://wiki.apache.org/solr/LanguageAnalysis?highlight=%28CJKtokenizer%29#Chinese.2C_Japanese.2C_Korean
> 
> Something like:
> 
> at index time:
> <tokenizer class="solr.CJKTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> 
> at query time:
> <tokenizer class="solr.CJKTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.PositionFilterFactory" />

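The index-time and query-time analyzers suggested above would go into a single
field type in schema.xml. A sketch of what that could look like (the
"text_cjk" and "content_cjk" names are illustrative, not from the thread):

    <!-- sketch: type and field names are illustrative -->
    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.CJKTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.CJKTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PositionFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="content_cjk" type="text_cjk" indexed="true" stored="true"/>

Any field you want analyzed this way would then use type="text_cjk" in its
<field> definition.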