Lance,

The following is an example schema fieldtype using Solr 1.2 and the CJK
package, and it works. As you said, CJK parses a CJK string into bigrams,
turning 'C1C2C3C4' into 'C1C2 C2C3 C3C4'.
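
To make the bigram behaviour concrete, here is a minimal Python sketch of
the splitting described above. It is not the Lucene CJKTokenizer code; the
function name cjk_bigrams is hypothetical, and the sketch assumes the input
is a single unbroken run of CJK characters (the real tokenizer also handles
Latin text, digits and punctuation).

```python
def cjk_bigrams(text):
    """Split a run of CJK characters into overlapping two-character tokens.

    Illustrative only: this mimics the 'C1C2C3C4' -> 'C1C2 C2C3 C3C4'
    behaviour discussed in this thread, not the actual Lucene code.
    """
    if not text:
        return []
    if len(text) == 1:
        # A lone character cannot form a pair; keep it as its own token.
        return [text]
    # Slide a window of width 2 across the string, one character at a time.
    return [text[i:i + 2] for i in range(len(text) - 1)]


if __name__ == "__main__":
    print(" ".join(cjk_bigrams("中文分词")))
```

A run of n characters yields n - 1 tokens of two characters each, so each
interior character is emitted twice.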

More to the point, it is worth mentioning that with the CJK package the
index grows beyond what is tolerable, and indexing documents takes a long
time. For most enterprise applications, I think, a more efficient string
parser is needed.


<fieldtype name="text_cjk" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</fieldtype>



On 11/27/07, Norskog, Lance <[EMAIL PROTECTED]> wrote:
>
> I notice this is in the future tense. Is the CJKTokenizer available yet?
> From what I can see, the CJK code should be a Filter instead anyway.
> Also, the ChineseFilter and CJKTokenizer do two different things.
>
> CJKTokenizer turns C1C2C3C4 into 'C1C2 C2C3 C3C4'. ChineseFilter (from
> 2001) turns C1C2 into 'C1 C2'. I hope someone who speaks Mandarin or
> Cantonese understands what this should do.
>
> Lance
>
> -----Original Message-----
> From: Eswar K [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 26, 2007 10:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: CJK Analyzers for Solr
>
> Hoss,
>
> Thanks a lot. Will look into it.
>
> Regards,
> Eswar
>
> On Nov 26, 2007 11:55 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> >
> > : Does Solr come with Language analyzers for CJK? If not, can you please
> > : direct me to some good CJK analyzers?
> >
> > Lucene has a CJKTokenizer and CJKAnalyzer in the contrib/analyzers jar.
> > they can be used in Solr.  both have been included in Solr for a while
> > now, so you can specify CJKAnalyzer in your schema with Solr 1.2, but
> > starting with Solr 1.3 a Factory for the Tokenizer will also be
> > included so it can be used in a more complex analysis chain defined in
> > the schema.
> >
> >
> >
> > -Hoss
> >
> >
>
