Re: CJK Analyzers for Solr

James liu Mon, 26 Nov 2007 17:54:18 -0800

If your analyzer is a standard one, you can try using the tokenizer. (You can
find the answer in the analyzer source code and schema.xml.)
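
For reference, the two splitting behaviours described in the quoted messages below (bi-grams such as 'C1C2 C2C3 C3C4' versus single characters such as 'C1 C2') can be sketched in a few lines of Python. This is only an illustration of the idea, not Lucene's implementation: the function names are made up, the input is assumed to contain CJK characters only, and returning a lone character as its own token is an assumption.

```python
def cjk_bigrams(text):
    """Split text into overlapping two-character tokens (the bi-gram scheme)."""
    if len(text) < 2:
        # Assumption: a lone character is kept as a single token.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def cjk_unigrams(text):
    """Split text into one token per character (the single-character scheme)."""
    return list(text)


if __name__ == "__main__":
    sample = "中文分词"
    print(cjk_bigrams(sample))
    print(cjk_unigrams(sample))
```

A string of n characters yields n - 1 two-character tokens under the bi-gram scheme, which is consistent with the index growth mentioned in the quoted message below.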



On Nov 27, 2007 9:39 AM, zx zhang <[EMAIL PROTECTED]> wrote:

> Lance,
>
> The following is an example schema fieldtype using Solr 1.2 and the CJK
> package, and it works. As you said, CJK parses a CJK string into bi-grams,
> turning 'C1C2C3C4' into 'C1C2 C2C3 C3C4'.
>
> More to the point, it is worth mentioning that the index grows beyond
> what is tolerable when the CJK package is used, and indexing documents
> takes a long time. For most enterprise applications, I think, a more
> efficient string parser is needed.
>
>
> <fieldtype name="text_cjk" class="solr.TextField">
>      <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
> </fieldtype>
>
>
>
> On 11/27/07, Norskog, Lance <[EMAIL PROTECTED]> wrote:
> >
> > I notice this is in the future tense. Is the CJKTokenizer available yet?
> > From what I can see, the CJK code should be a Filter instead anyway.
> > Also, the ChineseFilter and CJKTokenizer do two different things.
> >
> > CJKTokenizer turns C1C2C3C4 into 'C1C2 C2C3 C3C4'. ChineseFilter (from
> > 2001) turns C1C2 into 'C1 C2'. I hope someone who speaks Mandarin or
> > Cantonese understands what this should do.
> >
> > Lance
> >
> > -----Original Message-----
> > From: Eswar K [mailto:[EMAIL PROTECTED]
> > Sent: Monday, November 26, 2007 10:28 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: CJK Analyzers for Solr
> >
> > Hoss,
> >
> > Thanks a lot. Will look into it.
> >
> > Regards,
> > Eswar
> >
> > On Nov 26, 2007 11:55 PM, Chris Hostetter <[EMAIL PROTECTED]>
> > wrote:
> >
> > >
> > > : Does Solr come with Language analyzers for CJK? If not, can you
> > > : please direct me to some good CJK analyzers?
> > >
> > > Lucene has a CJKTokenizer and CJKAnalyzer in the contrib/analyzers
> > > jar. they can be used in Solr.  both have been included in Solr for a
> > > while now, so you can specify CJKAnalyzer in your schema with Solr 1.2,
> > > but starting with Solr 1.3 a Factory for the Tokenizer will also be
> > > included so it can be used in a more complex analysis chain defined in
> > > the schema.
> > >
> > >
> > >
> > > -Hoss
> > >
> > >
> >
>



-- 
regards
jl
