> >solr-user@lucene.apache.org Sent: Wednesday,
> >November 28, 2007 5:43:32 PM Subject: Re: CJK
> >Analyzers for Solr With Ultraseek, we switched
> >to a dictionary-based segmenter for Chinese
> >because the N-gram highlighting wasn't
> >acceptable to our Chinese
://sematext.com/ -- Lucene -
Solr - Nutch - Original Message From:
Walter Underwood <[EMAIL PROTECTED]> To:
solr-user@lucene.apache.org Sent: Wednesday,
November 28, 2007 5:43:32 PM Subject: Re: CJK
Analyzers for Solr With Ultraseek, we switched
to a dictionary-based segmenter for C
Underwood <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, November 28, 2007 5:43:32 PM
Subject: Re: CJK Analyzers for Solr
With Ultraseek, we switched to a dictionary-based segmenter for Chinese
because the N-gram highlighting wasn't acceptable to our Chinese
cu
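For readers unfamiliar with the term, a dictionary-based segmenter can be sketched roughly as below. This is only an illustrative greedy longest-match ("forward maximum matching") loop, not Ultraseek's actual implementation; the function name `segment`, the `max_len` parameter, and the three-word dictionary are all made up for the example.

```python
def segment(text, dictionary, max_len=4):
    """Greedy longest-match segmentation (hypothetical sketch, not Ultraseek's code)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary, assumed for illustration only.
words = {"中华", "人民", "共和国"}
print(segment("中华人民共和国", words))  # ['中华', '人民', '共和国']
```

Because the tokens follow word boundaries, a query for 华人 does not match inside 中华人民 here, whereas an overlapping 2-gram index would contain that pair and highlight it.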
by native speakers of these languages.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
> From: Walter Underwood <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, November 27, 2007 2:41:38 PM
>
Tuesday, November 27, 2007 12:12:40 PM
Subject: Re: CJK Analyzers for Solr
Eswar,
What type of morphological analysis do you suspect (or know) that
Google does on East Asian text? I don't think you can treat the three
languages in the same way here. Japanese has multi-morphemic words,
but C
:40 AM
Subject: Re: CJK Analyzers for Solr
John,
There were two parts to my question,
1) n-gram vs morphological analyzer - This was based on what I read at a few
places which rate morphological analysis higher than n-gram. An example
being (
http://www.basistech.com/knowledge-center/products/N-G
From: Eswar K <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, November 26, 2007 9:27:15 PM
> Subject: Re: CJK Analyzers for Solr
>
> thanks james...
>
> How much time does it take to index 18m docs?
>
> - Eswar
>
> On Nov 27, 2007 7:43 AM, James l
roblem when using it.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Nov 27, 2007 5:56 AM, Otis Gospodnetic <
> > > > [EMAIL PROTECTED]>
> > > > > > wrote:
> > > > > >
atext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Eswar K <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 26, 2007 9:27:15 PM
Subject: Re: CJK Analyzers for Solr
thanks james...
How much time does it take to index 18m docs?
- Es
7 8:51:23 PM
Subject: Re: CJK Analyzers for Solr
I don't think NGram is a good method for Chinese.
CJKAnalyzer of Lucene is 2-gram.
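To make "2-gram" concrete, here is a minimal sketch of overlapping bigram tokenization. It assumes the input is a single run of CJK characters and is not Lucene's CJKTokenizer code, which also has to deal with Latin text and other details; the function name `cjk_bigrams` is hypothetical.

```python
def cjk_bigrams(text):
    """Return overlapping two-character tokens for a run of CJK characters."""
    if len(text) < 2:
        # A lone character (or empty input) cannot form a pair.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(cjk_bigrams("中华人民"))  # ['中华', '华人', '人民']
```

Note the middle token 华人: it spans a word boundary, which is why a query for that pair would match this text in a 2-gram index.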
Eswar K:
If it is a Chinese analyzer, I recommend
Hylanda (www.hylanda.com); it is
the best Chinese analyzer, but it is not free.
If you want a free Chinese analyzer,
these languages.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Walter Underwood <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, November 27, 2007 2:41:38 PM
Subject: Re: CJK Analyzers for Solr
Dictionaries are surprisingly e
7, 2007 12:12:40 PM
Subject: Re: CJK Analyzers for Solr
Eswar,
What type of morphological analysis do you suspect (or know) that
Google does on East Asian text? I don't think you can treat the three
languages in the same way here. Japanese has multi-morphemic words,
but Chinese doesn'
>>> On Nov 27, 2007 5:56 AM, Otis Gospodnetic <
>>> [EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> Eswar,
>>>>>>
>>>>>> We've used the NGram stuff that exists in Lucene
On 27-Nov-07, at 8:54 AM, Eswar K wrote:
Is there any specific reason why the CJK analyzers in Solr were chosen to be
n-gram based instead of a morphological analyzer, which is roughly what
Google implements, as it is considered to be more effective than the
n-gram ones?
The CJK analy
Eswar,
> > > > > >
> > > > > > We've used the NGram stuff that exists in Lucene's
> > > contrib/analyzers
> > > > > > instead of CJK. Doesn't that allow you to do everything that the
> > >
>
> > > > > We've used the NGram stuff that exists in Lucene's
> > contrib/analyzers
> > > > > instead of CJK. Doesn't that allow you to do everything that the
> > > > Chinese
> > > > > and CJK analyzers do? It's been a
've looked at
> > > Chinese
> > > > and CJK Analyzers, so I could be off.
> > > >
> > > > Otis
> > > >
> > > > --
> > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > >
> >
could be off.
> > >
> > > Otis
> > >
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > > - Original Message
> > > From: Eswar K <[EMAIL PROTECTED]>
> > > To: solr-user@lucen
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > - Original Message
> > From: Eswar K <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Monday, November 26, 2007 8:30:52 AM
> > Subject: CJK Analyzers for Solr
> >
> > Hi,
> >
> > Does Solr come with Language analyzers for CJK? If not, can you please
> > direct me to some good CJK analyzers?
> >
> > Regards,
> > Eswar
> >
> >
> >
> >
>
>
> --
> regards
> jl
>
ns C1C2 into 'C1 C2'. I hope someone who speaks Mandarin or
> > Cantonese understands what this should do.
> >
> > Lance
> >
> > -Original Message-
> > From: Eswar K [mailto:[EMAIL PROTECTED]
> > Sent: Monday, November 26, 2007 10:28 AM
>
solr-user@lucene.apache.org
> Sent: Monday, November 26, 2007 8:30:52 AM
> Subject: CJK Analyzers for Solr
>
> Hi,
>
> Does Solr come with Language analyzers for CJK? If not, can you please
> direct me to some good CJK analyzers?
>
> Regards,
> Eswar
>
>
>
>
--
regards
jl
'C1 C2'. I hope someone who speaks Mandarin or
> Cantonese understands what this should do.
>
> Lance
>
> -Original Message-
> From: Eswar K [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 26, 2007 10:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: CJK Ana
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Eswar K <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 26, 2007 8:30:52 AM
Subject: CJK Analyzers for Solr
Hi,
Does Solr come with Language analyzers for CJK? If n
: I notice this is in the future tense. Is the CJKTokenizer available yet?
CJKTokenizer and CJKAnalyzer are both available in Solr 1.2, but no
TokenizerFactory was provided for CJKTokenizer in 1.2, so it wasn't
possible to use it "out of the box" without writing a 3-line Java plugin.
that 3 line
1) turns C1C2 into 'C1 C2'. I hope someone who speaks Mandarin or
Cantonese understands what this should do.
Lance
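As a rough illustration of the "C1C2 into 'C1 C2'" behaviour described above, the sketch below emits every character as its own token. Which analyzer Lance is describing is cut off in this excerpt, so this is an assumption about the effect only, and `split_characters` is a made-up name.

```python
def split_characters(text):
    """Put a space between every character, so each one becomes its own token."""
    return " ".join(text)


print(split_characters("中文"))  # 中 文
```

With this scheme a two-character word is indexed as two independent single-character terms, so matching the word depends on a phrase query over adjacent characters.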
-Original Message-
From: Eswar K [mailto:[EMAIL PROTECTED]
Sent: Monday, November 26, 2007 10:28 AM
To: solr-user@lucene.apache.org
Subject: Re: CJK Analyzers fo
Hoss,
Thanks a lot. Will look into it.
Regards,
Eswar
On Nov 26, 2007 11:55 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : Does Solr come with Language analyzers for CJK? If not, can you please
> : direct me to some good CJK analyzers?
>
> Lucene has a CJKTokenizer and CJKAnalyzer in the
: Does Solr come with Language analyzers for CJK? If not, can you please
: direct me to some good CJK analyzers?
Lucene has a CJKTokenizer and CJKAnalyzer in the contrib/analyzers jar.
They can be used in Solr. Both have been included in Solr for a while
now, so you can specify CJKAnalyzer in
Hi,
Does Solr come with Language analyzers for CJK? If not, can you please
direct me to some good CJK analyzers?
Regards,
Eswar