Re: CJK Analyzers for Solr

2007-12-03 Thread James liu
solr-user@lucene.apache.org Sent: Wednesday, November 28, 2007 5:43:32 PM Subject: Re: CJK Analyzers for Solr With Ultraseek, we switched to a dictionary-based segmenter for Chinese because the N-gram highlighting wasn't acceptable to our Chinese

Re: CJK Analyzers for Solr

2007-12-02 Thread Ken Krugler
://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Walter Underwood <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, November 28, 2007 5:43:32 PM Subject: Re: CJK Analyzers for Solr With Ultraseek, we switched to a dictionary-based segmenter for C

Re: CJK Analyzers for Solr

2007-12-02 Thread Otis Gospodnetic
Underwood <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, November 28, 2007 5:43:32 PM Subject: Re: CJK Analyzers for Solr With Ultraseek, we switched to a dictionary-based segmenter for Chinese because the N-gram highlighting wasn't acceptable to our Chinese cu
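
The contrast Walter draws between N-grams and a dictionary-based segmenter can be illustrated with a minimal sketch. It assumes forward maximum matching, one common dictionary-based approach; the thread does not say which algorithm Ultraseek used. The function name, the toy dictionary, and the `max_word_len` parameter are all hypothetical.

```python
def segment_longest_match(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching (assumed technique, not Ultraseek's).

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when nothing in the dictionary matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until there is a hit.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionary; a real segmenter ships a large lexicon.
TOY_DICTIONARY = {"中文", "分词", "搜索"}

segmented = segment_longest_match("中文分词", TOY_DICTIONARY)
```

Because each token is a whole dictionary word, a highlighter built on these tokens marks complete words rather than overlapping character pairs, which is one plausible reading of the highlighting complaint above.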

Re: CJK Analyzers for Solr

2007-11-28 Thread Walter Underwood
by native speakers of these languages. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Walter Underwood <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, November 27, 2007 2:41:38 PM

Re: CJK Analyzers for Solr

2007-11-27 Thread Luke Lu
Tuesday, November 27, 2007 12:12:40 PM Subject: Re: CJK Analyzers for Solr Eswar, What type of morphological analysis do you suspect (or know) that Google does on east asian text? I don't think you can treat the three languages in the same way here. Japanese has multi-morphemic words, but C

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
:40 AM Subject: Re: CJK Analyzers for Solr John, There were two parts to my question, 1) n-gram vs morphological analyzer - This was based on what I read at a few places which rate morphological analysis higher than n-gram. An example being ( http://www.basistech.com/knowledge-center/products/N-G

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
From: Eswar K <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Monday, November 26, 2007 9:27:15 PM Subject: Re: CJK Analyzers for Solr thanks james... How much time does it take to index 18m docs? - Eswar On Nov 27, 2007 7:43 AM, James l

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
roblem when using it. On Nov 27, 2007 5:56 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
atext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Eswar K <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Monday, November 26, 2007 9:27:15 PM Subject: Re: CJK Analyzers for Solr thanks james... How much time does it take to index 18m docs? - Es

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
7 8:51:23 PM Subject: Re: CJK Analyzers for Solr I don't think NGram is a good method for Chinese. CJKAnalyzer of Lucene is 2-gram. Eswar K: if it is a Chinese analyzer, I recommend hylanda (www.hylanda.com); it is the best Chinese analyzer, and it is not free. If you want a free Chinese analyzer,
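
To make the "2-gram" remark concrete, here is a minimal sketch of overlapping bigram tokenization over a run of CJK characters. It is an illustration only, not Lucene's CJKTokenizer: the function name is hypothetical, whitespace is simply dropped, and handling of Latin text, offsets, and punctuation is ignored.

```python
def cjk_bigrams(text):
    """Return overlapping 2-character tokens for a run of CJK text.

    Illustrative sketch only; whitespace is dropped and a lone character
    is returned as a single-character token (a choice made for this sketch).
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


tokens = cjk_bigrams("中文分词")
```

For the four-character input above, the sketch yields three tokens, and the middle one straddles the boundary between two words. That overlap is what makes bigram indexing simple and dictionary-free, and also why some posters in this thread question its quality for Chinese.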

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
these languages. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Walter Underwood <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, November 27, 2007 2:41:38 PM Subject: Re: CJK Analyzers for Solr Dictionaries are surprisingly e

Re: CJK Analyzers for Solr

2007-11-27 Thread Otis Gospodnetic
7, 2007 12:12:40 PM Subject: Re: CJK Analyzers for Solr Eswar, What type of morphological analysis do you suspect (or know) that Google does on east asian text? I don't think you can treat the three languages in the same way here. Japanese has multi-morphemic words, but Chinese doesn'

Re: CJK Analyzers for Solr

2007-11-27 Thread Walter Underwood
On Nov 27, 2007 5:56 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Eswar, We've used the NGram stuff that exists in Lucene

Re: CJK Analyzers for Solr

2007-11-27 Thread Mike Klaas
On 27-Nov-07, at 8:54 AM, Eswar K wrote: Is there any specific reason why the CJK analyzers in Solr were chosen to be n-gram based instead of being a morphological analyzer, which is kind of implemented in Google, as it is considered to be more effective than the n-gram ones? The CJK analy

Re: CJK Analyzers for Solr

2007-11-27 Thread John Stewart
Eswar, We've used the NGram stuff that exists in Lucene's contrib/analyzers instead of CJK. Doesn't that allow you to do everything that the

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
We've used the NGram stuff that exists in Lucene's contrib/analyzers instead of CJK. Doesn't that allow you to do everything that the Chinese and CJK analyzers do? It's been a

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
've looked at Chinese and CJK Analyzers, so I could be off. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: CJK Analyzers for Solr

2007-11-26 Thread James liu
could be off. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Eswar K <[EMAIL PROTECTED]> To: solr-user@lucen

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Eswar K <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Monday, November 26, 2007 8:30:52 AM Subject: CJK Analyzers for Solr Hi, Does Solr come with Language analyzers for CJK? If not, can you please direct me to some good CJK analyzers? Regards, Eswar -- regards jl

Re: CJK Analyzers for Solr

2007-11-26 Thread James liu
ns C1C2 into 'C1 C2'. I hope someone who speaks Mandarin or Cantonese understands what this should do. Lance -Original Message- From: Eswar K [mailto:[EMAIL PROTECTED] Sent: Monday, November 26, 2007 10:28 AM

Re: CJK Analyzers for Solr

2007-11-26 Thread James liu
solr-user@lucene.apache.org Sent: Monday, November 26, 2007 8:30:52 AM Subject: CJK Analyzers for Solr Hi, Does Solr come with Language analyzers for CJK? If not, can you please direct me to some good CJK analyzers? Regards, Eswar -- regards jl

Re: CJK Analyzers for Solr

2007-11-26 Thread zx zhang
'C1 C2'. I hope someone who speaks Mandarin or Cantonese understands what this should do. Lance -Original Message- From: Eswar K [mailto:[EMAIL PROTECTED] Sent: Monday, November 26, 2007 10:28 AM To: solr-user@lucene.apache.org Subject: Re: CJK Ana

Re: CJK Analyzers for Solr

2007-11-26 Thread Otis Gospodnetic
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Eswar K <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Monday, November 26, 2007 8:30:52 AM Subject: CJK Analyzers for Solr Hi, Does Solr come with Language analyzers for CJK? If n

RE: CJK Analyzers for Solr

2007-11-26 Thread Chris Hostetter
: I notice this is in the future tense. Is the CJKTokenizer available yet? CJKTokenizer and CJKAnalyzer are both available in Solr 1.2, but no TokenizerFactory was provided for CJKTokenizer in 1.2, so it wasn't possible to use "out of the box" without writing a 3-line Java plugin. That 3 line

RE: CJK Analyzers for Solr

2007-11-26 Thread Norskog, Lance
1) turns C1C2 into 'C1 C2'. I hope someone who speaks Mandarin or Cantonese understands what this should do. Lance -Original Message- From: Eswar K [mailto:[EMAIL PROTECTED] Sent: Monday, November 26, 2007 10:28 AM To: solr-user@lucene.apache.org Subject: Re: CJK Analyzers fo

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
Hoss, Thanks a lot. Will look into it. Regards, Eswar On Nov 26, 2007 11:55 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Does Solr come with Language analyzers for CJK? If not, can you please : direct me to some good CJK analyzers? Lucene has a CJKTokenizer and CJKAnalyzer in the

Re: CJK Analyzers for Solr

2007-11-26 Thread Chris Hostetter
: Does Solr come with Language analyzers for CJK? If not, can you please : direct me to some good CJK analyzers? Lucene has a CJKTokenizer and CJKAnalyzer in the contrib/analyzers jar. they can be used in Solr. both have been included in Solr for a while now, so you can specify CJKAnalyzer in

CJK Analyzers for Solr

2007-11-26 Thread Eswar K
Hi, Does Solr come with Language analyzers for CJK? If not, can you please direct me to some good CJK analyzers? Regards, Eswar