Eswar - I can answer the Google question.  Actually, you are pointing to it in 1) :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Eswar K <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, November 28, 2007 2:21:40 AM
Subject: Re: CJK Analyzers for Solr

John,

There were two parts to my question:

1) n-gram vs morphological analyzer - This was based on what I read at a few
places which rate morphological analysis higher than n-gram. An example
being
(http://www.basistech.com/knowledge-center/products/N-Gram-vs-morphological-analysis.pdf).
My intention in asking this was not to question the effectiveness of the
existing implementation but to understand the thought process behind
the decision. I was and am curious to know if there are any downsides to
using a morphological analyzer over the CJK analyzer, which prompted me to
ask this.

2) Morphological analyzer used by Google - I don't know which morphological
analyzer Google uses, but I have read in different places that they do use one.

- Eswar
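
To make the two approaches in 1) concrete, here is a minimal Python sketch,
not Lucene's or Google's actual code. The function names and the tiny
dictionary are hypothetical, and the greedy longest-match segmenter is only a
toy stand-in for a real morphological analyzer, which would also use
part-of-speech and statistical information.

```python
def cjk_bigrams(text):
    """Return overlapping two-character tokens (the general 2-gram idea)."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A single character (or nothing) cannot form a bigram.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def longest_match_segment(text, dictionary):
    """Toy dictionary segmenter: greedily take the longest known word.

    Unknown characters fall back to single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    sample = "東京都に住む"  # "live in Tokyo Metropolis"
    # Hypothetical three-entry dictionary, for illustration only.
    words = {"東京都", "に", "住む"}
    print(cjk_bigrams(sample))
    print(longest_match_segment(sample, words))
```

With this sample the bigram output contains 京都 (Kyoto), which is not a word
in the sentence, so a search for Kyoto would match it. The dictionary-based
output avoids that, but only as long as the dictionary covers the text.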

On Nov 27, 2007 10:42 PM, John Stewart <[EMAIL PROTECTED]> wrote:

> Eswar,
>
> What type of morphological analysis do you suspect (or know) that
> Google does on East Asian text?  I don't think you can treat the three
> languages in the same way here.  Japanese has multi-morphemic words,
> but Chinese doesn't really.
>
> jds
>
> On Nov 27, 2007 11:54 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> > Is there any specific reason why the CJK analyzers in Solr were chosen to be
> > n-gram based instead of being morphological analyzers, which is kind of
> > what Google implements, as it is considered to be more effective than the
> > n-gram ones?
> >
> > Regards,
> > Eswar
> >
> >
> >
> >
> > On Nov 27, 2007 7:57 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> >
> > > thanks james...
> > >
> > > How much time does it take to index 18m docs?
> > >
> > > - Eswar
> > >
> > >
> > > On Nov 27, 2007 7:43 AM, James liu <[EMAIL PROTECTED]> wrote:
> > >
> > > > I do not use the HYLANDA analyzer.
> > > >
> > > > I use je-analyzer and am indexing at least 18m docs.
> > > >
> > > > I'm sorry, I only use a Chinese analyzer.
> > > >
> > > >
> > > > On Nov 27, 2007 10:01 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > What is the performance of these CJK analyzers (the one in Lucene and
> > > > > hylanda)?
> > > > > We would potentially be indexing millions of documents.
> > > > >
> > > > > James,
> > > > >
> > > > > We will have a look at hylanda too. What about Japanese and Korean
> > > > > analyzers, any recommendations?
> > > > >
> > > > > - Eswar
> > > > >
> > > > > On Nov 27, 2007 7:21 AM, James liu <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I don't think NGram is a good method for Chinese.
> > > > > >
> > > > > > Lucene's CJKAnalyzer is 2-gram.
> > > > > >
> > > > > > Eswar K:
> > > > > >  If you need a Chinese analyzer, I recommend hylanda (www.hylanda.com).
> > > > > > It is the best Chinese analyzer, but it is not free.
> > > > > >  If you want a free Chinese analyzer, maybe you can try je-analyzer.
> > > > > > It has some problems when you use it.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Nov 27, 2007 5:56 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Eswar,
> > > > > > >
> > > > > > > We've used the NGram stuff that exists in Lucene's contrib/analyzers
> > > > > > > instead of CJK.  Doesn't that allow you to do everything that the
> > > > > > > Chinese and CJK analyzers do?  It's been a few months since I've
> > > > > > > looked at the Chinese and CJK Analyzers, so I could be off.
> > > > > > >
> > > > > > > Otis
> > > > > > >
> > > > > > > --
> > > > > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > > > > >
> > > > > > > ----- Original Message ----
> > > > > > > From: Eswar K <[EMAIL PROTECTED]>
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Sent: Monday, November 26, 2007 8:30:52 AM
> > > > > > > Subject: CJK Analyzers for Solr
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Does Solr come with language analyzers for CJK? If not, can you please
> > > > > > > direct me to some good CJK analyzers?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Eswar
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > regards
> > > > > > jl
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > regards
> > > > jl
> > > >
> > >
> > >
> >
>


