Eswar - I can answer the Google question. Actually, you are pointing to it in 1) :)

Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Eswar K <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, November 28, 2007 2:21:40 AM
Subject: Re: CJK Analyzers for Solr

John,

There were two parts to my question,

1) n-gram vs morphological analyzer - This was based on what I read at a few
places which rate morphological analysis higher than n-gram. An example being
(http://www.basistech.com/knowledge-center/products/N-Gram-vs-morphological-analysis.pdf).
My intention of asking this was not to question the effectiveness of the
existing implementation but to understand the thought process behind the
decision. I was and am curious to know if there are any downsides of using a
morphological analyzer over the CJK analyzer, which prompted me to ask this.

2) Morphological Analyzer used by Google - I don't know which Morph analyzer
Google uses, but I have read at different places that they do.

- Eswar

On Nov 27, 2007 10:42 PM, John Stewart <[EMAIL PROTECTED]> wrote:

> Eswar,
>
> What type of morphological analysis do you suspect (or know) that
> Google does on east asian text? I don't think you can treat the three
> languages in the same way here. Japanese has multi-morphemic words,
> but Chinese doesn't really.
>
> jds
>
> On Nov 27, 2007 11:54 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> > Is there any specific reason why the CJK analyzers in Solr were chosen
> > to be n-gram based instead of it being a morphological analyzer which
> > is kind of implemented in Google as it is considered to be more
> > effective than the n-gram ones?
> >
> > Regards,
> > Eswar
> >
> > On Nov 27, 2007 7:57 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> >
> > > thanks james...
> > >
> > > How much time does it take to index 18m docs?
> > >
> > > - Eswar
> > >
> > > On Nov 27, 2007 7:43 AM, James liu <[EMAIL PROTECTED]> wrote:
> > >
> > > > i not use HYLANDA analyzer.
> > > >
> > > > i use je-analyzer and indexing at least 18m docs.
> > > >
> > > > i m sorry i only use chinese analyzer.
> > > >
> > > > On Nov 27, 2007 10:01 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > What is the performance of these CJK analyzers (one in lucene
> > > > > and hylanda)?
> > > > > We would potentially be indexing millions of documents.
> > > > >
> > > > > James,
> > > > >
> > > > > We would have a look at hylanda too. What abt japanese and
> > > > > korean analyzers, any recommendations?
> > > > >
> > > > > - Eswar
> > > > >
> > > > > On Nov 27, 2007 7:21 AM, James liu <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I don't think NGram is good method for Chinese.
> > > > > >
> > > > > > CJKAnalyzer of Lucene is 2-Gram.
> > > > > >
> > > > > > Eswar K:
> > > > > > if it is chinese analyzer, i recommend hylanda
> > > > > > (www.hylanda.com), it is the best chinese analyzer and it
> > > > > > not free.
> > > > > > if u wanna free chinese analyzer, maybe u can try
> > > > > > je-analyzer. it have some problem when using it.
> > > > > >
> > > > > > On Nov 27, 2007 5:56 AM, Otis Gospodnetic <[EMAIL PROTECTED]>
> > > > > > wrote:
> > > > > >
> > > > > > > Eswar,
> > > > > > >
> > > > > > > We've used the NGram stuff that exists in Lucene's
> > > > > > > contrib/analyzers instead of CJK. Doesn't that allow you
> > > > > > > to do everything that the Chinese and CJK analyzers do?
> > > > > > > It's been a few months since I've looked at Chinese and
> > > > > > > CJK Analyzers, so I could be off.
> > > > > > >
> > > > > > > Otis
> > > > > > >
> > > > > > > --
> > > > > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > > > > >
> > > > > > > ----- Original Message ----
> > > > > > > From: Eswar K <[EMAIL PROTECTED]>
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Sent: Monday, November 26, 2007 8:30:52 AM
> > > > > > > Subject: CJK Analyzers for Solr
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Does Solr come with Language analyzers for CJK? If not,
> > > > > > > can you please direct me to some good CJK analyzers?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Eswar
> > > > > >
> > > > > > --
> > > > > > regards
> > > > > > jl
> > > >
> > > > --
> > > > regards
> > > > jl
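
As a rough illustration of the 2-gram approach mentioned in the thread for Lucene's CJKAnalyzer, the sketch below splits a string into overlapping two-character tokens. It is a simplified Python stand-in, not Lucene's implementation: the function name cjk_bigrams is hypothetical, and it ignores the special handling a real analyzer gives to Latin text, digits, and punctuation.

```python
def cjk_bigrams(text):
    """Return overlapping 2-character tokens (bigrams) for a string.

    Simplified illustration only: whitespace is dropped and every
    remaining character is treated as a CJK character.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A lone character is kept as a single token; empty input gives [].
        return ["".join(chars)] if chars else []
    # Slide a window of width 2 across the characters, one step at a time.
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


print(cjk_bigrams("中文分词"))  # ['中文', '文分', '分词']
```

Because every adjacent character pair is indexed, no dictionary is needed, but pairs that are not real words (such as 文分 in the output) are indexed as well. That trade-off is what the n-gram vs morphological analysis question in the thread is about.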