Re: Retrieving Tokens

2007-12-20 Thread Eswar K
Yonik/Erick, We are building a custom search which is to be done in 2 parts executed at different points in time. As a result, in the first step we want to tokenize the information and store it, which we want to retrieve at a later point in time for further processing and then store it back into the

Leading WildCard in Query

2007-12-12 Thread Eswar K
Hi All, I understand that a leading wildcard search is not allowed as it is a very costly operation. There is an issue logged for it (http://issues.apache.org/jira/browse/SOLR-218). Is there any other way of enabling leading wildcards apart from doing it in code by calling QueryParser.setAl

Re: LSA Implementation

2007-11-28 Thread Eswar K
org/ is a separate world language database > project. I found it at the bottom of the WordNet wikipedia page. Thanks > for starting me on the search! > > Lance > > -Original Message- > From: Eswar K [mailto:[EMAIL PROTECTED] > Sent: Monday, November 26, 2007 6:50 PM

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
Analyzer is slower, for example. I > recently indexed cca 20MM large docs on a 8-core, 8 GB RAM box in 10 hours - > 550 docs/second. No CJK, just English. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > - Original Message >

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
(or know) that > Google does on east asian text? I don't think you can treat the three > languages in the same way here. Japanese has multi-morphemic words, > but Chinese doesn't really. > > jds > > On Nov 27, 2007 11:54 AM, Eswar K <[EMAIL PROTECTED]> wrote:

Re: CJK Analyzers for Solr

2007-11-27 Thread Eswar K
Is there any specific reason why the CJK analyzers in Solr were chosen to be n-gram based instead of being a morphological analyzer, which is kind of implemented in Google as it is considered to be more effective than the n-gram ones? Regards, Eswar On Nov 27, 2007 7:57 AM, Eswar K <[EM
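
To make the n-gram versus morphological question concrete, here is a minimal Python sketch of overlapping character bigram tokenization, the general idea behind a 2-gram CJK tokenizer. It is an illustration only, not Lucene's CJKTokenizer: the function name `cjk_bigrams` is hypothetical, and the sketch assumes the input is pure CJK text with no Latin words or punctuation to handle separately.

```python
def cjk_bigrams(text):
    """Return overlapping character bigrams for a run of CJK text.

    Hypothetical helper for illustration; whitespace is dropped and a
    lone character is returned as a single one-character token.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    # Slide a window of width 2 over the characters, one step at a time.
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    print(cjk_bigrams("東京都庁"))  # ['東京', '京都', '都庁']
```

The middle token, 京都 (Kyoto), is not a word in the original string; it is produced only because the window slides across a word boundary. That kind of spurious match is the usual argument for morphological analysis, while the bigram approach needs no dictionary.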

Re: LSA Implementation

2007-11-26 Thread Eswar K
ng your own analyses. > > http://en.wikipedia.org/wiki/WordNet > http://wordnet.princeton.edu/ > > Lance > > -Original Message- > From: Eswar K [mailto:[EMAIL PROTECTED] > Sent: Monday, November 26, 2007 6:34 PM > To: solr-user@lucene.apache.org > Subject: Re:

Re: LSA Implementation

2007-11-26 Thread Eswar K
yword, Where a plain keyword search will fail if there is no exact match, this algo will often return relevant documents that don't contain the keyword at all. - Eswar On Nov 27, 2007 7:51 AM, Marvin Humphrey <[EMAIL PROTECTED]> wrote: > > On Nov 26, 2007, at 6:06 PM, E

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
> On Nov 27, 2007 10:01 AM, Eswar K <[EMAIL PROTECTED]> wrote: > > > What is the performance of these CJK analyzers (one in lucene and > hylanda > > )? > > We would potentially be indexing millions of documents. > > > > James, > > > > We w

Re: LSA Implementation

2007-11-26 Thread Eswar K
esults using LSI? > > I suppose if someone said they had a patch for Lucene/Solr that > implemented it, we could ask on legal-discuss for advice. > > -Grant > > On Nov 26, 2007, at 1:13 PM, Eswar K wrote: > > > I was just searching for info on LSA and came across Se

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
AIL PROTECTED]> wrote: > I don't think NGram is good method for Chinese. > > CJKAnalyzer of Lucene is 2-Gram. > > Eswar K: > if it is chinese analyzer,,i recommend hylanda(www.hylanda.com),,,it is > the best chinese analyzer and it not free. > if u wanna free chinese an

Re: CJK Analyzers for Solr

2007-11-26 Thread Eswar K
Hoss, Thanks a lot. Will look into it. Regards, Eswar On Nov 26, 2007 11:55 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : Does Solr come with Language analyzers for CJK? If not, can you please > : direct me to some good CJK analyzers? > > Lucene has a CJKTokenizer and CJKAnalyzer in the

Re: LSA Implementation

2007-11-26 Thread Eswar K
_indexing) is > > patented, so it is not likely to happen unless the authors donate the > > patent to the ASF. > > > > -Grant > > > > > > > > On Nov 26, 2007, at 8:23 AM, Eswar K wrote: > > > > > All, > > > > > &

CJK Analyzers for Solr

2007-11-26 Thread Eswar K
Hi, Does Solr come with Language analyzers for CJK? If not, can you please direct me to some good CJK analyzers? Regards, Eswar

LSA Implementation

2007-11-26 Thread Eswar K
All, Is there any plan to implement Latent Semantic Analysis as part of Solr anytime in the near future? Regards, Eswar

Re: Any tips for indexing large amounts of data?

2007-11-20 Thread Eswar K
t 20MM docs. Redo the search, and it's 1 ms (cached). > This is without any load nor serious benchmarking, clearly. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > - Original Message > From: Eswar K <[EMAIL PROTECTED]> > To

Re: Any tips for indexing large amounts of data?

2007-11-20 Thread Eswar K
Hi Otis, I understand that this is a slightly off-track question, but I am just curious to know the performance of search on a 20 GB index file. What has been your observation? Regards, Eswar On Nov 21, 2007 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Mike is right about the occasional slo

Solr on Windows / Linux

2007-11-19 Thread Eswar K
All, Is there any difference in the way any of Solr's features work on Windows/Linux? Ideally there should not be, as it's a Java implementation. I was looking at CollectionDistribution and its documentation (http://wiki.apache.org/solr/CollectionDistribution). It appeared that it uses rsync which i

Re: Performance of Solr on different Platforms

2007-11-19 Thread Eswar K
have five servers to do that. We have a separate server > for indexing and use the Solr distribution scripts. > > We have a relatively small index, about 250K docs. > > wunder > > > On 11/19/07 8:48 PM, "Eswar K" <[EMAIL PROTECTED]> wrote: > > >

Re: Performance of Solr on different Platforms

2007-11-19 Thread Eswar K
> Do you mean that you're expecting about 1000 QPS over an index with up > to 20 million documents? > > --Matthew > > On Nov 19, 2007, at 6:00 AM, Eswar K wrote: > > > All, > > > > Can you give some information on this or atleast let me know where

Re: Performance of Solr on different Platforms

2007-11-19 Thread Eswar K
All, Can you give some information on this or at least let me know where I can find this information if it's already listed out anywhere? Regards, Eswar On Nov 18, 2007 9:45 PM, Eswar K <[EMAIL PROTECTED]> wrote: > Hi, > > I understand that Solr can be used on different Linux fl

Re: Finding all possible synonyms for a word

2007-11-18 Thread Eswar K
Kishore, Solr has a SynonymFilterFactory which might be of use to you (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46). Regards, Eswar On Nov 18, 2007 10:39 PM, Kishore AVK. Veleti <[EMAIL PROTECTED]> wrote: > Hi All, > > I am new to

Re: Near Duplicate Documents

2007-11-18 Thread Eswar K
Is there any plan to implement that feature in the upcoming releases? Regards, Eswar On Nov 18, 2007 9:35 PM, Stuart Sierra <[EMAIL PROTECTED]> wrote: > On Nov 18, 2007 10:50 AM, Eswar K <[EMAIL PROTECTED]> wrote: > > We have a scenario, where we want to find out document

Performance of Solr on different Platforms

2007-11-18 Thread Eswar K
Hi, I understand that Solr can be used on different Linux flavors. Is there any preferred flavor (like Red Hat, Ubuntu, etc.)? Also, what kind of hardware configuration (processors, RAM, etc.) would be best suited for the install? We expect to load it with millions of documents (varying from 2 -

Re: Near Duplicate Documents

2007-11-18 Thread Eswar K
We have a scenario where we want to find documents which are similar in content. To elaborate a little more on what we mean here, let's take an example. The example of this email chain in which we are interacting can be best used for illustrating the concept of near dupes (We are not getti
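
As a rough illustration of what "similar in content" could mean in practice, the sketch below compares two documents by the Jaccard overlap of their word shingles. This is one common near-duplicate technique, not something proposed in this thread and not a Solr feature. The function names, the shingle size `k=3`, and the `0.8` threshold are all assumptions chosen for the example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (as tuples) in text, lowercased."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(doc_a, doc_b, k=3, threshold=0.8):
    """True if the shingle overlap meets the (assumed) threshold."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


if __name__ == "__main__":
    original = "is there any plan to implement latent semantic analysis"
    reply = "is there any plan to implement latent semantic analysis soon"
    print(near_duplicate(original, reply))
```

A reply that quotes most of an earlier message shares most of its shingles with it, which is the email-chain case described above. For millions of documents, comparing every pair this way would be too slow, and a hashing scheme such as MinHash is the usual next step.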