Re: spellcheckhandler

2008-01-04 Thread John Stewart
The way we do this with Solr 1.2 (the current release), inspired by a discussion on the ML, is to build a spellcheck dictionary with the relevant collocations, such as the one in your example, based on a custom field that is effectively not tokenized. We actually create dummy documents for th

Re: Performance stats for indeces with over 10MM documents

2008-01-02 Thread John Stewart
Alex, Not to be a pain, but the response I had when looking at the query was, why not do this in a SQL database, which is designed precisely to process this sort of request at speed? I've noticed that people sometimes try to get Solr to act as a generalized information store -- I'm not sure that'

Re: Performance stats for indeces with over 10MM documents

2008-01-02 Thread John Stewart
Alex, That's too slow. Can you provide more details about your schema, queries etc? jds On Jan 2, 2008 7:28 PM, Alex Benjamen <[EMAIL PROTECTED]> wrote: > Hi, > > I'm very interested in sharing performance stats with those who have indeces > that > contain more than 10MM documents. It seems th

Re: Search for related content

2007-12-29 Thread John Stewart
Isn't MoreLikeThis only available in the 1.3-dev builds? jds On Dec 29, 2007 10:07 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Instead of: > > http://localhost:8983/solr/select?q=nid:7280&mlt=true&mlt.fl=title,body&mlt.mindf=1&mlt.mintf=1&fl=nid,title,score > > Try: > > http://localhost:898

Re: Local Disk and SAN

2007-11-30 Thread John Stewart
Jae, We recently benchmarked local, SAN and NFS using a real-world Lucene-based benchmark. For searching we found that SAN was marginally slower than local disks (about 1%), while for adding documents the SAN was 3x faster, doubtless because of the high parallelism in the writes. I would s

Re: LowerCaseFilterFactory and spellchecker

2007-11-28 Thread John Stewart
Rob, Let's say it worked as you want it to in the first place. If the query is for Thurne, wouldn't you get thorne (lower-case 't') as the suggestion? This may look weird for proper names. jds
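To make the concern above concrete, here is a minimal sketch, not Solr's spellchecker. It assumes a tiny hypothetical word list and uses Python's difflib as a stand-in for the real suggestion logic. Once the dictionary is lowercased, the suggestion for a capitalized query comes back lowercased too. The match_case helper is a hypothetical workaround, not something proposed in the thread.

```python
from difflib import get_close_matches

# Hypothetical source terms; lowercased the way a lowercasing filter would index them.
raw_terms = ["Thorne", "Hawthorn", "Thursday"]
dictionary = [term.lower() for term in raw_terms]


def suggest(query, words=dictionary):
    """Return the closest lowercased dictionary entry to the query, or None."""
    matches = get_close_matches(query.lower(), words, n=1, cutoff=0.6)
    return matches[0] if matches else None


def match_case(query, suggestion):
    """Naive heuristic (an assumption): copy the query's leading capital."""
    if query[:1].isupper():
        return suggestion[:1].upper() + suggestion[1:]
    return suggestion


suggestion = suggest("Thurne")
print(suggestion)                        # lowercased suggestion
print(match_case("Thurne", suggestion))  # re-capitalized for display
```

The re-capitalization only handles a leading capital. Names with internal capitals would still need the original casing stored alongside the lowercased term.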

Re: CJK Analyzers for Solr

2007-11-27 Thread John Stewart
Eswar, What type of morphological analysis do you suspect (or know) that Google does on East Asian text? I don't think you can treat the three languages in the same way here. Japanese has multi-morphemic words, but Chinese doesn't really. jds On Nov 27, 2007 11:54 AM, Eswar K <[EMAIL PROTECTED