The logistics of handling giant index files hit us before search performance did. We switched to a set of indexes running inside one server (Tomcat) instance, using the Multicore and Distributed Search features: a frozen old index, plus a new index that actively takes updates. The smaller new index takes much less time to recover after a commit.
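When one core is frozen and the other takes updates, a document changed after the freeze can exist in both cores. The sketch below shows one way to merge per-core results so the live core always wins. It is not Lance's actual code. It assumes each core is queried separately, that every hit is a dict carrying the schema's unique key (assumed here to be `id`) and a `score`, and that the caller knows which of those keys exist in the new core. `merge_new_over_old` and `new_core_ids` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical hit shape: a dict with the schema's unique key (assumed "id")
# and the relevance "score" returned by the core that produced it.
Hit = Dict[str, object]


def merge_new_over_old(
    new_hits: Iterable[Hit],
    old_hits: Iterable[Hit],
    new_core_ids: Set[str],
    key: str = "id",
) -> List[Hit]:
    """Merge hits from the live ("new") and frozen ("old") cores.

    A document whose key exists in the new core is taken from the new core
    only. Its old copy is dropped even when the updated version no longer
    matches the query, so stale versions never leak into the results.
    """
    merged: Dict[object, Hit] = {hit[key]: hit for hit in new_hits}
    for hit in old_hits:
        if hit[key] not in new_core_ids and hit[key] not in merged:
            merged[hit[key]] = hit
    # Highest score first. Each core computes its own term statistics,
    # so scores from different cores are only roughly comparable.
    return sorted(merged.values(), key=lambda h: h["score"], reverse=True)
```

Because the new core is small, one plausible way to build `new_core_ids` is to ask the new core which of the old core's returned keys it contains. Checking membership in the whole new core, rather than only in its current page of hits, is what keeps an outdated old copy from reappearing when the updated document stops matching.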
The Distributed Search code does not handle the case where the new and old indexes hold different versions of the same document, so we wrote a custom distributed search that favors the "new" index over the "old" one.

Lance

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Monday, November 03, 2008 4:25 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Performance

If you never execute any queries, a gig should be more than enough.

Of course, I've never played around with a .8 billion doc corpus on one machine.

-Mike

On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:

> in terms of RAM -- how to size that on the indexer?
>
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
>
> On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:
>
>> The indexing box can be much smaller, especially in terms of CPU.
>> It just needs one fast thread and enough disk.
>>
>> wunder
>>
>> On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:
>>
>>> I was afraid of that. Was hoping not to need another big fat box
>>> like this one...
>>>
>>> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
>>>
>>>> I believe this is one of the reasons that a master/slave
>>>> configuration comes in handy. Commits to the Master don't slow down
>>>> queries on the Slave.
>>>>
>>>> -Todd
>>>>
>>>> -----Original Message-----
>>>> From: Alok Dhir [mailto:[EMAIL PROTECTED]
>>>> Sent: Monday, November 03, 2008 1:47 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: SOLR Performance
>>>>
>>>> We've moved past this issue by reducing date precision -- thanks to
>>>> all for the help. Now we're at another problem.
>>>>
>>>> There is relatively constant updating of the index -- new log
>>>> entries are pumped in from several applications continuously.
>>>> Obviously, new entries do not appear in searches until after a
>>>> commit occurs.
>>>>
>>>> The problem is, issuing a commit causes searches to come to a
>>>> screeching halt for up to 2 minutes. We're up to around 80M docs.
>>>> Index size is 27G. The number of docs will soon be 800M, which
>>>> doesn't bode well for these "pauses" in search performance.
>>>>
>>>> I'd appreciate any suggestions.
>>>>
>>>> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:
>>>>
>>>>> Hi -- using Solr 1.3 -- roughly 11M docs on a 64 gig, 8-core
>>>>> machine.
>>>>>
>>>>> Fairly simple schema -- no large text fields, standard request
>>>>> handler. 4 small facet fields.
>>>>>
>>>>> The index is an event log -- a primary search/retrieval
>>>>> requirement is date range queries.
>>>>>
>>>>> A simple query without a date range subquery is ridiculously fast
>>>>> - 2ms. The same query with a date range takes up to 30s
>>>>> (30,000ms).
>>>>>
>>>>> Concrete example: this query just took 18s:
>>>>>
>>>>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>>>>
>>>>> The exact same query without the date range took 2ms.
>>>>>
>>>>> I saw a thread from Apr 2008 which explains the problem being due
>>>>> to too much precision on the DateField type, and the range
>>>>> expansion leading to far too many elements being checked.
>>>>> Proposed solution appears to be a hack where you index date fields
>>>>> as strings and hack together date functions to generate proper
>>>>> queries/format results.
>>>>>
>>>>> Does this remain the recommended solution to this issue?
>>>>>
>>>>> Thanks
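The date-range slowdown described in the original question comes from the range expansion having to check every distinct indexed value between the endpoints; on an event log stored at full precision, almost every document contributes its own value. Alok reports fixing it by reducing date precision. A minimal Python sketch of that idea, assuming timestamps are truncated on the client before documents are posted to Solr. `round_down`, `solr_date`, and `range_query` are hypothetical helpers, not Solr APIs; the field name `dt` and the query bounds come from the thread.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical client-side helpers (not part of Solr): truncate timestamps
# before indexing so a range query has far fewer distinct values to visit.
_TRUNCATE = {
    "second": dict(microsecond=0),
    "minute": dict(second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
}


def round_down(ts: datetime, unit: str) -> datetime:
    """Truncate a UTC timestamp to the given unit, zeroing finer fields."""
    return ts.replace(**_TRUNCATE[unit])


def solr_date(ts: datetime) -> str:
    """Format a UTC datetime in the ISO 8601 'Z' form used in the thread."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def range_query(field: str, start: datetime, end: datetime) -> str:
    """Build an inclusive range clause such as dt:[... TO ...]."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


if __name__ == "__main__":
    start = datetime(2008, 10, 1, 4, 0, 0, tzinfo=timezone.utc)
    # Simulated log: one event per minute for two days.
    events = [start + timedelta(minutes=i) for i in range(2 * 24 * 60)]
    for unit in ("minute", "hour", "day"):
        distinct = {solr_date(round_down(t, unit)) for t in events}
        print(unit, len(distinct))  # prints: minute 2880, hour 48, day 3
    end = start + timedelta(days=29) - timedelta(seconds=1)
    print(range_query("dt", start, end))
```

The truncation unit has to respect the query boundaries. The thread's bounds fall on 04:00:00Z, which appear to be local midnights, so hourly truncation keeps those ranges exact, while truncating to the UTC day would fold events from two local days into one value.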