The logistics of handling giant index files hit us before search
performance did. We switched to a set of indexes running inside one server
(Tomcat) instance, using the Multicore and Distributed Search tools, with a
frozen old index and a new index actively taking updates. The smaller new
index takes much less time to recover after a commit.
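
For anyone trying the same layout, here is a rough sketch of how a query
could be fanned out over both cores with the distributed search "shards"
request parameter. The host, port and the core names "new" and "old" are
made up for illustration, not our actual setup.

```python
from urllib.parse import urlencode


def distributed_query_url(core_url, shard_addresses, query, rows=10):
    """Build a select URL that asks one core to search several shards.

    core_url and shard_addresses are hypothetical examples; each shard is
    given as host:port/path with no http:// prefix.
    """
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list of the cores to search.
        "shards": ",".join(shard_addresses),
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


url = distributed_query_url(
    "http://localhost:8080/solr/new",
    ["localhost:8080/solr/new", "localhost:8080/solr/old"],
    'label_facet:"Added to Position"',
)
print(url)
```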

The Distributed Search code does not handle cases where the new and old
index have different versions of the same document. We wrote a custom
distributed search that favored the "new" index over the "old".
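
The idea, very roughly, looks like the sketch below. This is not our actual
code, only a minimal illustration in Python; it assumes each hit is a dict
with a unique "id" field and a "score", and the function name is made up.

```python
def merge_prefer_new(new_hits, old_hits, rows=10, id_field="id"):
    """Merge hits from two indexes, keeping the new index's copy of a duplicate.

    Hypothetical helper: each hit is assumed to be a dict carrying a unique
    id and a "score".
    """
    new_ids = {hit[id_field] for hit in new_hits}
    # Drop any old-index hit whose id also came back from the new index.
    kept_old = [hit for hit in old_hits if hit[id_field] not in new_ids]
    merged = list(new_hits) + kept_old
    # Highest score first, then cut to the requested page size.
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:rows]


new_hits = [{"id": "a", "score": 1.0, "src": "new"},
            {"id": "b", "score": 0.5, "src": "new"}]
old_hits = [{"id": "a", "score": 2.0, "src": "old"},
            {"id": "c", "score": 0.7, "src": "old"}]
for hit in merge_prefer_new(new_hits, old_hits):
    print(hit["id"], hit["src"], hit["score"])
```

Two caveats with this sketch: it only drops an old copy when the new copy
also matched the query, so a stale document that matches only in its old
form would still get through, and scores from separate indexes are not
strictly comparable.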

Lance

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 03, 2008 4:25 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Performance

If you never execute any queries, a gig should be more than enough.

Of course, I've never played around with a .8 billion doc corpus on one
machine.

-Mike

On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:

> in terms of RAM -- how to size that on the indexer?
>
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
>
> On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:
>
>> The indexing box can be much smaller, especially in terms of CPU.
>> It just needs one fast thread and enough disk.
>>
>> wunder
>>
>> On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:
>>
>>> I was afraid of that.  Was hoping not to need another big fat box 
>>> like this one...
>>>
>>> ---
>>> Alok K. Dhir
>>> Symplicity Corporation
>>> www.symplicity.com
>>> (703) 351-0200 x 8080
>>> [EMAIL PROTECTED]
>>>
>>> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
>>>
>>>> I believe this is one of the reasons that a master/slave 
>>>> configuration comes in handy. Commits to the Master don't slow down 
>>>> queries on the Slave.
>>>>
>>>> -Todd
>>>>
>>>> -----Original Message-----
>>>> From: Alok Dhir [mailto:[EMAIL PROTECTED]
>>>> Sent: Monday, November 03, 2008 1:47 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: SOLR Performance
>>>>
>>>> We've moved past this issue by reducing date precision -- thanks to 
>>>> all for the help.  Now we're at another problem.
>>>>
>>>> There is relatively constant updating of the index -- new log 
>>>> entries are pumped in from several applications continuously.  
>>>> Obviously, new entries do not appear in searches until after a 
>>>> commit occurs.
>>>>
>>>> The problem is, issuing a commit causes searches to come to a 
>>>> screeching halt for up to 2 minutes.  We're up to around 80M docs.
>>>> Index size is 27G.  The number of docs will soon be 800M, which 
>>>> doesn't bode well for these "pauses" in search performance.
>>>>
>>>> I'd appreciate any suggestions.
>>>>
>>>> ---
>>>> Alok K. Dhir
>>>> Symplicity Corporation
>>>> www.symplicity.com
>>>> (703) 351-0200 x 8080
>>>> [EMAIL PROTECTED]
>>>>
>>>> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:
>>>>
>>>>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core 
>>>>> machine.
>>>>>
>>>>> Fairly simple schema -- no large text fields, standard request 
>>>>> handler.  4 small facet fields.
>>>>>
>>>>> The index is an event log -- a primary search/retrieval 
>>>>> requirement is date range queries.
>>>>>
>>>>> A simple query without a date range subquery is ridiculously fast 
>>>>> - 2ms.  The same query with a date range takes up to 30s 
>>>>> (30,000ms).
>>>>>
>>>>> Concrete example, this query just took 18s:
>>>>>
>>>>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO
>>>>> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>>>>
>>>>> The exact same query without the date range took 2ms.
>>>>>
>>>>> I saw a thread from Apr 2008 which explains the problem as being 
>>>>> due to too much precision on the DateField type, with the range 
>>>>> expansion leading to far too many elements being checked.  
>>>>> The proposed solution appears to be a hack where you index date 
>>>>> fields as strings and hack together date functions to generate 
>>>>> proper queries and format results.
>>>>>
>>>>> Does this remain the recommended solution to this issue?
>>>>>
>>>>> Thanks
>>>>>
>>>>> ---
>>>>> Alok K. Dhir
>>>>> Symplicity Corporation
>>>>> www.symplicity.com
>>>>> (703) 351-0200 x 8080
>>>>> [EMAIL PROTECTED]
>>>>>
>>>>
>>>>
>>>
>>
>

