Re: NRT and commit behavior

Erick Erickson Sat, 24 Sep 2011 12:06:45 -0700

No <G>. The problem is that "number of documents" isn't a reliable
indicator of resource consumption. Consider the difference between
indexing a twitter message and a book. I can put a LOT more docs
of 140 chars on a single machine of size X than I can books.


Unfortunately, the only way I know of is to test. Use something like
jMeter or SolrMeter to fire enough queries at your machine to
determine when you're over-straining resources and shard at that
point (or get a bigger machine <G>).

Best
Erick
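
A minimal sketch of the "test it" approach described above, assuming you
supply your own function that sends one query to Solr. The names
measure_latencies, summarize and run_query are hypothetical; tools such as
jMeter or SolrMeter do this far more thoroughly, with concurrency and ramp-up.

```python
import time
from statistics import median


def measure_latencies(run_query, queries):
    """Run each query once and return the wall-clock seconds each one took.

    run_query is a caller-supplied callable (an assumption of this sketch)
    that sends a single query string to the server and blocks until done.
    """
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return (median, worst) latency in seconds."""
    return median(latencies), max(latencies)
```

Run the same query set at increasing load and watch for the point where the
median or worst-case latency climbs sharply; that is a candidate point at
which to shard or move to larger hardware.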

On Wed, Sep 21, 2011 at 8:24 PM, Tirthankar Chatterjee
<tchatter...@commvault.com> wrote:
> Okay, but is there any threshold for index size, total docs in the index,
> or physical memory at which sharding should be considered?
>
> I am trying to find the winning combination.
> Tirthankar
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, September 16, 2011 7:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: NRT and commit behavior
>
> Uhm, you're putting a lot of index into not very much memory. I really think 
> you're going to have to shard your index across several machines to get past 
> this problem. Simply increasing the size of your caches is still limited by 
> the physical memory you're working with.
>
> You really have to put a profiler on the system to see what's going on. At 
> that size there are too many things that it *could* be to definitively answer 
> it with e-mails....
>
> Best
> Erick
>
> On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee 
> <tchatter...@commvault.com> wrote:
>> Erick,
>> Also, in our solrconfig we have tried increasing the cache sizes.
>> Setting the autowarmCount values below to 0 makes the commit call
>> return within a second, but that will slow down our searches.
>>
>> <filterCache
>>      class="solr.FastLRUCache"
>>      size="16384"
>>      initialSize="4096"
>>      autowarmCount="4096"/>
>>
>>    <!-- Cache used to hold field values that are quickly accessible
>>         by document id.  The fieldValueCache is created by default
>>         even if not configured here.
>>      <fieldValueCache
>>        class="solr.FastLRUCache"
>>        size="512"
>>        autowarmCount="128"
>>        showItems="32"
>>      />
>>    -->
>>
>>   <!-- queryResultCache caches results of searches - ordered lists of
>>         document ids (DocList) based on a query, a sort, and the range
>>         of documents requested.  -->
>>    <queryResultCache
>>      class="solr.LRUCache"
>>      size="16384"
>>      initialSize="4096"
>>      autowarmCount="4096"/>
>>
>>  <!-- documentCache caches Lucene Document objects (the stored fields
>>       for each document). Since Lucene internal document ids are
>>       transient, this cache will not be autowarmed.  -->
>>    <documentCache
>>      class="solr.LRUCache"
>>      size="512"
>>      initialSize="512"
>>      autowarmCount="512"/>
>>
>> -----Original Message-----
>> From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
>> Sent: Wednesday, September 14, 2011 7:31 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: NRT and commit behavior
>>
>> Erick,
>> Here is the answer to your questions:
>> Our index is 267 GB
>> We are not optimizing...
>> No, we have not profiled yet to find the bottleneck, but the logs indicate
>> that opening the searchers is taking time.
>> Nothing except Solr is running on the machine.
>> Total memory is 16 GB; Tomcat has 8 GB allocated. Everything is 64-bit:
>> OS, JVM and Tomcat.
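
A rough back-of-the-envelope check using only the figures reported in this
message. It assumes, optimistically, that all memory outside the JVM heap is
available to the OS page cache; the function name is hypothetical.

```python
# Figures reported in the message above.
INDEX_GB = 267
TOTAL_RAM_GB = 16
JVM_HEAP_GB = 8


def cacheable_fraction(index_gb, total_ram_gb, jvm_heap_gb):
    """Upper bound on the share of the index the OS page cache could hold."""
    # Assumption: everything outside the JVM heap is free for the page cache.
    os_cache_gb = total_ram_gb - jvm_heap_gb
    return os_cache_gb / index_gb


fraction = cacheable_fraction(INDEX_GB, TOTAL_RAM_GB, JVM_HEAP_GB)
print(f"At most {fraction:.1%} of the index fits in the OS cache")
```

Under that assumption only about 3% of the index can sit in memory at once,
so most reads after a new searcher opens will have to go to disk.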
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Sunday, September 11, 2011 11:37 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: NRT and commit behavior
>>
>> Hmm, OK. You might want to look at the non-cached filter query stuff, it's 
>> quite recent.
>> The point here is that it is a filter that is applied only after all of the
>> less expensive filter queries are run. One of its uses is exactly ACL
>> calculations. Rather than calculate the ACL for the entire doc set, it only
>> calculates access for docs that have made it past all the other elements of
>> the query.... See SOLR-2429 and note that it is in 3.4 (currently being
>> released) only.
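
A sketch of what such a request might look like, using the cache and cost
local params that SOLR-2429 introduced. The host, port, field name acl,
group value and helper name are all assumptions for illustration. As far as
I understand it, whether a filter actually runs after the others also
depends on the query type supporting post-filtering, so treat this as a
starting point to verify rather than a guaranteed behavior.

```python
from urllib.parse import urlencode


def build_select_url(base_url, q, acl_filter, cost=100):
    """Build a /select URL whose ACL filter is marked as non-cached."""
    # {!cache=false cost=N} are the local params added by SOLR-2429;
    # a higher cost asks Solr to evaluate this filter later than cheaper ones.
    fq = "{!cache=false cost=%d}%s" % (cost, acl_filter)
    params = [("q", q), ("fq", fq)]
    return base_url + "/select?" + urlencode(params)


# Hypothetical host, field and group names.
url = build_select_url("http://localhost:8983/solr", "title:report", "acl:group_a")
print(url)
```

Marking the per-user permission filter as non-cached also keeps it out of
the filterCache, so it does not need to be autowarmed on commit.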
>>
>> As to why your commits are taking so long, I have no idea given that you 
>> really haven't given us much to work with.
>>
>> How big is your index? Are you optimizing? Have you profiled the application 
>> to see what the bottleneck is (I/O, CPU, etc?). What else is running on your 
>> machine? It's quite surprising that it takes that long. How much memory are 
>> you giving the JVM? etc...
>>
>> You might want to review:
>> http://wiki.apache.org/solr/UsingMailingLists
>>
>> Best
>> Erick
>>
>>
>> On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee 
>> <tchatter...@commvault.com> wrote:
>>> Erick,
>>> What you said is correct. For us, the searches are based on Active
>>> Directory permissions, which are passed in the filter query parameter. So we
>>> don't have any warming queries, as we cannot fire one for every user ahead
>>> of time.
>>>
>>> What we do here is that when a user logs in, we run an invalid query (one
>>> that returns no results, instead of '*') with the correct filter query
>>> (which is his permissions based on the login). This way the cache gets
>>> warmed up with valid docs.
>>>
>>> It works then.
>>>
>>>
>>> Also, can you please let me know why a commit is taking 45 minutes to 1 hour
>>> on well-resourced hardware with multiple processors, 16 GB RAM, a 64-bit VM,
>>> etc.? We tried passing waitSearcher as false and found that inside the code
>>> it is hard coded to true. Is there any specific reason? Can we change that
>>> value to honor what is being passed?
>>>
>>> Thanks,
>>> Tirthankar
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: Thursday, September 01, 2011 8:38 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: NRT and commit behavior
>>>
>>> Hmm, I'm guessing a bit here, but using an invalid query doesn't sound very 
>>> safe, but I suppose it *might* be OK.
>>>
>>> What does "invalid" mean? Syntax error? Not safe.
>>>
>>> A search that returns 0 results? I don't know, but I'd guess that
>>> filling your caches, which is the point of warming queries, might be
>>> short-circuited if the query returns 0 results. I don't know for sure.
>>>
>>> But the fact that "invalid queries return quicker" does not inspire 
>>> confidence since the *point* of warming queries is to spend the time up 
>>> front so your users don't have to wait.
>>>
>>> So here's a test. Comment out your warming queries.
>>> Restart your server and fire the warming query from the browser
>>> with &debugQuery=on and look at the QTime parameter.
>>>
>>> Now fire the same form of the query (as in the same sort, facet, grouping, 
>>> etc, but presumably a valid term). See the QTime.
>>>
>>> Now fire the same form of the query with a *different* value in the query. 
>>> That is, it should search on different terms but with the same sort, facet, 
>>> etc. to avoid getting your data straight from the queryResultCache.
>>>
>>> My guess is that the last query will return much more quickly than the
>>> second query, which would indicate that the first form isn't doing you any
>>> good.
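
The three-step test above could be scripted roughly as follows. This is a
sketch that assumes the responses are requested as JSON (wt=json), where
QTime is reported under responseHeader, and that fetch is your own function
returning the response body for a query; all function names are hypothetical.

```python
import json


def qtime_ms(response_text):
    """Extract QTime (milliseconds) from a Solr JSON response body."""
    return json.loads(response_text)["responseHeader"]["QTime"]


def compare_qtimes(fetch, warming_q, valid_q, different_q):
    """Fire the three queries in order and return the QTime of each.

    fetch is a caller-supplied callable (an assumption of this sketch)
    that sends one query to Solr and returns the JSON response as text.
    """
    return {
        "warming": qtime_ms(fetch(warming_q)),
        "same_form": qtime_ms(fetch(valid_q)),
        "different_terms": qtime_ms(fetch(different_q)),
    }
```

Run it against a freshly restarted server. If same_form comes back much
slower than different_terms, the warming query probably did not fill the
caches and the second query did that work instead.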
>>>
>>> But a test is worth a thousand opinions.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Aug 31, 2011 at 11:04 AM, Tirthankar Chatterjee 
>>> <tchatter...@commvault.com> wrote:
>>>> Also noticed that the "waitSearcher" parameter value is not honored inside
>>>> commit. It always defaults to true, which makes indexing slow.
>>>>
>>>> What we are trying to do is use an invalid query (which won't return any
>>>> results) as a warming query. This way the commit returns faster. Are we
>>>> doing something wrong here?
>>>>
>>>> Thanks,
>>>> Tirthankar
>>>>
>>>> -----Original Message-----
>>>> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
>>>> Sent: Monday, July 18, 2011 11:38 AM
>>>> To: solr-user@lucene.apache.org; yo...@lucidimagination.com
>>>> Subject: Re: NRT and commit behavior
>>>>
>>>> In practice, in my experience at least, a very 'expensive' commit
>>>> can still slow down searches significantly, I think just due to CPU
>>>> (or i/o?) starvation. Not sure anything can be done about that.  That's my 
>>>> experience in Solr 1.4.1, but since searches have always been async with 
>>>> commits, it probably is the same situation even in more recent versions, 
>>>> I'd guess.
>>>>
>>>> On 7/18/2011 11:07 AM, Yonik Seeley wrote:
>>>>> On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase<nch...@earthlink.net>  
>>>>> wrote:
>>>>>> Very glad to hear that NRT is finally here!  But my question is this:
>>>>>> will things still come to a standstill during a commit?
>>>>> New updates can now proceed in parallel with a commit, and searches
>>>>> have always been completely asynchronous w.r.t. commits.
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>>>
>>>>
>>>
>>
>
