No <G>. The problem is that "number of documents" isn't a reliable indicator of resource consumption. Consider the difference between indexing a Twitter message and a book. I can put a LOT more docs of 140 chars on a single machine of size X than I can books.
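
To put rough numbers on that comparison, here is a small sketch. The per-document sizes are assumptions picked only for illustration (140 bytes per message, about 1 MB of plain text per book); they are not measurements from this thread, and real index size also depends on analysis, stored fields, and so on.

```python
# Illustrative arithmetic only: the sizes below are assumed, not measured.
TWEET_BYTES = 140          # one 140-character message, ~1 byte per character
BOOK_BYTES = 1_000_000     # assumed ~1 MB of plain text per book


def raw_text_gb(doc_count: int, bytes_per_doc: int) -> float:
    """Raw text volume in GB (10**9 bytes) for doc_count documents."""
    return doc_count * bytes_per_doc / 1e9


def docs_per_budget(budget_bytes: int, bytes_per_doc: int) -> int:
    """How many whole documents of a given size fit in a raw-text budget."""
    return budget_bytes // bytes_per_doc


if __name__ == "__main__":
    n = 10_000_000
    tweet_bytes_total = n * TWEET_BYTES
    print(f"{n:,} tweets: {raw_text_gb(n, TWEET_BYTES):.1f} GB of raw text")
    print(f"{n:,} books: {raw_text_gb(n, BOOK_BYTES):,.0f} GB of raw text")
    print(f"books fitting in the tweets' footprint: "
          f"{docs_per_budget(tweet_bytes_total, BOOK_BYTES):,}")
```

Same document count, wildly different volume, which is why a doc-count threshold for sharding doesn't transfer from one collection to another.
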
Unfortunately, the only way I know of is to test. Use something like JMeter or SolrMeter to fire enough queries at your machine to determine when you're over-straining resources, and shard at that point (or get a bigger machine <G>).

Best
Erick

On Wed, Sep 21, 2011 at 8:24 PM, Tirthankar Chatterjee <tchatter...@commvault.com> wrote:
> Okay, but is there any number that if we reach on the index size or total
> docs in the index or the size of physical memory that sharding should be
> considered.
>
> I am trying to find the winning combination.
> Tirthankar
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, September 16, 2011 7:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: NRT and commit behavior
>
> Uhm, you're putting a lot of index into not very much memory. I really think
> you're going to have to shard your index across several machines to get past
> this problem. Simply increasing the size of your caches is still limited by
> the physical memory you're working with.
>
> You really have to put a profiler on the system to see what's going on. At
> that size there are too many things that it *could* be to definitively answer
> it with e-mails....
>
> Best
> Erick
>
> On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee
> <tchatter...@commvault.com> wrote:
>> Erick,
>> Also, we had our solrconfig where we have tried increasing the cache....
>> making the below value for autowarm count as 0 helps returning the commit
>> call within the second, but that will slow us down on searches....
>>
>> <filterCache
>> class="solr.FastLRUCache"
>> size="16384"
>> initialSize="4096"
>> autowarmCount="4096"/>
>>
>> <!-- Cache used to hold field values that are quickly accessible
>> by document id. The fieldValueCache is created by default
>> even if not configured here.
>> <fieldValueCache
>> class="solr.FastLRUCache"
>> size="512"
>> autowarmCount="128"
>> showItems="32"
>> />
>> -->
>>
>> <!-- queryResultCache caches results of searches - ordered lists of
>> document ids (DocList) based on a query, a sort, and the range
>> of documents requested. -->
>> <queryResultCache
>> class="solr.LRUCache"
>> size="16384"
>> initialSize="4096"
>> autowarmCount="4096"/>
>>
>> <!-- documentCache caches Lucene Document objects (the stored fields for
>> each document).
>> Since Lucene internal document ids are transient, this cache
>> will not be autowarmed. -->
>> <documentCache
>> class="solr.LRUCache"
>> size="512"
>> initialSize="512"
>> autowarmCount="512"/>
>>
>> -----Original Message-----
>> From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
>> Sent: Wednesday, September 14, 2011 7:31 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: NRT and commit behavior
>>
>> Erick,
>> Here is the answer to your questions:
>> Our index is 267 GB
>> We are not optimizing...
>> No we have not profiled yet to check the bottleneck, but logs indicate
>> opening the searchers is taking time...
>> Nothing except SOLR
>> Total memory is 16GB tomcat has 8GB allocated Everything 64 bit OS and
>> JVM and Tomcat
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Sunday, September 11, 2011 11:37 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: NRT and commit behavior
>>
>> Hmm, OK. You might want to look at the non-cached filter query stuff, it's
>> quite recent.
>> The point here is that it is a filter that is applied only after all of the
>> less expensive filter queries are run, One of its uses is exactly ACL
>> calculations. Rather than calculate the ACL for the entire doc set, it only
>> calculates access for docs that have made it past all the other elements of
>> the query.... See SOLR-2429 and note that it is a 3.4 (currently being
>> released) only.
>>
>> As to why your commits are taking so long, I have no idea given that you
>> really haven't given us much to work with.
>>
>> How big is your index? Are you optimizing? Have you profiled the application
>> to see what the bottleneck is (I/O, CPU, etc?). What else is running on your
>> machine? It's quite surprising that it takes that long. How much memory are
>> you giving the JVM? etc...
>>
>> You might want to review:
>> http://wiki.apache.org/solr/UsingMailingLists
>>
>> Best
>> Erick
>>
>>
>> On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee
>> <tchatter...@commvault.com> wrote:
>>> Erick,
>>> What you said is correct for us the searches are based on some Active
>>> Directory permissions which are populated in Filter query parameter. So we
>>> don't have any warming query concept as we cannot fire for every user ahead
>>> of time.
>>>
>>> What we do here is that when user logs in we do an invalid query(which
>>> return no results instead of '*') with the correct filter query (which is
>>> his permissions based on the login). This way the cache gets warmed up with
>>> valid docs.
>>>
>>> It works then.
>>>
>>>
>>> Also, can you please let me know why commit is taking 45 mins to 1 hours on
>>> a good resourced hardware with multiple processors and 16gb RAM 64 bit VM,
>>> etc. We tried passing waitSearcher as false and found that inside the code
>>> it hard coded to be true. Is there any specific reason. Can we change that
>>> value to honor what is being passed.
>>>
>>> Thanks,
>>> Tirthankar
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: Thursday, September 01, 2011 8:38 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: NRT and commit behavior
>>>
>>> Hmm, I'm guessing a bit here, but using an invalid query doesn't sound very
>>> safe, but I suppose it *might* be OK.
>>>
>>> What does "invalid" mean? Syntax error? not safe.
>>>
>>> search that returns 0 results?
>>> I don't know, but I'd guess that filling your caches, which is the point
>>> of warming queries, might be short circuited if the query returns
>>> 0 results but I don't know for sure.
>>>
>>> But the fact that "invalid queries return quicker" does not inspire
>>> confidence since the *point* of warming queries is to spend the time up
>>> front so your users don't have to wait.
>>>
>>> So here's a test. Comment out your warming queries.
>>> Restart your server and fire the warming query from the browser
>>> with &debugQuery=on and look at the QTime parameter.
>>>
>>> Now fire the same form of the query (as in the same sort, facet, grouping,
>>> etc, but presumably a valid term). See the QTime.
>>>
>>> Now fire the same form of the query with a *different* value in the query.
>>> That is, it should search on different terms but with the same sort, facet,
>>> etc. to avoid getting your data straight from the queryResultCache.
>>>
>>> My guess is that the last query will return much more quickly than the
>>> second query. Which would indicate that the first form isn't doing you any
>>> good.
>>>
>>> But a test is worth a thousand opinions.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Aug 31, 2011 at 11:04 AM, Tirthankar Chatterjee
>>> <tchatter...@commvault.com> wrote:
>>>> Also noticed that "waitSearcher" parameter value is not honored inside
>>>> commit. It is always defaulted to true which makes it slow during indexing.
>>>>
>>>> What we are trying to do is use an invalid query (which wont return any
>>>> results) as a warming query. This way the commit returns faster. Are we
>>>> doing something wrong here?
>>>>
>>>> Thanks,
>>>> Tirthankar
>>>>
>>>> -----Original Message-----
>>>> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
>>>> Sent: Monday, July 18, 2011 11:38 AM
>>>> To: solr-user@lucene.apache.org; yo...@lucidimagination.com
>>>> Subject: Re: NRT and commit behavior
>>>>
>>>> In practice, in my experience at least, a very 'expensive' commit
>>>> can still slow down searches significantly, I think just due to CPU
>>>> (or i/o?) starvation. Not sure anything can be done about that. That's my
>>>> experience in Solr 1.4.1, but since searches have always been async with
>>>> commits, it probably is the same situation even in more recent versions,
>>>> I'd guess.
>>>>
>>>> On 7/18/2011 11:07 AM, Yonik Seeley wrote:
>>>>> On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase <nch...@earthlink.net>
>>>>> wrote:
>>>>>> Very glad to hear that NRT is finally here! But my question is this:
>>>>>> will things still come to a standstill during a commit?
>>>>> New updates can now proceed in parallel with a commit, and searches
>>>>> have always been completely asynchronous w.r.t. commits.
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
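
For anyone without JMeter or SolrMeter at hand, here is a minimal sketch of the kind of test suggested at the top of this thread: fire a batch of queries concurrently and look at the QTime each response reports. It is a plain-Python stand-in, not either of those tools. The endpoint URL, the `rows` value, and every function name are assumptions for illustration, and it assumes the JSON response writer (`wt=json`) is available so QTime can be read from `responseHeader`.

```python
import json
import math
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint; adjust host, port, and path for your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_url(q, fq=None, base=SOLR_SELECT):
    """Build a select URL; wt=json so QTime can be read from the response."""
    params = [("q", q), ("wt", "json"), ("rows", "10")]
    if fq:
        params.append(("fq", fq))
    return base + "?" + urllib.parse.urlencode(params)


def http_fetch(url):
    """Fetch one URL and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def timed_query(url, fetch=http_fetch):
    """Return (wall-clock ms, server-reported QTime ms) for one request."""
    start = time.perf_counter()
    body = fetch(url)
    wall_ms = (time.perf_counter() - start) * 1000.0
    return wall_ms, body["responseHeader"]["QTime"]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def run_load(urls, workers=8, fetch=http_fetch):
    """Fire all URLs using `workers` threads and summarize the timings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda u: timed_query(u, fetch), urls))
    qtimes = [qtime for _, qtime in results]
    walls = [wall for wall, _ in results]
    return {
        "requests": len(results),
        "qtime_p50": percentile(qtimes, 50),
        "qtime_p95": percentile(qtimes, 95),
        "wall_max_ms": max(walls),
    }
```

Something like `run_load([build_url("text:" + t) for t in terms], workers=4)` against a test instance (with `text` standing in for one of your own fields) gives a baseline; rerun with more workers or more data and watch where the p95 starts to climb. Vary the terms between requests while keeping the same sort and filters, otherwise you are mostly measuring the queryResultCache.
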