I generally run with an 8GB heap for a system that does no faceting. 32GB does seem rather large, but you really should have room for bigger caches.
The Akamai cache will reduce your hit rate a lot. That is OK, because users are getting faster responses than they would from Solr. A 5% hit rate may be OK since you have that front-end HTTP cache. The Netflix index was updated daily.

wunder

On Jun 19, 2013, at 10:36 AM, Petersen, Robert wrote:

> Hi Walter,
>
> I used to have larger settings on our caches, but it seemed like I had to make the caches that small to reduce memory usage and keep from getting the dreaded OOM exceptions. Also, our search is behind Akamai with a one-hour TTL. Our slave farm has a load balancer in front of twelve slave servers, and our index is being updated constantly, pretty much 24/7.
>
> So my question would be: how do you run with such big caches without going into the OOM zone? Was the Netflix index only updated based upon the release schedules of the studios, like once a week? Our entertainment stores used to be like that before we turned into a marketplace-based e-tailer, but now we get new listings from merchants all the time and so have a constant churn of additions and deletions in our index.
>
> I feel like at 32GB our heap is really huge, but we seem to use almost all of it with these settings. I am trying out the G1GC on one slave to see if that gets memory usage lower, but while it has a different collection pattern in the various spaces, it seems like the total memory usage peaks out at about the same level.
>
> Thanks
> Robi
>
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Tuesday, June 18, 2013 6:57 PM
> To: solr-user@lucene.apache.org
> Subject: Re: yet another optimize question
>
> Your query cache is far too small. Most of the default caches are too small.
>
> We run with 10K entries and get a hit rate around 0.30 across four servers. This rate goes up with more queries and down with fewer, but try a bigger cache, especially if you are updating the index infrequently, like once per day.
>
> At Netflix, we had a 0.12 hit rate on the query cache, even with an HTTP cache in front of it. The HTTP cache had an 80% hit rate.
>
> I'd increase your document cache, too. I usually see about 0.75 or better on that.
>
> wunder
>
> On Jun 18, 2013, at 10:22 AM, Petersen, Robert wrote:
>
>> Hi Otis,
>>
>> Yes, the query results cache is just about worthless. I guess we have too diverse a set of user queries. The business unit has decided to let bots crawl our search pages too, so that doesn't help either. I turned it way down but decided to keep it because my understanding was that it would still help for users going from page 1 to page 2 in a search. Is that true?
>>
>> Thanks
>> Robi
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
>> Sent: Monday, June 17, 2013 6:39 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: yet another optimize question
>>
>> Hi Robi,
>>
>> This goes against the original problem of getting OOMEs, but it looks like each of your Solr caches could be a little bigger if you want to eliminate evictions, with the query results one possibly not being worth keeping if you can't get the hit % up enough.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
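For anyone wanting to experiment with this, the knobs being discussed above all live in solrconfig.xml. The sketch below uses the standard Solr 3.x cache elements; the sizes are made-up examples to show the shape of the change, not tuned values for either index:

    <query>
      <!-- Ordered doc-id lists keyed by (query, filters, sort). Entries are just
           doc-id lists, so a bigger size mainly reduces evictions and raises the
           hit rate rather than eating much heap. -->
      <queryResultCache class="solr.LRUCache"
                        size="10000"
                        initialSize="1024"
                        autowarmCount="256"/>

      <!-- Stored fields for recently returned documents. -->
      <documentCache class="solr.LRUCache"
                     size="16384"
                     initialSize="4096"/>

      <!-- Each cached result list is rounded up to a multiple of this many docs,
           which is what lets a page-2 request be answered from the entry that the
           page-1 request created. -->
      <queryResultWindowSize>50</queryResultWindowSize>
      <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    </query>

The page-1-to-page-2 reuse Robi asks about comes from queryResultWindowSize rather than from the cache size itself, so even a small queryResultCache still helps with paging as long as the entry isn't evicted between page loads.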
>>
>> On Mon, Jun 17, 2013 at 2:21 PM, Petersen, Robert <robert.peter...@mail.rakuten.com> wrote:
>>> Hi Otis,
>>>
>>> Right, I didn't restart the JVMs except on the one slave where I was experimenting with using G1GC on the 1.7.0_21 JRE. Also, some time ago I made all our caches small enough to keep us from getting OOMs while still having a good hit rate. Our index has about 50 fields, which are mostly int IDs, and there are some dynamic fields also. These dynamic fields can be used for custom faceting. We have some standard facets we always facet on and other dynamic facets which are only used if the query is filtering on a particular category. There are hundreds of these fields, but since they only apply to a small subset of the index they are very sparsely populated. With CMS GC we get a sawtooth on the old generation (I guess every replication and commit causes its usage to drop down to 10GB or so), and it seems to be the old generation which is the main space consumer. With the G1GC, the memory map looked totally different! I was a little lost looking at memory consumption with that GC. Maybe I'll try it again now that the index is a bit smaller than it was last time I tried it. After four days without running an optimize, it is now 21GB. BTW, our indexing speed is mostly bound by the DB, so reducing the segments might be OK...
>>>
>>> Here is a quick snapshot of one slave's memory map as reported by PSI-Probe, but unfortunately I guess I can't send the history graphics to the solr-user list to show their changes over time:
>>>
>>> Name                  Used       Committed   Max         Initial     Group
>>> Par Survivor Space    20.02 MB   108.13 MB   108.13 MB   108.13 MB   HEAP
>>> CMS Perm Gen          42.29 MB   70.66 MB    82.00 MB    20.75 MB    NON_HEAP
>>> Code Cache            9.73 MB    9.88 MB     48.00 MB    2.44 MB     NON_HEAP
>>> CMS Old Gen           20.22 GB   30.94 GB    30.94 GB    30.94 GB    HEAP
>>> Par Eden Space        42.20 MB   865.31 MB   865.31 MB   865.31 MB   HEAP
>>> Total                 20.33 GB   31.97 GB    32.02 GB    31.92 GB    TOTAL
>>>
>>> And here's our current cache stats from a random slave:
>>>
>>> name: queryResultCache
>>> class: org.apache.solr.search.LRUCache
>>> version: 1.0
>>> description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6, regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)
>>> stats: lookups : 619
>>> hits : 36
>>> hitratio : 0.05
>>> inserts : 592
>>> evictions : 101
>>> size : 488
>>> warmupTime : 2949
>>> cumulative_lookups : 681225
>>> cumulative_hits : 73126
>>> cumulative_hitratio : 0.10
>>> cumulative_inserts : 602396
>>> cumulative_evictions : 428868
>>>
>>> name: fieldCache
>>> class: org.apache.solr.search.SolrFieldCacheMBean
>>> version: 1.0
>>> description: Provides introspection of the Lucene FieldCache, this is **NOT** a cache that is managed by Solr.
>>> stats: entries_count : 359
>>>
>>> name: documentCache
>>> class: org.apache.solr.search.LRUCache
>>> version: 1.0
>>> description: LRU Cache(maxSize=2048, initialSize=512, autowarmCount=10, regenerator=null)
>>> stats: lookups : 12710
>>> hits : 7160
>>> hitratio : 0.56
>>> inserts : 5636
>>> evictions : 3588
>>> size : 2048
>>> warmupTime : 0
>>> cumulative_lookups : 10590054
>>> cumulative_hits : 6166913
>>> cumulative_hitratio : 0.58
>>> cumulative_inserts : 4423141
>>> cumulative_evictions : 3714653
>>>
>>> name: fieldValueCache
>>> class: org.apache.solr.search.FastLRUCache
>>> version: 1.0
>>> description: Concurrent LRU Cache(maxSize=280, initialSize=280, minSize=252, acceptableSize=266, cleanupThread=false, autowarmCount=6, regenerator=org.apache.solr.search.SolrIndexSearcher$1@143eb77a)
>>> stats: lookups : 1725
>>> hits : 1481
>>> hitratio : 0.85
>>> inserts : 122
>>> evictions : 0
>>> size : 128
>>> warmupTime : 4426
>>> cumulative_lookups : 3449712
>>> cumulative_hits : 3281805
>>> cumulative_hitratio : 0.95
>>> cumulative_inserts : 83261
>>> cumulative_evictions : 3479
>>>
>>> name: filterCache
>>> class: org.apache.solr.search.FastLRUCache
>>> version: 1.0
>>> description: Concurrent LRU Cache(maxSize=248, initialSize=12, minSize=223, acceptableSize=235, cleanupThread=false, autowarmCount=10, regenerator=org.apache.solr.search.SolrIndexSearcher$2@36e831d6)
>>> stats: lookups : 3990
>>> hits : 3831
>>> hitratio : 0.96
>>> inserts : 239
>>> evictions : 26
>>> size : 244
>>> warmupTime : 1
>>> cumulative_lookups : 5745011
>>> cumulative_hits : 5496150
>>> cumulative_hitratio : 0.95
>>> cumulative_inserts : 351485
>>> cumulative_evictions : 276308
>>>
>>> -----Original Message-----
>>> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
>>> Sent: Saturday, June 15, 2013 5:52 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: yet another optimize question
>>>
>>> Hi Robi,
>>>
>>> I'm going to guess you are seeing smaller heap also simply because you restarted the JVM recently (hm, you don't say you restarted, maybe I'm making this up). If you are indeed indexing continuously then you shouldn't optimize. Lucene will merge segments itself. Lower mergeFactor will force it to do it more often (it means slower indexing, bigger IO hit when segments are merged, more per-segment data that Lucene/Solr need to read from the segment for faceting and such, etc.) so maybe you shouldn't mess with that. Do you know what your caches are like in terms of size, hit %, evictions? We've recently seen people set those to a few hundred K or even higher, which can eat a lot of heap. We have had luck with G1 recently, too. Maybe you can run jstat and see which of the memory pools get filled up and change/increase appropriate JVM param based on that? How many fields do you index, facet, or group on?
>>>
>>> Otis
>>> --
>>> Performance Monitoring - http://sematext.com/spm/index.html
>>> Solr & ElasticSearch Support -- http://sematext.com/
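Again just for reference, the merge behavior Otis mentions is configured in solrconfig.xml on Solr 3.x. A minimal sketch (the value is only an example, and as Otis says, it may not be worth changing):

    <indexDefaults>
      <!-- Default is 10. A lower value merges more aggressively, so searchers see
           fewer, larger segment files, at the cost of slower indexing and more
           merge I/O on the master. -->
      <mergeFactor>5</mergeFactor>
    </indexDefaults>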
>>>
>>> On Fri, Jun 14, 2013 at 8:04 PM, Petersen, Robert <robert.peter...@mail.rakuten.com> wrote:
>>>> Hi guys,
>>>>
>>>> We're on Solr 3.6.1 and I've read the discussions about whether to optimize or not to optimize. I decided to try not optimizing our index, as was recommended. We have a little over 15 million docs in our biggest index and a 32GB heap for our JVM.
>>>>
>>>> So without the optimizes, the index folder seemed to grow in size and quantity of files. There seemed to be an upper limit, but eventually it hit 300 files consuming 26GB of space, and that seemed to push our slave farm over the edge and we started getting the dreaded OOMs. We have continuous indexing activity, so I stopped the indexer and manually ran an optimize, which made the index become 9 files consuming 15GB of space, and our slave farm started having acceptable memory usage. Our merge factor is 10, and we're on Java 7.
>>>>
>>>> Before optimizing, I tried on one slave machine to go with the latest JVM and tried switching from the CMS GC to the G1GC, but it hit the OOM condition even faster. So it seems like I have to continue to schedule a regular optimize. Right now it has been a couple of days since running the optimize, and the index is slowly growing bigger, now up to a bit over 19GB. What do you guys think? Did I miss something that would make us able to run without doing an optimize?
>>>>
>>>> Robert (Robi) Petersen
>>>> Senior Software Engineer
>>>> Search Department
>
> --
> Walter Underwood
> wun...@wunderwood.org

--
Walter Underwood
wun...@wunderwood.org
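P.S. If a scheduled optimize does stay in the picture, it doesn't have to force the index all the way down to one segment. The Solr 3.x update handler accepts an optimize message with a maxSegments attribute, so something like the sketch below caps the file count without the full cost of a complete optimize. The value 4 is only an example, and host, port, and core are placeholders:

    <!-- POSTed to http://host:port/solr/core/update -->
    <optimize maxSegments="4" waitSearcher="true"/>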