I generally run with an 8GB heap for a system that does no faceting. 32GB does seem rather large, but you really should have room for bigger caches.
The Akamai cache will reduce your hit rate a lot. That is OK, because users are getting faster responses than they would from Solr. A 5% hit rate may be OK since you have that front-end HTTP cache. The Netflix index was updated daily.

wunder

On Jun 19, 2013, at 10:36 AM, Petersen, Robert wrote:

> Hi Walter,
>
> I used to have larger settings on our caches, but it seemed like I had to make the caches that small to reduce memory usage and keep from getting the dreaded OOM exceptions. Also, our search is behind Akamai with a one-hour TTL. Our slave farm has a load balancer in front of twelve slave servers, and our index is being updated constantly, pretty much 24/7.
>
> So my question would be: how do you run with such big caches without going into the OOM zone? Was the Netflix index only updated based upon the release schedules of the studios, like once a week? Our entertainment stores used to be like that before we turned into a marketplace-based e-tailer, but now we get new listings from merchants all the time and so have a constant churn of additions and deletions in our index.
>
> I feel like at 32GB our heap is really huge, but we seem to use almost all of it with these settings. I am trying out the G1GC on one slave to see if that gets memory usage lower, but while it has a different collection pattern in the various spaces, it seems like the total memory usage peaks out at about the same level.
>
> Thanks
> Robi
>
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Tuesday, June 18, 2013 6:57 PM
> To: solr-user@lucene.apache.org
> Subject: Re: yet another optimize question
>
> Your query cache is far too small. Most of the default caches are too small.
>
> We run with 10K entries and get a hit rate around 0.30 across four servers. This rate goes up with more queries and down with fewer, but try a bigger cache, especially if you are updating the index infrequently, like once per day.
>
> At Netflix, we had a 0.12 hit rate on the query cache, even with an HTTP cache in front of it. The HTTP cache had an 80% hit rate.
>
> I'd increase your document cache, too. I usually see about 0.75 or better on that.
>
> wunder
>
> On Jun 18, 2013, at 10:22 AM, Petersen, Robert wrote:
>
>> Hi Otis,
>>
>> Yes, the query results cache is just about worthless. I guess we have too diverse a set of user queries. The business unit has decided to let bots crawl our search pages too, so that doesn't help either. I turned it way down but decided to keep it because my understanding was that it would still help for users going from page 1 to page 2 in a search. Is that true?
>>
>> Thanks
>> Robi
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
>> Sent: Monday, June 17, 2013 6:39 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: yet another optimize question
>>
>> Hi Robi,
>>
>> This goes against the original problem of getting OOMEs, but it looks like each of your Solr caches could be a little bigger if you want to eliminate evictions, with the query results one possibly not being worth keeping if you can't get the hit % up enough.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
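For anyone wanting to experiment with this, the knobs being discussed above all live in solrconfig.xml. The sketch below uses the standard Solr 3.x cache elements; the sizes are made-up examples to show the shape of the change, not tuned values for either index:

    <query>
      <!-- Ordered doc-id lists keyed by (query, filters, sort). Entries are just
           doc-id lists, so a bigger size mainly reduces evictions and raises the
           hit rate rather than eating much heap. -->
      <queryResultCache class="solr.LRUCache"
                        size="10000"
                        initialSize="1024"
                        autowarmCount="256"/>

      <!-- Stored fields for recently returned documents. -->
      <documentCache class="solr.LRUCache"
                     size="16384"
                     initialSize="4096"/>

      <!-- Each cached result list is rounded up to a multiple of this many docs,
           which is what lets a page-2 request be answered from the entry that the
           page-1 request created. -->
      <queryResultWindowSize>50</queryResultWindowSize>
      <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    </query>

The page-1-to-page-2 reuse Robi asks about comes from queryResultWindowSize rather than from the cache size itself, so even a small queryResultCache still helps with paging as long as the entry isn't evicted between page loads.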
>>
>> On Mon, Jun 17, 2013 at 2:21 PM, Petersen, Robert <robert.peter...@mail.rakuten.com> wrote:
>>> Hi Otis,
>>>
>>> Right, I didn't restart the JVMs except on the one slave where I was experimenting with using G1GC on the 1.7.0_21 JRE. Also, some time ago I made all our caches small enough to keep us from getting OOMs while still having a good hit rate. Our index has about 50 fields, which are mostly int IDs, and there are some dynamic fields also. These dynamic fields can be used for custom faceting. We have some standard facets we always facet on and other dynamic facets which are only used if the query is filtering on a particular category. There are hundreds of these fields, but since they only apply to a small subset of the index they are very sparsely populated. With CMS GC we get a sawtooth on the old generation (I guess every replication and commit causes its usage to drop down to 10GB or so), and it seems to be the old generation which is the main space consumer. With the G1GC, the memory map looked totally different! I was a little lost looking at memory consumption with that GC. Maybe I'll try it again now that the index is a bit smaller than it was last time I tried it. After four days without running an optimize, it is now 21GB. BTW, our indexing speed is mostly bound by the DB, so reducing the segments might be OK...
>>>
>>> Here is a quick snapshot of one slave's memory map as reported by PSI-Probe, but unfortunately I guess I can't send the history graphics to the solr-user list to show their changes over time:
>>>
>>> Name                  Used       Committed   Max         Initial     Group
>>> Par Survivor Space    20.02 MB   108.13 MB   108.13 MB   108.13 MB   HEAP
>>> CMS Perm Gen          42.29 MB   70.66 MB    82.00 MB    20.75 MB    NON_HEAP
>>> Code Cache            9.73 MB    9.88 MB     48.00 MB    2.44 MB     NON_HEAP
>>> CMS Old Gen           20.22 GB   30.94 GB    30.94 GB    30.94 GB    HEAP
>>> Par Eden Space        42.20 MB   865.31 MB   865.31 MB   865.31 MB   HEAP
>>> Total                 20.33 GB   31.97 GB    32.02 GB    31.92 GB    TOTAL
>>>
>>> And here's our current cache stats from a random slave:
>>>
>>> name: queryResultCache
>>> class: org.apache.solr.search.LRUCache
>>> version: 1.0
>>> description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6, regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)
>>> stats: lookups : 619
>>> hits : 36
>>> hitratio : 0.05
>>> inserts : 592
>>> evictions : 101
>>> size : 488
>>> warmupTime : 2949
>>> cumulative_lookups : 681225
>>> cumulative_hits : 73126
>>> cumulative_hitratio : 0.10
>>> cumulative_inserts : 602396
>>> cumulative_evictions : 428868
>>>
>>> name: fieldCache
>>> class: org.apache.solr.search.SolrFieldCacheMBean
>>> version: 1.0
>>> description: Provides introspection of the Lucene FieldCache, this is **NOT** a cache that is managed by Solr.
>>> stats: entries_count : 359
>>>
>>> name: documentCache
>>> class: org.apache.solr.search.LRUCache
>>> version: 1.0
>>> description: LRU Cache(maxSize=2048, initialSize=512, autowarmCount=10, regenerator=null)
>>> stats: lookups : 12710
>>> hits : 7160
>>> hitratio : 0.56
>>> inserts : 5636
>>> evictions : 3588
>>> size : 2048
>>> warmupTime : 0
>>> cumulative_lookups : 10590054
>>> cumulative_hits : 6166913
>>> cumulative_hitratio : 0.58
>>> cumulative_inserts : 4423141
>>> cumulative_evictions : 3714653
>>>
>>> name: fieldValueCache
>>> class: org.apache.solr.search.FastLRUCache
>>> version: 1.0
>>> description: Concurrent LRU Cache(maxSize=280, initialSize=280, minSize=252, acceptableSize=266, cleanupThread=false, autowarmCount=6, regenerator=org.apache.solr.search.SolrIndexSearcher$1@143eb77a)
>>> stats: lookups : 1725
>>> hits : 1481
>>> hitratio : 0.85
>>> inserts : 122
>>> evictions : 0
>>> size : 128
>>> warmupTime : 4426
>>> cumulative_lookups : 3449712
>>> cumulative_hits : 3281805
>>> cumulative_hitratio : 0.95
>>> cumulative_inserts : 83261
>>> cumulative_evictions : 3479
>>>
>>> name: filterCache
>>> class: org.apache.solr.search.FastLRUCache
>>> version: 1.0
>>> description: Concurrent LRU Cache(maxSize=248, initialSize=12, minSize=223, acceptableSize=235, cleanupThread=false, autowarmCount=10, regenerator=org.apache.solr.search.SolrIndexSearcher$2@36e831d6)
>>> stats: lookups : 3990
>>> hits : 3831
>>> hitratio : 0.96
>>> inserts : 239
>>> evictions : 26
>>> size : 244
>>> warmupTime : 1
>>> cumulative_lookups : 5745011
>>> cumulative_hits : 5496150
>>> cumulative_hitratio : 0.95
>>> cumulative_inserts : 351485
>>> cumulative_evictions : 276308
>>>
>>> -----Original Message-----
>>> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
>>> Sent: Saturday, June 15, 2013 5:52 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: yet another optimize question
>>>
>>> Hi Robi,
>>>
>>> I'm going to guess you are seeing smaller heap also simply because you restarted the JVM recently (hm, you don't say you restarted, maybe I'm making this up). If you are indeed indexing continuously then you shouldn't optimize. Lucene will merge segments itself. Lower mergeFactor will force it to do it more often (it means slower indexing, bigger IO hit when segments are merged, more per-segment data that Lucene/Solr need to read from the segment for faceting and such, etc.) so maybe you shouldn't mess with that. Do you know what your caches are like in terms of size, hit %, evictions? We've recently seen people set those to a few hundred K or even higher, which can eat a lot of heap. We have had luck with G1 recently, too. Maybe you can run jstat and see which of the memory pools get filled up and change/increase appropriate JVM param based on that? How many fields do you index, facet, or group on?
>>>
>>> Otis
>>> --
>>> Performance Monitoring - http://sematext.com/spm/index.html
>>> Solr & ElasticSearch Support -- http://sematext.com/
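Again just for reference, the merge behavior Otis mentions is configured in solrconfig.xml on Solr 3.x. A minimal sketch (the value is only an example, and as Otis says, it may not be worth changing):

    <indexDefaults>
      <!-- Default is 10. A lower value merges more aggressively, so searchers see
           fewer, larger segment files, at the cost of slower indexing and more
           merge I/O on the master. -->
      <mergeFactor>5</mergeFactor>
    </indexDefaults>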
>>>
>>> On Fri, Jun 14, 2013 at 8:04 PM, Petersen, Robert <robert.peter...@mail.rakuten.com> wrote:
>>>> Hi guys,
>>>>
>>>> We're on Solr 3.6.1 and I've read the discussions about whether to optimize or not to optimize. I decided to try not optimizing our index, as was recommended. We have a little over 15 million docs in our biggest index and a 32GB heap for our JVM.
>>>>
>>>> So without the optimizes, the index folder seemed to grow in size and quantity of files. There seemed to be an upper limit, but eventually it hit 300 files consuming 26GB of space, and that seemed to push our slave farm over the edge and we started getting the dreaded OOMs. We have continuous indexing activity, so I stopped the indexer and manually ran an optimize, which made the index become 9 files consuming 15GB of space, and our slave farm started having acceptable memory usage. Our merge factor is 10, and we're on Java 7.
>>>>
>>>> Before optimizing, I tried on one slave machine to go with the latest JVM and tried switching from the CMS GC to the G1GC, but it hit the OOM condition even faster. So it seems like I have to continue to schedule a regular optimize. Right now it has been a couple of days since running the optimize, and the index is slowly growing bigger, now up to a bit over 19GB. What do you guys think? Did I miss something that would make us able to run without doing an optimize?
>>>>
>>>> Robert (Robi) Petersen
>>>> Senior Software Engineer
>>>> Search Department
>
> --
> Walter Underwood
> wun...@wunderwood.org

--
Walter Underwood
wun...@wunderwood.org
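P.S. If a scheduled optimize does stay in the picture, it doesn't have to force the index all the way down to one segment. The Solr 3.x update handler accepts an optimize message with a maxSegments attribute, so something like the sketch below caps the file count without the full cost of a complete optimize. The value 4 is only an example, and host, port, and core are placeholders:

    <!-- POSTed to http://host:port/solr/core/update -->
    <optimize maxSegments="4" waitSearcher="true"/>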