Reading the newer solrconfig in the example conf folder, it seems to be saying that the setting '<mergeFactor>10</mergeFactor>' is shorthand for putting in the block below, and that both of these are the defaults? It says 'The default since Solr/Lucene 3.3 is TieredMergePolicy.' So isn't this setting already in effect for me?
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>

Thanks
Robi

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Monday, June 17, 2013 6:36 PM
To: solr-user@lucene.apache.org
Subject: Re: yet another optimize question

Yes, in one of the example solrconfig.xml files this is right above the merge factor definition.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/


On Mon, Jun 17, 2013 at 8:00 PM, Petersen, Robert <robert.peter...@mail.rakuten.com> wrote:
> Hi Upayavira,
>
> You might have gotten it. Yes, we noticed maxdocs was way bigger than
> numdocs. There were a lot of files ending in '.del' in the index folder
> also. We started on 1.3 also. I don't currently have any solr config
> settings for MergePolicy at all. Am I going to want to put something like
> this into my index defaults section?
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
> </mergePolicy>
>
> Thanks
> Robi
>
> -----Original Message-----
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: Monday, June 17, 2013 12:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: yet another optimize question
>
> The key figures are numdocs vs maxdocs. Maxdocs minus numdocs is the
> number of deleted docs in your index.
>
> This is a 3.6 system, you say. But has it been upgraded? I've seen folks
> who've upgraded from 1.4 or 3.0/3.1 over time, keeping the old config.
> The consequence of this is that they don't get the right config for the
> TieredMergePolicy, and therefore don't get to use it, seeing the old
> behaviour which does require periodic optimise.
>
> Upayavira
>
> On Mon, Jun 17, 2013, at 07:21 PM, Petersen, Robert wrote:
>> Hi Otis,
>>
>> Right, I didn't restart the JVMs except on the one slave where I was
>> experimenting with using G1GC on the 1.7.0_21 JRE. Also, some time ago
>> I made all our caches small enough to keep us from getting OOMs while
>> still having a good hit rate. Our index has about 50 fields, which are
>> mostly int IDs, and there are some dynamic fields also. These dynamic
>> fields can be used for custom faceting. We have some standard facets we
>> always facet on and other dynamic facets which are only used if the
>> query is filtering on a particular category. There are hundreds of
>> these fields, but since they are only for a small subset of the
>> overall index they are very sparsely populated with regard to the
>> overall index. With CMS GC we get a sawtooth on the old generation
>> (I guess every replication and commit causes its usage to drop down
>> to 10GB or so), and it seems to be the old generation which is the
>> main space consumer. With the G1GC, the memory map looked totally
>> different! I was a little lost looking at memory consumption with
>> that GC. Maybe I'll try it again now that the index is a bit smaller
>> than it was last time I tried it. After four days without running an
>> optimize, it is now 21GB. BTW our indexing speed is mostly bound by
>> the DB, so reducing the segments might be ok...
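A rough sketch of how that explicit mergePolicy block might sit inside the indexDefaults section of a 3.x solrconfig.xml; the surrounding elements and values are illustrative assumptions, not settings taken from this thread:

  <indexDefaults>
    <!-- illustrative buffer size, not a recommendation from this thread -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <!-- TieredMergePolicy is already the Lucene/Solr default since 3.3;
         spelling it out just makes the mergeFactor=10 behaviour explicit -->
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicy>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  </indexDefaults>

If a config carried forward from 1.x or 3.0/3.1 overrides or omits this, the index can fall back to the old merge behaviour Upayavira describes, which is worth ruling out before scheduling regular optimizes.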
>> Here is a quick snapshot of one slave's memory map as reported by
>> PSI-Probe, but unfortunately I guess I can't send the history
>> graphics to the solr-user list to show their changes over time:
>>
>> Name                 Used       Committed   Max         Initial     Group
>> Par Survivor Space   20.02 MB   108.13 MB   108.13 MB   108.13 MB   HEAP
>> CMS Perm Gen         42.29 MB   70.66 MB    82.00 MB    20.75 MB    NON_HEAP
>> Code Cache           9.73 MB    9.88 MB     48.00 MB    2.44 MB     NON_HEAP
>> CMS Old Gen          20.22 GB   30.94 GB    30.94 GB    30.94 GB    HEAP
>> Par Eden Space       42.20 MB   865.31 MB   865.31 MB   865.31 MB   HEAP
>> Total                20.33 GB   31.97 GB    32.02 GB    31.92 GB    TOTAL
>>
>> And here's our current cache stats from a random slave:
>>
>> name: queryResultCache
>> class: org.apache.solr.search.LRUCache
>> version: 1.0
>> description: LRU Cache(maxSize=488, initialSize=6, autowarmCount=6,
>> regenerator=org.apache.solr.search.SolrIndexSearcher$3@461ff4c3)
>> stats: lookups : 619
>> hits : 36
>> hitratio : 0.05
>> inserts : 592
>> evictions : 101
>> size : 488
>> warmupTime : 2949
>> cumulative_lookups : 681225
>> cumulative_hits : 73126
>> cumulative_hitratio : 0.10
>> cumulative_inserts : 602396
>> cumulative_evictions : 428868
>>
>> name: fieldCache
>> class: org.apache.solr.search.SolrFieldCacheMBean
>> version: 1.0
>> description: Provides introspection of the Lucene FieldCache, this is
>> **NOT** a cache that is managed by Solr.
>> stats: entries_count : 359
>>
>> name: documentCache
>> class: org.apache.solr.search.LRUCache
>> version: 1.0
>> description: LRU Cache(maxSize=2048, initialSize=512,
>> autowarmCount=10, regenerator=null)
>> stats: lookups : 12710
>> hits : 7160
>> hitratio : 0.56
>> inserts : 5636
>> evictions : 3588
>> size : 2048
>> warmupTime : 0
>> cumulative_lookups : 10590054
>> cumulative_hits : 6166913
>> cumulative_hitratio : 0.58
>> cumulative_inserts : 4423141
>> cumulative_evictions : 3714653
>>
>> name: fieldValueCache
>> class: org.apache.solr.search.FastLRUCache
>> version: 1.0
>> description: Concurrent LRU Cache(maxSize=280, initialSize=280,
>> minSize=252, acceptableSize=266, cleanupThread=false, autowarmCount=6,
>> regenerator=org.apache.solr.search.SolrIndexSearcher$1@143eb77a)
>> stats: lookups : 1725
>> hits : 1481
>> hitratio : 0.85
>> inserts : 122
>> evictions : 0
>> size : 128
>> warmupTime : 4426
>> cumulative_lookups : 3449712
>> cumulative_hits : 3281805
>> cumulative_hitratio : 0.95
>> cumulative_inserts : 83261
>> cumulative_evictions : 3479
>>
>> name: filterCache
>> class: org.apache.solr.search.FastLRUCache
>> version: 1.0
>> description: Concurrent LRU Cache(maxSize=248, initialSize=12,
>> minSize=223, acceptableSize=235, cleanupThread=false, autowarmCount=10,
>> regenerator=org.apache.solr.search.SolrIndexSearcher$2@36e831d6)
>> stats: lookups : 3990
>> hits : 3831
>> hitratio : 0.96
>> inserts : 239
>> evictions : 26
>> size : 244
>> warmupTime : 1
>> cumulative_lookups : 5745011
>> cumulative_hits : 5496150
>> cumulative_hitratio : 0.95
>> cumulative_inserts : 351485
>> cumulative_evictions : 276308
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
>> Sent: Saturday, June 15, 2013 5:52 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: yet another optimize question
>>
>> Hi Robi,
>>
>> I'm going to guess you are seeing a smaller heap also simply because
>> you restarted the JVM recently (hm, you don't say you restarted,
>> maybe I'm making this up). If you are indeed indexing continuously,
>> then you shouldn't optimize.
>> Lucene will merge segments itself. Lower mergeFactor will force it to
>> do it more often (it means slower indexing, bigger IO hit when segments
>> are merged, more per-segment data that Lucene/Solr need to read from
>> the segment for faceting and such, etc.) so maybe you shouldn't mess
>> with that. Do you know what your caches are like in terms of size,
>> hit %, evictions? We've recently seen people set those to a few hundred
>> K or even higher, which can eat a lot of heap. We have had luck with G1
>> recently, too. Maybe you can run jstat and see which of the memory
>> pools get filled up and change/increase appropriate JVM param based on
>> that? How many fields do you index, facet, or group on?
>>
>> Otis
>> --
>> Performance Monitoring - http://sematext.com/spm/index.html
>> Solr & ElasticSearch Support -- http://sematext.com/
>>
>> On Fri, Jun 14, 2013 at 8:04 PM, Petersen, Robert
>> <robert.peter...@mail.rakuten.com> wrote:
>> > Hi guys,
>> >
>> > We're on solr 3.6.1 and I've read the discussions about whether to
>> > optimize or not to optimize. I decided to try not optimizing our
>> > index as was recommended. We have a little over 15 million docs in
>> > our biggest index and a 32gb heap for our jvm. So without the
>> > optimizes the index folder seemed to grow in size and quantity of
>> > files. There seemed to be an upper limit but eventually it hit 300
>> > files consuming 26gb of space and that seemed to push our slave farm
>> > over the edge and we started getting the dreaded OOMs. We have
>> > continuous indexing activity, so I stopped the indexer and manually
>> > ran an optimize which made the index become 9 files consuming 15gb of
>> > space and our slave farm started having acceptable memory usage. Our
>> > merge factor is 10, we're on java 7. Before optimizing, I tried on
>> > one slave machine to go with the latest JVM and tried switching from
>> > the CMS GC to the G1GC but it hit OOM condition even faster. So it
>> > seems like I have to continue to schedule a regular optimize. Right
>> > now it has been a couple of days since running the optimize and the
>> > index is slowly growing bigger, now up to a bit over 19gb. What do
>> > you guys think? Did I miss something that would make us able to run
>> > without doing an optimize?
>> >
>> > Robert (Robi) Petersen
>> > Senior Software Engineer
>> > Search Department
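As a middle ground between never optimizing and a full optimize, Solr's XML update messages can request a partial optimize or an expunge-deletes commit. A rough sketch, posted to the /update handler; the maxSegments value below is only an example, not a recommendation from this thread:

  <!-- partial optimize: merge down to at most 10 segments instead of 1 (value is illustrative) -->
  <optimize maxSegments="10"/>

  <!-- or: merge away segments containing deleted docs (the '.del' files) without a full rewrite -->
  <commit expungeDeletes="true"/>

Either of these reclaims deleted-doc space more cheaply than a full optimize, which may be enough to keep the slave heaps in an acceptable range without rewriting the whole index each time.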