Hi,
> On 3. Sep 2018, at 22:18, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Reducing to 10 won't be definitive, but if the problem gets better
> it'll be a clue.
>
> How are you committing? Is it just based on the solrconfig settings or
> do you have any clients submitting commit commands?

Only through the auto commits, no manual committing from the application.

> One fat clue would be if, in your solr logs, you were getting any
> warnings about "too many on deck searchers" (going from memory here,
> exact wording may differ). That's an indication that your autowarm
> times are taking longer than 20 seconds (your soft commit interval),
> which would point to excessive autowarming being _part_ of the
> problem. This assumes you're indexing steadily.

I searched our logs and could not find any evidence for this. I searched for:

- searchers
- auto
- warmup

There was nothing about too many searchers. Would that mean the searchers are actually leaking, and not just too many warming up at once?

> Still, though, changing from 6.6 to 7x shouldn't be that much different.
>
> It's possible that you were running close to your heap limit with 6.6
> and a relatively small difference in heap usage with 7x threw you over
> the tipping point, but that's just hand-waving on my part.

I really thought about this, but back on 6.6 we had a lot of headroom in the young generation and also very low GC timings.

> And I'm guessing this is a prod system so experiments aren't tolerable…

What do you have in mind? Increasing memory? That's something we have to do anyway - if it helps. Our current setup is not very stable anyway, so we have some room for experiments.

> What you can measure. Starting with 6.4 there are about a zillion metrics,
> try: http://host:port/solr/admin/metrics for the complete list and
> pick and choose.
>
> Note that there are ways to cut down on how much is reported, I
> suspect you'll be interested first in:
> http://localhost:8983/solr/admin/metrics?prefix=SEARCHER
>
> https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html

Funny thing is that we tried to use the prometheus exporter for these metrics, but whenever we started it, it killed our Solr node immediately. I will keep looking into these metrics, but so far they have not yielded anything valuable for me - all metrics look "fine". Is there anything special you would take a look at?
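For reference, this is roughly what I am running - the endpoint is the one from the docs above, and the grep terms are just my guess at which keys matter:

  curl -s "http://localhost:8983/solr/admin/metrics?prefix=SEARCHER" \
    | grep -E "warmupTime|openedAt|searcherName"

Nothing in that output jumps out at me so far, but maybe I am filtering for the wrong keys.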
> These tend to be on a per-core (replica) basis so you may have to do
> some aggregating.
>
> Good luck!

Thank you very much :)

Björn

> Erick
>
> On Mon, Sep 3, 2018 at 12:54 PM Markus Jelsma
> <markus.jel...@openindex.io> wrote:
>>
>> Hello,
>>
>> Getting an OOM plus the fact you are having a lot of IndexSearcher instances
>> rings a familiar bell. One of our collections has the same issue [1] when we
>> attempted an upgrade 7.2.1 -> 7.3.0. I managed to rule out all our custom
>> Solr code but had to keep our Lucene filters in the schema; the problem
>> persisted.
>>
>> The odd thing, however, is that you appear to have the same problem, but not
>> with 7.3.0? Since you upgraded to 7.4.0 shortly after 7.3.0, can you confirm
>> the problem is not also in 7.3.0?
>>
>> You should see the instance count for IndexSearcher increase by one for each
>> replica on each commit.
>>
>> Regards,
>> Markus
>>
>> [1] http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html
>>
>> -----Original message-----
>>> From: Erick Erickson <erickerick...@gmail.com>
>>> Sent: Monday 3rd September 2018 20:49
>>> To: solr-user <solr-user@lucene.apache.org>
>>> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
>>>
>>> I would expect at least 1 IndexSearcher per replica - how many total
>>> replicas are hosted in your JVM?
>>>
>>> Plus, if you're actively indexing, there may temporarily be 2
>>> IndexSearchers open while the new searcher warms.
>>>
>>> And there may be quite a few caches, at least queryResultCache and
>>> filterCache and documentCache, one of each per replica and maybe two
>>> (for queryResultCache and filterCache) if you have a background
>>> searcher autowarming.
>>>
>>> At a glance, your autowarm counts are very high, so it may take some
>>> time to autowarm, leading to multiple IndexSearchers and caches open
>>> per replica when you happen to hit a commit point. I usually start
>>> with 16-20 as an autowarm count; the benefit decreases rapidly as you
>>> increase the count.
>>>
>>> I'm not quite sure why it would be different in 7x vs. 6x. How much
>>> heap do you allocate to the JVM? And do you see similar heap dumps in
>>> 6.6?
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser <bjoernhaeu...@gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> we recently upgraded our SolrCloud (5 nodes, 25 collections, 1 shard each,
>>>> 4 replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are
>>>> running Zookeeper 4.1.13.
>>>>
>>>> Since the upgrade to 7.3.0 and also 7.4.0 we have been encountering heap
>>>> space exhaustion. After obtaining a heap dump it looks like we have a lot
>>>> of IndexSearchers open for our largest collection.
>>>>
>>>> The dump contains ~60 IndexSearchers, each holding around ~40MB of heap.
>>>> Another 500MB of heap is the fieldcache, which is expected in my opinion.
>>>>
>>>> The current config can be found here:
>>>> https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844
>>>>
>>>> Analyzing the heap dump, Eclipse MAT says this:
>>>>
>>>> Problem Suspect 1
>>>>
>>>> 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by
>>>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy
>>>> 1.981.148.336 (38,26%) bytes.
>>>>
>>>> Biggest instances:
>>>>
>>>> • org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 -
>>>> 70.087.272 (1,35%) bytes.
>>>> • org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 -
>>>> 65.678.264 (1,27%) bytes.
>>>> • org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 -
>>>> 63.050.600 (1,22%) bytes.
>>>>
>>>> Problem Suspect 2
>>>>
>>>> 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by
>>>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy
>>>> 1.373.110.208 (26,52%) bytes.
>>>>
>>>> Any help is appreciated. Thank you very much!
>>>> Björn