Hi,

> On 3. Sep 2018, at 22:18, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Reducing to 10 won't be definitive, but if the problem gets better
> it'll be a clue.
> 
> How are you committing? Is it just based on the solrconfig settings or
> do you have any clients submitting commit commands?

Only through the auto commits, no manual committing from the application.
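For reference, commit timing in our setup is driven entirely by solrconfig.xml. A typical configuration along these lines (the intervals here are illustrative, not necessarily our exact values) would be:

```xml
<!-- Hard commits flush segments to disk but do not open a new searcher. -->
<autoCommit>
  <maxTime>60000</maxTime>           <!-- every 60s -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commits make new documents visible; each one opens a new searcher. -->
<autoSoftCommit>
  <maxTime>20000</maxTime>           <!-- every 20s -->
</autoSoftCommit>
```

With settings like these, every soft commit opens (and autowarms) a new searcher, which is why the autowarm time relative to the soft commit interval matters.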

> 
> One fat clue would be if, in your solr logs, you were getting any
> warnings about "too many on deck searchers" (going from memory here,
> exact wording may differ). That's an indication that your autowarm
> times are taking longer than 20 seconds (your soft commit interval),
> which would point to excessive autowarming being _part_ of the
> problem. This assumes you're indexing steadily.

I searched our logs and could not find any evidence for this. I searched for:

- searchers
- auto
- warmup

There was nothing about too many searchers. Would that mean they are actually 
leaking, rather than too many warming up at once, right?

> 
> Still, though, changing from 6.6 to 7x shouldn't be that much different.
> 
> It's possible that you were running close to your heap limit with 6.6
> and a relatively small difference in heap usage with 7x threw you over
> the tipping point, but that's just hand-waving on my part.
> 

I really thought about this, but in our 6.6 days we had a lot of headroom in 
the young generation and also very low GC timings.


> And I'm guessing this is a prod system so experiments aren't tolerable…

What do you have in mind? Increasing memory? That's something we have to do 
anyway - if it helps.
Our current setup is not very stable anyway, so we have some room for 
experiments.

> 
> What you can measure. Starting with 6.4 there are about a zillion metrics,
> try: http://host:port/solr/admin/metrics for the complete list and
> pick and choose.
> 
> Note that there are ways to cut down on how much is reported, I
> suspect you'll be interested first in:
> http://localhost:8983/solr/admin/metrics?prefix=SEARCHER
> 
> https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html
> 

Funny thing is that we tried to use the Prometheus exporter for these metrics, 
but whenever we started it, it killed our Solr node immediately. 

I will keep looking into these metrics, but so far they have not yielded any 
valuable results for me. All metrics look “fine”. 
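For what it's worth, here is a small sketch of how one could sift the /admin/metrics JSON for searcher-related values per core. The payload structure below is an assumption based on the metrics docs; in practice `data` would be fetched from http://localhost:8983/solr/admin/metrics?prefix=SEARCHER&wt=json.

```python
# Sample payload shaped like an /admin/metrics response (assumed structure).
data = {
    "metrics": {
        "solr.core.collection1": {
            "SEARCHER.searcher.numDocs": 123456,
            "SEARCHER.searcher.warmupTime": 1500,
            "CACHE.searcher.filterCache": {"size": 512},
        }
    }
}

def searcher_metrics(payload):
    """Collect every SEARCHER.* metric, grouped per core."""
    result = {}
    for core, metrics in payload["metrics"].items():
        picked = {k: v for k, v in metrics.items() if k.startswith("SEARCHER.")}
        if picked:
            result[core] = picked
    return result

for core, metrics in searcher_metrics(data).items():
    print(core, metrics)
```

Comparing warmupTime against the soft commit interval across cores would show whether warming keeps up with commits.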

Is there anything special you would take a look at?

> These tend to be on a per-core (replica) basis so you may have to do
> some aggregating.
> 
> Good luck!


Thank you very much :)
Björn

> Erick
> On Mon, Sep 3, 2018 at 12:54 PM Markus Jelsma
> <markus.jel...@openindex.io> wrote:
>> 
>> Hello,
>> 
>> Getting an OOM plus the fact you are having a lot of IndexSearcher instances 
>> rings a familiar bell. One of our collections has the same issue [1] when we 
>> attempted an upgrade 7.2.1 > 7.3.0. I managed to rule out all our custom 
>> Solr code but had to keep our Lucene filters in the schema, the problem 
>> persisted.
>> 
>> The odd thing, however, is that you appear to have the same problem, but not 
>> with 7.3.0? Since you shortly after 7.3.0 upgraded to 7.4.0, can you confirm 
>> the problem is not also in 7.3.0?
>> 
>> You should see the instance count for IndexSearcher increase by one for each 
>> replica on each commit.
>> 
>> Regards,
>> Markus
>> 
>> [1] http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html
>> 
>> 
>> 
>> -----Original message-----
>>> From:Erick Erickson <erickerick...@gmail.com>
>>> Sent: Monday 3rd September 2018 20:49
>>> To: solr-user <solr-user@lucene.apache.org>
>>> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
>>> 
>>> I would expect at least 1 IndexSearcher per replica, how many total
>>> replicas hosted in your JVM?
>>> 
>>> Plus, if you're actively indexing, there may temporarily be 2
>>> IndexSearchers open while the new searcher warms.
>>> 
>>> And there may be quite a few caches, at least queryResultCache and
>>> filterCache and documentCache, one of each per replica and maybe two
>>> (for queryResultCache and filterCache) if you have a background
>>> searcher autowarming.
>>> 
>>> At a glance, your autowarm counts are very high, so it may take some
>>> time to autowarm leading to multiple IndexSearchers and caches open
>>> per replica when you happen to hit a commit point. I usually start
>>> with 16-20 as an autowarm count, the benefit decreases rapidly as you
>>> increase the count.
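(For illustration, a cache entry with an autowarm count in that 16-20 range would look like this in solrconfig.xml; the size values are placeholders, not a tuned recommendation:

```xml
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>
```
)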
>>> 
>>> I'm not quite sure why it would be different in 7x vs. 6x. How much
>>> heap do you allocate to the JVM? And do you see similar heap dumps in
>>> 6.6?
>>> 
>>> Best,
>>> Erick
>>> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser <bjoernhaeu...@gmail.com> 
>>> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> we recently upgraded our solrcloud (5 nodes, 25 collections, 1 shard each, 
>>>> 4 replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are 
>>>> running ZooKeeper 3.4.13.
>>>> 
>>>> Since the upgrade to 7.3.0 and then 7.4.0 we have been encountering heap 
>>>> space exhaustion. After obtaining a heap dump, it looks like we have a lot 
>>>> of IndexSearchers open for our largest collection.
>>>> 
>>>> The dump contains around ~60 IndexSearchers, and each containing around 
>>>> ~40mb heap. Another 500MB of heap is the fieldcache, which is expected in 
>>>> my opinion.
>>>> 
>>>> The current config can be found here: 
>>>> https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844 
>>>> 
>>>> Analyzing the heap dump eclipse MAT says this:
>>>> 
>>>> Problem Suspect 1
>>>> 
>>>> 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by 
>>>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
>>>> 1.981.148.336 (38,26%) bytes.
>>>> 
>>>> Biggest instances:
>>>> 
>>>>        • org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 
>>>> 70.087.272 (1,35%) bytes.
>>>>        • org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 
>>>> 65.678.264 (1,27%) bytes.
>>>>        • org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 
>>>> 63.050.600 (1,22%) bytes.
>>>> 
>>>> 
>>>> Problem Suspect 2
>>>> 
>>>> 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by 
>>>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
>>>> 1.373.110.208 (26,52%) bytes.
>>>> 
>>>> 
>>>> Any help is appreciated. Thank you very much!
>>>> Björn
>>> 
