Reducing the autowarm counts to 10 won't be definitive, but if the
problem gets better it'll be a clue.

How are you committing? Is it just based on the solrconfig settings or
do you have any clients submitting commit commands?
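
For illustration only, here's a minimal sketch (in Python with the
requests library; host and collection name are placeholders) of what a
client-issued commit looks like. It's just an update request with
commit=true, so grepping your indexing code or the Solr request logs
for that parameter will tell you whether clients are committing on
their own:

    import requests

    SOLR = "http://localhost:8983/solr"   # placeholder host:port
    COLLECTION = "your_collection"        # placeholder collection name

    # An explicit, client-issued hard commit. If any indexing client does
    # something like this, commits are not driven solely by the
    # autoCommit/autoSoftCommit settings in solrconfig.xml.
    resp = requests.get(f"{SOLR}/{COLLECTION}/update",
                        params={"commit": "true", "wt": "json"})
    print(resp.json())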

One fat clue would be if, in your Solr logs, you were getting any
warnings about "too many on deck searchers" (going from memory here,
the exact wording may differ). That would indicate that autowarming is
taking longer than 20 seconds (your soft commit interval), which would
point to excessive autowarming being _part_ of the problem. This
assumes you're indexing steadily.

Still, though, moving from 6.6 to 7x shouldn't change heap behavior that much.

It's possible that you were running close to your heap limit with 6.6
and a relatively small difference in heap usage with 7x threw you over
the tipping point, but that's just hand-waving on my part.

And I'm guessing this is a prod system, so experiments aren't really an option...

As for what you can measure: starting with 6.4 there are about a
zillion metrics. Try http://host:port/solr/admin/metrics for the
complete list and pick and choose.

Note that there are ways to cut down on how much is reported; I
suspect you'll be interested first in:
http://localhost:8983/solr/admin/metrics?prefix=SEARCHER

https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html

These tend to be on a per-core (replica) basis, so you may have to do
some aggregating.
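
In case it saves you some time, here's a rough sketch (untested against
your setup) of pulling just the SEARCHER metrics and looking at the
warmup time per core. Assumptions: Python with the requests library, a
placeholder host:port, metric names from memory, and a JSON shape that
can vary a bit between versions, so adjust the parsing as needed:

    import requests

    SOLR = "http://localhost:8983/solr"   # placeholder host:port

    # Ask only for searcher-related metrics; json.nl=map keeps the
    # registries as a JSON object rather than an alternating list.
    resp = requests.get(f"{SOLR}/admin/metrics",
                        params={"prefix": "SEARCHER",
                                "wt": "json", "json.nl": "map"})
    metrics = resp.json().get("metrics", {})

    # Registries are per core (replica), e.g.
    # solr.core.<collection>.<shard>.<replica>, hence the aggregating.
    warmups = {}
    for registry, values in metrics.items():
        if not registry.startswith("solr.core."):
            continue
        raw = values.get("SEARCHER.searcher.warmupTime")
        # Depending on version/compact setting the gauge may be a plain
        # number or wrapped in {"value": ...}.
        if isinstance(raw, dict):
            raw = raw.get("value")
        if raw is not None:
            warmups[registry] = raw

    for registry, ms in sorted(warmups.items(), key=lambda kv: -kv[1]):
        print(f"{registry}: warmupTime {ms} ms")

If any of those warmup times get anywhere near your 20-second soft
commit interval, that lines up with the overlapping-searcher theory
above.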

Good luck!
Erick
On Mon, Sep 3, 2018 at 12:54 PM Markus Jelsma
<markus.jel...@openindex.io> wrote:
>
> Hello,
>
> Getting an OOM plus the fact that you have a lot of IndexSearcher instances
> rings a familiar bell. One of our collections had the same issue [1] when we
> attempted an upgrade from 7.2.1 to 7.3.0. I managed to rule out all our custom
> Solr code but had to keep our Lucene filters in the schema; the problem persisted.
>
> The odd thing, however, is that you appear to have the same problem, but not
> with 7.3.0? Since you upgraded to 7.4.0 shortly after 7.3.0, can you confirm
> the problem is not also present in 7.3.0?
>
> You should see the instance count for IndexSearcher increase by one for each 
> replica on each commit.
>
> Regards,
> Markus
>
> [1] http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html
>
>
>
> -----Original message-----
> > From:Erick Erickson <erickerick...@gmail.com>
> > Sent: Monday 3rd September 2018 20:49
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
> >
> > I would expect at least 1 IndexSearcher per replica; how many total
> > replicas are hosted in your JVM?
> >
> > Plus, if you're actively indexing, there may temporarily be 2
> > IndexSearchers open while the new searcher warms.
> >
> > And there may be quite a few caches: at least queryResultCache,
> > filterCache and documentCache, one of each per replica, and maybe two
> > each (for queryResultCache and filterCache) if you have a background
> > searcher autowarming.
> >
> > At a glance, your autowarm counts are very high, so it may take some
> > time to autowarm, leading to multiple IndexSearchers and caches open
> > per replica when you happen to hit a commit point. I usually start
> > with 16-20 as an autowarm count; the benefit decreases rapidly as you
> > increase the count.
> >
> > I'm not quite sure why it would be different in 7x vs. 6x. How much
> > heap do you allocate to the JVM? And do you see similar heap dumps in
> > 6.6?
> >
> > Best,
> > Erick
> > On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser <bjoernhaeu...@gmail.com> 
> > wrote:
> > >
> > > Hello,
> > >
> > > we recently upgraded our SolrCloud cluster (5 nodes, 25 collections,
> > > 1 shard each, 4 replicas each) from 6.6.0 to 7.3.0 and shortly after
> > > to 7.4.0. We are running ZooKeeper 4.1.13.
> > >
> > > Since the upgrade to 7.3.0, and also on 7.4.0, we have been
> > > encountering heap space exhaustion. After obtaining a heap dump, it
> > > looks like we have a lot of IndexSearchers open for our largest
> > > collection.
> > >
> > > The dump contains around 60 IndexSearchers, each holding around 40 MB
> > > of heap. Another 500 MB of heap is the field cache, which is expected
> > > in my opinion.
> > >
> > > The current config can be found here: 
> > > https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844 
> > >
> > > Analyzing the heap dump, Eclipse MAT says this:
> > >
> > > Problem Suspect 1
> > >
> > > 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by 
> > > "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
> > > 1.981.148.336 (38,26%) bytes.
> > >
> > > Biggest instances:
> > >
> > >         • org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 
> > > 70.087.272 (1,35%) bytes.
> > >         • org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 
> > > 65.678.264 (1,27%) bytes.
> > >         • org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 
> > > 63.050.600 (1,22%) bytes.
> > >
> > >
> > > Problem Suspect 2
> > >
> > > 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by 
> > > "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
> > > 1.373.110.208 (26,52%) bytes.
> > >
> > >
> > > Any help is appreciated. Thank you very much!
> > > Björn
> >
