Hello Shawn,

Thanks for mentioning the shard handler tweaks. I see we do have an incorrect setting 
there: maximumPoolSize is set far too high, but that alone doesn't account for the 
number of threads created. Oddly, after reducing that value, twice as many threads 
are created and the node dies.
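
For reference, the section in question looks roughly like the sketch below. These 
are the standard HttpShardHandlerFactory parameters; the values are illustrative, 
not our actual settings:

    <shardHandlerFactory class="HttpShardHandlerFactory">
      <!-- timeouts in milliseconds; we have not reduced these -->
      <int name="socketTimeout">600000</int>
      <int name="connTimeout">60000</int>
      <!-- the setting we had far too high -->
      <int name="maximumPoolSize">64</int>
    </shardHandlerFactory>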

For a short time there were two identical collections on the nodes (just for 
different tests). I have since removed one of them, but the number of threads 
created doesn't change one bit. So it appears the shard handler config has nothing 
to do with it, or does it?

Regarding the memory leak: of course, the first thing that came to mind is that I 
made an error which only causes trouble on 7.3, but so far it is unreproducible, 
even when I fully replicate production in a test environment. Since it only leaks 
on commits, the first suspects were the URPs, and the URPs are also the only things 
I can disable in production without affecting customers. Needless to say, it wasn't 
the URPs.
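
To be concrete, "disabling the URPs" here means commenting our custom processors 
out of the update chain, roughly like this (the custom class name is just a 
placeholder; the Log/Run processors are the stock Solr ones):

    <updateRequestProcessorChain name="our-chain" default="true">
      <!-- our custom URPs, disabled for the test (placeholder class name) -->
      <!-- <processor class="com.example.CustomUpdateProcessorFactory"/> -->
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>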

But thanks anyway. Whenever I have the courage to test it again, I'll enable INFO 
logging, which is currently disabled. Maybe it will reveal something.

If anyone has even the weirdest, most unconventional suggestion on how to reproduce 
my production memory leak in a controlled test environment, let me know.

Thanks,
Markus
 
-----Original message-----
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Sunday 10th June 2018 22:42
> To: solr-user@lucene.apache.org
> Subject: Re: 7.3.1 creates thousands of threads after start up
> 
> On 6/8/2018 8:59 AM, Markus Jelsma wrote:
> > 2018-06-08 14:02:47.382 ERROR (qtp1458849419-1263) [   ] 
> > o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error 
> > trying to proxy request for url: http://idx2:8983/solr/
> > search/admin/ping
> <snip>
> > Caused by: org.eclipse.jetty.io.EofException
> 
> If you haven't tweaked the shard handler config to drastically reduce
> the socket timeout, that is weird.  The only thing that comes to mind is
> extreme GC pauses that cause the socket timeout to be exceeded.
> 
> > We operate three distinct type of Solr collections, they only share the 
> > same Zookeeper quorum. The other two collections do not seem to have this 
> > problem, but i don't restart those as often as i restart this collection, 
> > as i am STILL trying to REPRODUCE the dreaded memory leak i reported having 
> > on 7.3 about two weeks ago. Sorry, but it drives me nuts!
> 
> I've reviewed the list messages about the leak.  As you might imagine,
> my immediate thought is that the custom plugins you're running are
> probably the cause, because we are not getting OOME reports like I would
> expect if there were a leak in Solr itself.  It would not be unheard of
> for a custom plugin to experience no leaks with one Solr version but
> leak when Solr is upgraded, requiring a change in the plugin to properly
> close resources.  I do not know if that's what's happening.
> 
> A leak could lead to GC pause problems, but it does seem really odd for
> that to happen on a Solr node that's just been started.  You could try
> bumping the heap size by 25 to 50 percent and see if the behavior
> changes at all.  Honestly I don't expect it to change, and if it
> doesn't, then I do not know what the next troubleshooting step should
> be.  I could review your solr.log, though I can't be sure I would see
> something you didn't.
> 
> Thanks,
> Shawn
> 
> 
