If it is not possible to find a resource leak by code analysis and there are no better ideas, I can suggest a brute-force approach:
- Clone Solr's sources from the appropriate branch: https://github.com/apache/lucene-solr/tree/branch_7_3
- Log every increment/decrement of the searcher holder's reference count in a way that captures every caller (use Thread.currentThread().getStackTrace() or something similar): https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
- Build custom artifacts and deploy them to production.
- After the memory leak has happened, analyse the logs to see which part of the functionality increments the searcher's reference count but never decrements it. If searchers are leaked, there should be such code, I guess.
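A minimal sketch of the logging idea in the second step, written as a standalone Java class rather than as a patch to Solr. `TracingRefCounted`, `wellBehaved`, `leaky` and the per-caller tally are hypothetical names made up for illustration; they only loosely mirror the incref()/decref() pair in RefCounted.java. Instead of writing raw log lines, it keeps a net +1/-1 count per calling method, so a caller whose count stays above zero is a candidate for the missing decrement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a reference-counted holder that records which method
// performed each incref()/decref(). This is NOT Solr's RefCounted class.
public class RefCountTrace {

    static final class TracingRefCounted<T> {
        private final T resource;
        private final AtomicInteger refcount = new AtomicInteger();
        // Net count per calling method: +1 on incref(), -1 on decref().
        private final Map<String, Integer> netByCaller = new ConcurrentHashMap<>();

        TracingRefCounted(T resource) {
            this.resource = resource;
        }

        T get() {
            return resource;
        }

        TracingRefCounted<T> incref() {
            refcount.incrementAndGet();
            netByCaller.merge(caller(), 1, Integer::sum);
            return this;
        }

        void decref() {
            refcount.decrementAndGet();
            netByCaller.merge(caller(), -1, Integer::sum);
        }

        int getRefcount() {
            return refcount.get();
        }

        Map<String, Integer> netByCaller() {
            return netByCaller;
        }

        // Returns "Class.method" of the first stack frame that belongs neither to
        // java.lang.Thread nor to this class, i.e. the code that called incref()/decref().
        private static String caller() {
            for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
                String cls = e.getClassName();
                if (cls.equals(Thread.class.getName())
                        || cls.equals(TracingRefCounted.class.getName())) {
                    continue;
                }
                return cls + "." + e.getMethodName();
            }
            return "unknown";
        }
    }

    // Balanced usage: every incref() is paired with a decref() in a finally block.
    static void wellBehaved(TracingRefCounted<String> ref) {
        ref.incref();
        try {
            System.out.println("using " + ref.get());
        } finally {
            ref.decref();
        }
    }

    // Leaky usage: takes a reference and never gives it back.
    static void leaky(TracingRefCounted<String> ref) {
        ref.incref();
        System.out.println("using " + ref.get());
    }

    public static void main(String[] args) {
        TracingRefCounted<String> ref = new TracingRefCounted<>("searcher");
        wellBehaved(ref);
        leaky(ref);
        // Report only the callers whose increments and decrements do not balance.
        ref.netByCaller().forEach((caller, net) -> {
            if (net != 0) {
                System.out.println("unbalanced: " + caller + " net=" + net);
            }
        });
        System.out.println("refcount=" + ref.getRefcount());
    }
}
```

In real Solr code the increment may happen inside a helper (for example somewhere under SolrCore's getSearcher()) rather than in the method that later calls decref(), so keying on the immediate caller alone would report false imbalances there; logging several frames, or the whole stack trace as suggested above, is more practical. Capturing a stack trace on every call is also expensive.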
This is not something anyone would like to do, but it is what it is.

Thank you,
Andrey Kudryavtsev

03.07.2018, 14:26, "Markus Jelsma" <markus.jel...@openindex.io>:
> Hello Erick,
>
> Even the silliest ideas may help us, but unfortunately this is not the case.
> All our Solr nodes run binaries from the same source from our central build
> server, with the same libraries thanks to provisioning. Only schema and
> config are different, but the <lib/> directive is the same all over.
>
> Are there any other ideas, speculations, whatever, on why only our main text
> collection leaks a SolrIndexSearcher instance on commit since 7.3.0 and every
> version up?
>
> Many thanks!
> Markus
>
> -----Original message-----
>> From:Erick Erickson <erickerick...@gmail.com>
>> Sent: Friday 29th June 2018 19:34
>> To: solr-user <solr-user@lucene.apache.org>
>> Subject: Re: 7.3 appears to leak
>>
>> This is truly puzzling then, I'm clueless. It's hard to imagine this
>> is lurking out there and nobody else notices, but you've eliminated
>> the custom code. And this is also very peculiar:
>>
>> * it occurs only in our main text search collection, all other
>> collections are unaffected;
>> * despite what I said earlier, it is so far unreproducible outside
>> production, even when mimicking production as well as we can;
>>
>> Here's a tedious idea. Restart Solr with the -v option, I _think_ that
>> shows you each and every jar file Solr loads. Is it "somehow" possible
>> that your main collection is loading some jar from somewhere that's
>> different than you expect? 'cause silly ideas like this are all I can
>> come up with.
>>
>> Erick
>>
>> On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
>> <markus.jel...@openindex.io> wrote:
>> > Hello Erick,
>> >
>> > The custom search handler doesn't interact with SolrIndexSearcher, this
>> > is really all it does:
>> >
>> > public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
>> >   super.handleRequestBody(req, rsp);
>> >
>> >   if (rsp.getToLog().get("hits") instanceof Integer) {
>> >     rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
>> >   }
>> >   if (rsp.getToLog().get("hits") instanceof Long) {
>> >     rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
>> >   }
>> > }
>> >
>> > I am not sure this qualifies as one more to go.
>> >
>> > Re: compiler warnings on resources, yes! This and tests failing due to
>> > resource leaks have always warned me when I forgot to release something or
>> > decrement a reference. But the above method (and the token filters, which I
>> > really can't disable) are all that is left.
>> >
>> > I am quite desperate about this problem, so although I am unwilling to
>> > disable stuff, I can do it if I must. But I see no reason, yet, to remove the
>> > search handler or the token filter stuff; I mean, how could those leak a
>> > SolrIndexSearcher?
>> >
>> > Let me know :)
>> >
>> > Many thanks!
>> > Markus
>> >
>> > -----Original message-----
>> >> From:Erick Erickson <erickerick...@gmail.com>
>> >> Sent: Friday 29th June 2018 18:46
>> >> To: solr-user <solr-user@lucene.apache.org>
>> >> Subject: Re: 7.3 appears to leak
>> >>
>> >> bq. The only custom stuff left is an extension of SearchHandler that
>> >> only writes numFound to the response headers.
>> >>
>> >> Well, one more to go ;). It's incredibly easy to overlook
>> >> innocent-seeming calls that increment the underlying reference count
>> >> of some objects but don't decrement them, usually through a close
>> >> call.
>> >> Which isn't necessarily a close if the underlying reference
>> >> count is still > 0.
>> >>
>> >> You may infer that I've been there and done that ;). Sometimes the
>> >> compiler warnings about "resource leak" can help pinpoint those too.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma
>> >> <markus.jel...@openindex.io> wrote:
>> >> > Hello Yonik,
>> >> >
>> >> > I took one node of the 7.2.1 cluster out of the load balancer so it
>> >> > would only receive shard queries. This way I could kind of 'safely' disable
>> >> > our custom components one by one, while keeping functionality in place by
>> >> > letting the other 7.2.1 nodes continue on with the full configuration.
>> >> >
>> >> > I am now at a point where literally all custom components are deleted
>> >> > or commented out in the config for the node running 7.4. The only custom
>> >> > stuff left is an extension of SearchHandler that only writes numFound to the
>> >> > response headers, and all the token filters in our schema.
>> >> >
>> >> > You were right, it was leaking exactly one SolrIndexSearcher instance
>> >> > on each commit. But, with all our stuff gone, the leak is still there! I
>> >> > triple-checked it! Of course, the bastard is still not reproducible locally.
>> >> >
>> >> > So, what is next? I have no clues left.
>> >> >
>> >> > Many, many thanks,
>> >> > Markus
>> >> >
>> >> > -----Original message-----
>> >> >> From:Markus Jelsma <markus.jel...@openindex.io>
>> >> >> Sent: Thursday 28th June 2018 23:52
>> >> >> To: solr-user@lucene.apache.org
>> >> >> Subject: RE: 7.3 appears to leak
>> >> >>
>> >> >> Hello Yonik,
>> >> >>
>> >> >> If leaking a whole SolrIndexSearcher would cause this problem, then
>> >> >> the only custom component that could be the root of all problems is our
>> >> >> copy/paste-and-enhance version of the elevator component. It is a direct
>> >> >> copy of the 7.2 source where only things like getAnalyzedQuery, the
>> >> >> ElevationObj and the loop over the map entries are changed.
>> >> >>
>> >> >> There are no changes to code related to the searcher. Other components
>> >> >> where we get a RefCounted searcher are used without issues; we always
>> >> >> decrement the reference after using it. But those components are not in use
>> >> >> in this collection.
>> >> >>
>> >> >> The source has changed a lot with 7.4, but we still use the old code.
>> >> >> I will investigate the component thoroughly, and even revert to the old 7.2
>> >> >> vanilla component for a brief period in production on one machine. It may
>> >> >> not be a problem if I don't let our load balancer access it directly, so it
>> >> >> only serves shard queries.
>> >> >>
>> >> >> I will get back to this topic tomorrow!
>> >> >>
>> >> >> Many thanks,
>> >> >> Markus
>> >> >>
>> >> >> -----Original message-----
>> >> >> > From:Yonik Seeley <ysee...@gmail.com>
>> >> >> > Sent: Thursday 28th June 2018 23:30
>> >> >> > To: solr-user@lucene.apache.org
>> >> >> > Subject: Re: 7.3 appears to leak
>> >> >> >
>> >> >> > > * SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry
>> >> >> > > instances are both leaked on commit;
>> >> >> >
>> >> >> > If these are actually filterCache entries being leaked, it stands to
>> >> >> > reason that a whole searcher is being leaked somewhere.
>> >> >> >
>> >> >> > -Yonik
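Yonik's inference (leaked filterCache entries imply a leaked searcher) and the observation of exactly one leaked SolrIndexSearcher per commit can be illustrated with a toy model. This is a hypothetical Java sketch, not Solr code: `Searcher`, its `filterCache` list and `simulate` are made-up names. Each simulated commit opens a new searcher that owns its own cache and drops the core's reference to the previous one; if some component takes one extra reference per searcher and never releases it, every old searcher, together with everything its cache holds, stays open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: each searcher owns its own filter cache and is only
// closed (and its cache cleared) when its reference count drops to zero.
public class SearcherLeakModel {

    static final class Searcher {
        final List<Object> filterCache = new ArrayList<>();
        int refcount = 1; // the reference held by the core while this searcher is current
        boolean closed;

        void incref() {
            refcount++;
        }

        void decref() {
            if (--refcount == 0) {
                closed = true;
                filterCache.clear(); // drop cached entries when the searcher closes
            }
        }
    }

    // Simulates a number of commits and returns how many searchers are still open.
    static int simulate(int commits, boolean leakyComponent) {
        List<Searcher> opened = new ArrayList<>();
        Searcher current = new Searcher();
        opened.add(current);
        for (int i = 0; i < commits; i++) {
            current.filterCache.add(new Object()); // stand-in for a cached DocSet
            if (leakyComponent) {
                current.incref(); // takes a reference and never calls decref()
            }
            Searcher next = new Searcher(); // a commit opens a new searcher...
            opened.add(next);
            current.decref();               // ...and the core releases the old one
            current = next;
        }
        int open = 0;
        for (Searcher s : opened) {
            if (!s.closed) {
                open++;
            }
        }
        return open;
    }

    public static void main(String[] args) {
        System.out.println("balanced: " + simulate(5, false) + " open searcher(s)");
        System.out.println("leaky:    " + simulate(5, true) + " open searcher(s)");
    }
}
```

With five commits the balanced run ends with one open searcher and the leaky run with six, i.e. one extra open searcher (and its cache) per commit, which is the pattern described in this thread.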