Hello Andrey,

I didn't think of that! I will try it when I have the courage again, probably next week or so.

Many thanks,
Markus

-----Original message-----
> From:Kydryavtsev Andrey <werde...@yandex.ru>
> Sent: Wednesday 4th July 2018 14:48
> To: solr-user@lucene.apache.org
> Subject: Re: 7.3 appears to leak
>
> If it is not possible to find a resource leak by code analysis and there are no better ideas, I can suggest a brute force approach:
> - Clone Solr's sources from the appropriate branch:
>   https://github.com/apache/lucene-solr/tree/branch_7_3
> - Log every increment/decrement operation on the searcher's holder in a way that captures every caller's name (use Thread.currentThread().getStackTrace() or something):
>   https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
> - Build custom artefacts and upload them to prod
> - After the memory leak has happened, analyse the logs to see which part of the functionality doesn't decrement the searcher after the counter was incremented. If searchers are leaked, there should be such code, I guess.
>
> This is not something anyone would like to do, but it is what it is.
>
> Thank you,
> Andrey Kudryavtsev
>
> 03.07.2018, 14:26, "Markus Jelsma" <markus.jel...@openindex.io>:
> > Hello Erick,
> >
> > Even the silliest ideas may help us, but unfortunately this is not the case. All our Solr nodes run binaries from the same source from our central build server, with the same libraries thanks to provisioning. Only schema and config are different, but the <lib/> directive is the same all over.
> >
> > Are there any other ideas, speculations, whatever, on why only our main text collection leaks a SolrIndexSearcher instance on commit since 7.3.0 and every version up?
> >
> > Many thanks,
> > Markus
> >
> > -----Original message-----
> >> From:Erick Erickson <erickerick...@gmail.com>
> >> Sent: Friday 29th June 2018 19:34
> >> To: solr-user <solr-user@lucene.apache.org>
> >> Subject: Re: 7.3 appears to leak
> >>
> >> This is truly puzzling then, I'm clueless.
> >> It's hard to imagine this is lurking out there and nobody else notices, but you've eliminated the custom code. And this is also very peculiar:
> >>
> >> * it occurs only in our main text search collection, all other collections are unaffected;
> >> * despite what I said earlier, it is so far unreproducible outside production, even when mimicking production as well as we can;
> >>
> >> Here's a tedious idea. Restart Solr with the -v option, I _think_ that shows you each and every jar file Solr loads. Is it "somehow" possible that your main collection is loading some jar from somewhere that's different than you expect? 'cause silly ideas like this are all I can come up with.
> >>
> >> Erick
> >>
> >> On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> >> > Hello Erick,
> >> >
> >> > The custom search handler doesn't interact with SolrIndexSearcher, this is really all it does:
> >> >
> >> > public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
> >> >   super.handleRequestBody(req, rsp);
> >> >
> >> >   if (rsp.getToLog().get("hits") instanceof Integer) {
> >> >     rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
> >> >   }
> >> >   if (rsp.getToLog().get("hits") instanceof Long) {
> >> >     rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
> >> >   }
> >> > }
> >> >
> >> > I am not sure this qualifies as one more to go.
> >> >
> >> > Re: compiler warnings on resources, yes! This and tests failing due to resource leaks have always warned me when I forgot to release something or decrement a reference. But the above method (and the token filters, which I really can't disable) are all that is left.
> >> >
> >> > I am quite desperate about this problem so although I am unwilling to disable stuff, I can do it if I must.
> >> > But I see no reason, yet, to remove the search handler or the token filter stuff. I mean, how could those leak a SolrIndexSearcher?
> >> >
> >> > Let me know :)
> >> >
> >> > Many thanks!
> >> > Markus
> >> >
> >> > -----Original message-----
> >> >> From:Erick Erickson <erickerick...@gmail.com>
> >> >> Sent: Friday 29th June 2018 18:46
> >> >> To: solr-user <solr-user@lucene.apache.org>
> >> >> Subject: Re: 7.3 appears to leak
> >> >>
> >> >> bq. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers.
> >> >>
> >> >> Well, one more to go ;). It's incredibly easy to overlook innocent-seeming calls that increment the underlying reference count of some objects but don't decrement them, usually through a close call. Which isn't necessarily a close if the underlying reference count is still > 0.
> >> >>
> >> >> You may infer that I've been there and done that ;). Sometimes the compiler warnings about "resource leak" can help pinpoint those too.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> >> >> > Hello Yonik,
> >> >> >
> >> >> > I took one node of the 7.2.1 cluster out of the load balancer so it would only receive shard queries. This way I could kind of 'safely' disable our custom components one by one, while keeping functionality in place by letting the other 7.2.1 nodes continue on with the full configuration.
> >> >> >
> >> >> > I am now at a point where literally all custom components are deleted or commented out in the config for the node running 7.4. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers, and all the token filters in our schema.
> >> >> >
> >> >> > You were right, it was leaking exactly one SolrIndexSearcher instance on each commit.
> >> >> > But, with all our stuff gone, the leak is still there! I triple checked it! Of course, the bastard is locally still not reproducible.
> >> >> >
> >> >> > So, what is next? I have no clues left.
> >> >> >
> >> >> > Many, many thanks,
> >> >> > Markus
> >> >> >
> >> >> > -----Original message-----
> >> >> >> From:Markus Jelsma <markus.jel...@openindex.io>
> >> >> >> Sent: Thursday 28th June 2018 23:52
> >> >> >> To: solr-user@lucene.apache.org
> >> >> >> Subject: RE: 7.3 appears to leak
> >> >> >>
> >> >> >> Hello Yonik,
> >> >> >>
> >> >> >> If leaking a whole SolrIndexSearcher would cause this problem, then the only custom component that could be the root of all problems is our copy/paste-and-enhance version of the elevator component. It is a direct copy of the 7.2 source where only things like getAnalyzedQuery, the ElevationObj and the loop over the map entries are changed.
> >> >> >>
> >> >> >> There are no changes to code related to the searcher. Other components where we get a RefCount of the searcher are used without issues; we always decrement the reference after using it. But those components are not in use in this collection.
> >> >> >>
> >> >> >> The source has changed a lot with 7.4 but we still use the old code. I will investigate the component thoroughly, even revert to the old 7.2 vanilla component for a brief period in production for one machine. It may not be a problem if I don't let our load balancer access it directly, so it only serves shard queries.
> >> >> >>
> >> >> >> I will get back to this topic tomorrow!
> >> >> >>
> >> >> >> Many thanks,
> >> >> >> Markus
> >> >> >>
> >> >> >> -----Original message-----
> >> >> >> > From:Yonik Seeley <ysee...@gmail.com>
> >> >> >> > Sent: Thursday 28th June 2018 23:30
> >> >> >> > To: solr-user@lucene.apache.org
> >> >> >> > Subject: Re: 7.3 appears to leak
> >> >> >> >
> >> >> >> > > * SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry instances are both leaked on commit;
> >> >> >> >
> >> >> >> > If these are actually filterCache entries being leaked, it stands to reason that a whole searcher is being leaked somewhere.
> >> >> >> >
> >> >> >> > -Yonik
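
For reference, the brute-force idea from Andrey's message quoted above (log every increment/decrement of the searcher holder together with its caller) could look roughly like the sketch below. This is not Solr's RefCounted class: TracingRefCounted, getLog and LeakDemo are hypothetical names made up for illustration, and picking frame 3 of Thread.currentThread().getStackTrace() as "the caller" is an assumption that matches HotSpot's usual behaviour but is not guaranteed on every JVM.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted holder that logs who touched the counter. */
class TracingRefCounted<T> {
    private final T resource;
    private final AtomicInteger refcount = new AtomicInteger();
    // In-memory log for the sketch; a real patch would write to the normal logger.
    private final List<String> log = Collections.synchronizedList(new ArrayList<>());

    TracingRefCounted(T resource) {
        this.resource = resource;
    }

    T get() {
        return resource;
    }

    int getRefcount() {
        return refcount.get();
    }

    TracingRefCounted<T> incref() {
        record("incref", refcount.incrementAndGet());
        return this;
    }

    void decref() {
        record("decref", refcount.decrementAndGet());
    }

    /** Returns a copy of everything logged so far. */
    List<String> getLog() {
        return new ArrayList<>(log);
    }

    private void record(String op, int countAfter) {
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        // Assumed layout: [0] getStackTrace, [1] record, [2] incref/decref, [3] caller.
        String caller = stack.length > 3 ? stack[3].toString() : "<unknown>";
        String line = op + " -> " + countAfter + " by " + caller;
        log.add(line);
        System.err.println(line);
    }
}

class LeakDemo {
    public static void main(String[] args) {
        TracingRefCounted<String> holder = new TracingRefCounted<>("searcher");
        holder.incref();  // balanced use: incremented ...
        holder.decref();  // ... and released again
        holder.incref();  // simulated leak: never decremented
        System.out.println("outstanding references: " + holder.getRefcount());
    }
}
```

In the logs, a leaking code path would then show up as a caller whose incref lines have no matching decref lines.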