I created SOLR-12743 to track this. On Mon, Jul 16, 2018 at 12:30 PM Markus Jelsma <markus.jel...@openindex.io> wrote:
> Hello Thomas,
>
> To be absolutely sure you suffer from the same problem as one of our collections, can you confirm that your Solr cores are leaking a SolrIndexSearcher instance on each commit? If not, there may be a second problem.
>
> Also, do you run any custom plugins or apply patches to your Solr instances? Or is your Solr a 100 % official build?
>
> Thanks,
> Markus
>
> > -----Original message-----
> > From:Thomas Scheffler <thomas.scheff...@uni-jena.de>
> > Sent: Monday 16th July 2018 13:39
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.3 appears to leak
> >
> > Hi,
> >
> > we noticed the same problems here in a rather small setup. 40.000 metadata documents with nearly as many files that have „literal.*“ fields with it. While 7.2.1 has brought some tika issues the real problems started to appear with version 7.3.0 which are currently unresolved in 7.4.0. Memory consumption is out-of-roof. Where previously 512MB heap was enough, now 6G aren’t enough to index all files.
> >
> > kind regards,
> >
> > Thomas
> >
> > > On 04.07.2018 at 15:03, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > >
> > > Hello Andrey,
> > >
> > > I didn't think of that! I will try it when i have the courage again, probably next week or so.
> > >
> > > Many thanks,
> > > Markus
> > >
> > > -----Original message-----
> > >> From:Kydryavtsev Andrey <werde...@yandex.ru>
> > >> Sent: Wednesday 4th July 2018 14:48
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: 7.3 appears to leak
> > >>
> > >> If it is not possible to find a resource leak by code analysis and there is no better ideas, I can suggest a brute force approach:
> > >> - Clone Solr's sources from appropriate branch https://github.com/apache/lucene-solr/tree/branch_7_3
> > >> - Log every searcher's holder increment/decrement operation in a way to catch every caller name (use Thread.currentThread().getStackTrace() or something) https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
> > >> - Build custom artefacts and upload them on prod
> > >> - After memory leak happened - analyse logs to see what part of functionality doesn't decrement searcher after counter was incremented. If searchers are leaked - there should be such code I guess.
> > >>
> > >> This is not something someone would like to do, but it is what it is.
> > >>
> > >> Thank you,
> > >>
> > >> Andrey Kudryavtsev
> > >>
> > >> 03.07.2018, 14:26, "Markus Jelsma" <markus.jel...@openindex.io>:
> > >>> Hello Erick,
> > >>>
> > >>> Even the silliest ideas may help us, but unfortunately this is not the case. All our Solr nodes run binaries from the same source from our central build server, with the same libraries thanks to provisioning. Only schema and config are different, but the <lib/> directive is the same all over.
> > >>>
> > >>> Are there any other ideas, speculations, whatever, on why only our main text collection leaks a SolrIndexSearcher instance on commit since 7.3.0 and every version up?
> > >>>
> > >>> Many thanks?
> > >>> Markus
> > >>>
> > >>> -----Original message-----
> > >>>> From:Erick Erickson <erickerick...@gmail.com>
> > >>>> Sent: Friday 29th June 2018 19:34
> > >>>> To: solr-user <solr-user@lucene.apache.org>
> > >>>> Subject: Re: 7.3 appears to leak
> > >>>>
> > >>>> This is truly puzzling then, I'm clueless. It's hard to imagine this is lurking out there and nobody else notices, but you've eliminated the custom code. And this is also very peculiar:
> > >>>>
> > >>>> * it occurs only in our main text search collection, all other collections are unaffected;
> > >>>> * despite what i said earlier, it is so far unreproducible outside production, even when mimicking production as good as we can;
> > >>>>
> > >>>> Here's a tedious idea. Restart Solr with the -v option, I _think_ that shows you each and every jar file Solr loads. Is it "somehow" possible that your main collection is loading some jar from somewhere that's different than you expect? 'cause silly ideas like this are all I can come up with.
> > >>>>
> > >>>> Erick
> > >>>>
> > >>>> On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > >>>> > Hello Erick,
> > >>>> >
> > >>>> > The custom search handler doesn't interact with SolrIndexSearcher, this is really all it does:
> > >>>> >
> > >>>> > public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
> > >>>> >   super.handleRequestBody(req, rsp);
> > >>>> >
> > >>>> >   if (rsp.getToLog().get("hits") instanceof Integer) {
> > >>>> >     rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
> > >>>> >   }
> > >>>> >   if (rsp.getToLog().get("hits") instanceof Long) {
> > >>>> >     rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
> > >>>> >   }
> > >>>> > }
> > >>>> >
> > >>>> > I am not sure this qualifies as one more to go.
> > >>>> >
> > >>>> > Re: compiler warnings on resources, yes! This and tests failing due to resource leaks have always warned me when i forgot to release something or decrement a reference. But the above method (and the token filters, which i really can't disable) are all that is left.
> > >>>> >
> > >>>> > I am quite desperate about this problem so although i am unwilling to disable stuff, i can do it if i must. But i see no reason, yet, to remove the search handler or the token filter stuff, i mean, how could those leak a SolrIndexSearcher?
> > >>>> >
> > >>>> > Let me know :)
> > >>>> >
> > >>>> > Many thanks!
> > >>>> > Markus
> > >>>> >
> > >>>> > -----Original message-----
> > >>>> >> From:Erick Erickson <erickerick...@gmail.com>
> > >>>> >> Sent: Friday 29th June 2018 18:46
> > >>>> >> To: solr-user <solr-user@lucene.apache.org>
> > >>>> >> Subject: Re: 7.3 appears to leak
> > >>>> >>
> > >>>> >> bq. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers.
> > >>>> >>
> > >>>> >> Well, one more to go ;). It's incredibly easy to overlook innocent-seeming calls that increment the underlying reference count of some objects but don't decrement them, usually through a close call. Which isn't necessarily a close if the underlying reference count is still > 0.
> > >>>> >>
> > >>>> >> You may infer that I've been there and done that ;). Sometimes the compiler warnings about "resource leak" can help pinpoint those too.
> > >>>> >>
> > >>>> >> Best,
> > >>>> >> Erick
> > >>>> >>
> > >>>> >> On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > >>>> >> > Hello Yonik,
> > >>>> >> >
> > >>>> >> > I took one node of the 7.2.1 cluster out of the load balancer so it would only receive shard queries, this way i could kind of 'safely' disable our custom components one by one, while keeping functionality in place by letting the other 7.2.1 nodes continue on with the full configuration.
> > >>>> >> >
> > >>>> >> > I am now at a point where literally all custom components are deleted or commented out in the config for the node running 7.4. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers, and all the token filters in our schema.
> > >>>> >> >
> > >>>> >> > You were right, it was leaking exactly one SolrIndexSearcher instance on each commit. But, with all our stuff gone, the leak is still there! I triple checked it! Of course, the bastard is locally still not reproducible.
> > >>>> >> >
> > >>>> >> > So, what is next? I have no clues left.
> > >>>> >> >
> > >>>> >> > Many, many thanks,
> > >>>> >> > Markus
> > >>>> >> >
> > >>>> >> > -----Original message-----
> > >>>> >> >> From:Markus Jelsma <markus.jel...@openindex.io>
> > >>>> >> >> Sent: Thursday 28th June 2018 23:52
> > >>>> >> >> To: solr-user@lucene.apache.org
> > >>>> >> >> Subject: RE: 7.3 appears to leak
> > >>>> >> >>
> > >>>> >> >> Hello Yonik,
> > >>>> >> >>
> > >>>> >> >> If leaking a whole SolrIndexSearcher would cause this problem, then the only custom component that could be the root of all problems would be our copy/paste-and-enhance version of the elevator component. It is a direct copy of the 7.2 source where only things like getAnalyzedQuery, the ElevationObj and the loop over the map entries are changed.
> > >>>> >> >>
> > >>>> >> >> There are no changes to code related to the searcher. Other components where we get a RefCounted searcher are used without issues, we always decrement the reference after using it. But those components are not in use in this collection.
> > >>>> >> >>
> > >>>> >> >> The source has changed a lot with 7.4 but we still use the old code. I will investigate the component thoroughly, even revert to the old 7.2 vanilla component for a brief period in production for one machine. It may not be a problem if i don't let our load balancer access it directly, so it only serves shard queries.
> > >>>> >> >>
> > >>>> >> >> I will get back to this topic tomorrow!
> > >>>> >> >>
> > >>>> >> >> Many thanks,
> > >>>> >> >> Markus
> > >>>> >> >>
> > >>>> >> >> -----Original message-----
> > >>>> >> >> > From:Yonik Seeley <ysee...@gmail.com>
> > >>>> >> >> > Sent: Thursday 28th June 2018 23:30
> > >>>> >> >> > To: solr-user@lucene.apache.org
> > >>>> >> >> > Subject: Re: 7.3 appears to leak
> > >>>> >> >> >
> > >>>> >> >> > > * SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry instances are both leaked on commit;
> > >>>> >> >> >
> > >>>> >> >> > If these are actually filterCache entries being leaked, it stands to reason that a whole searcher is being leaked somewhere.
> > >>>> >> >> >
> > >>>> >> >> > -Yonik
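
For anyone who wants to try the brute-force tracing Andrey suggests in the quoted thread, here is a minimal sketch of the idea. It does not patch Solr itself: TracedRefCounted is a hypothetical stand-in for org.apache.solr.util.RefCounted, and the names RefCountTraceDemo, balancedUse, leakyUse and unbalancedCallers are made up for illustration. It assumes that keying on the calling class and method is enough to spot the culprit; in a real patched build you would more likely log the full stack trace on every incref/decref and pair them up afterwards.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountTraceDemo {

    /** Hypothetical stand-in for Solr's RefCounted; NOT the real class. */
    public static class TracedRefCounted<T> {
        private final T resource;
        private final AtomicInteger refcount = new AtomicInteger();
        // Net (incref minus decref) count per calling class.method.
        private final Map<String, AtomicInteger> netByCaller = new ConcurrentHashMap<>();

        public TracedRefCounted(T resource) {
            this.resource = resource;
        }

        public T get() {
            return resource;
        }

        public TracedRefCounted<T> incref() {
            refcount.incrementAndGet();
            record(+1);
            return this;
        }

        public void decref() {
            refcount.decrementAndGet();
            record(-1);
        }

        public int getRefcount() {
            return refcount.get();
        }

        /** Callers whose increments and decrements do not cancel out. */
        public Map<String, Integer> unbalancedCallers() {
            Map<String, Integer> result = new TreeMap<>();
            netByCaller.forEach((caller, net) -> {
                if (net.get() != 0) {
                    result.put(caller, net.get());
                }
            });
            return result;
        }

        private void record(int delta) {
            String caller = "unknown";
            // Skip the frames of Thread.getStackTrace() and of this class;
            // the first frame left is whoever called incref()/decref().
            for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
                String cls = frame.getClassName();
                if (!cls.equals(Thread.class.getName())
                        && !cls.equals(TracedRefCounted.class.getName())) {
                    caller = cls + "." + frame.getMethodName();
                    break;
                }
            }
            netByCaller.computeIfAbsent(caller, k -> new AtomicInteger()).addAndGet(delta);
        }
    }

    /** Correct usage: the reference is always released in finally. */
    public static void balancedUse(TracedRefCounted<String> holder) {
        holder.incref();
        try {
            holder.get().length();
        } finally {
            holder.decref();
        }
    }

    /** Deliberately buggy usage: increments but never decrements. */
    public static void leakyUse(TracedRefCounted<String> holder) {
        holder.incref();
        holder.get().length();
    }

    public static void main(String[] args) {
        TracedRefCounted<String> holder = new TracedRefCounted<>("searcher");
        balancedUse(holder);
        leakyUse(holder);
        // Only the caller that forgot to decref should be reported.
        System.out.println(holder.unbalancedCallers());
    }
}
```

The per-caller counter only balances when incref and decref happen in the same method. Code that hands the reference to another method for release will show up as a +1/-1 pair, which would have to be matched by hand.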