If it is not possible to find the resource leak by code analysis and there are
no better ideas, I can suggest a brute-force approach:
- Clone Solr's sources from the appropriate branch:
https://github.com/apache/lucene-solr/tree/branch_7_3
- Log every increment/decrement of the searcher holder's reference count in a
way that captures the caller (use Thread.currentThread().getStackTrace() or
something similar):
https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
- Build custom artifacts and deploy them to prod.
- After the memory leak has happened, analyse the logs to see which part of the
functionality doesn't decrement the searcher's counter after incrementing it.
If searchers are leaked, there should be such code, I guess.
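
To make the logging step more concrete, here is a rough sketch of what I mean.
It is not a patch for Solr's RefCounted: TracingRefCounter, its onClose
callback and its log sink are hypothetical names made up for illustration, and
the real class would have to be instrumented in its own incref()/decref()
methods.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Hypothetical stand-in for a reference-counted holder (NOT Solr's RefCounted).
 * Every incref/decref is reported to a log sink together with the call stack.
 */
final class TracingRefCounter<T> {
  private static final int MAX_FRAMES = 15; // arbitrary cap to keep log entries short

  private final T resource;
  private final Runnable onClose;       // assumed cleanup hook, run when the count reaches 0
  private final Consumer<String> log;   // assumed log sink, e.g. a logger's info method
  private final AtomicInteger refcount = new AtomicInteger();

  TracingRefCounter(T resource, Runnable onClose, Consumer<String> log) {
    this.resource = resource;
    this.onClose = onClose;
    this.log = log;
  }

  T get() { return resource; }

  int getRefcount() { return refcount.get(); }

  TracingRefCounter<T> incref() {
    trace("incref", refcount.incrementAndGet());
    return this;
  }

  void decref() {
    int count = refcount.decrementAndGet();
    trace("decref", count);
    if (count == 0) {
      onClose.run(); // last reference released
    }
  }

  private void trace(String op, int count) {
    StringBuilder sb = new StringBuilder(op)
        .append(" holder=").append(System.identityHashCode(this))
        .append(" count=").append(count);
    int printed = 0;
    for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
      String cls = frame.getClassName();
      // Skip frames that belong to java.lang.Thread or to this class itself.
      if (cls.equals(Thread.class.getName()) || cls.equals(TracingRefCounter.class.getName())) {
        continue;
      }
      if (printed++ == MAX_FRAMES) break;
      sb.append(System.lineSeparator()).append("    at ").append(frame);
    }
    log.accept(sb.toString());
  }
}
```

With something like this in place, the log can be grouped by the holder id: a
holder whose incref entries outnumber its decref entries was never fully
released, and the stack traces of its incref entries show the candidate callers.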

This is not something anyone would like to do, but it is what it is.



Thank you,

Andrey Kudryavtsev


03.07.2018, 14:26, "Markus Jelsma" <markus.jel...@openindex.io>:
> Hello Erick,
>
> Even the silliest ideas may help us, but unfortunately this is not the case. 
> All our Solr nodes run binaries from the same source from our central build 
> server, with the same libraries thanks to provisioning. Only schema and 
> config are different, but the <lib/> directive is the same all over.
>
> Are there any other ideas, speculations, whatever, on why only our main text 
> collection leaks a SolrIndexSearcher instance on commit since 7.3.0 and every 
> version up?
>
> Many thanks!
> Markus
>
> -----Original message-----
>>  From:Erick Erickson <erickerick...@gmail.com>
>>  Sent: Friday 29th June 2018 19:34
>>  To: solr-user <solr-user@lucene.apache.org>
>>  Subject: Re: 7.3 appears to leak
>>
>>  This is truly puzzling then, I'm clueless. It's hard to imagine this
>>  is lurking out there and nobody else notices, but you've eliminated
>>  the custom code. And this is also very peculiar:
>>
>>  * it occurs only in our main text search collection, all other
>>  collections are unaffected;
>>  * despite what i said earlier, it is so far unreproducible outside
>>  production, even when mimicking production as good as we can;
>>
>>  Here's a tedious idea. Restart Solr with the -v option, I _think_ that
>>  shows you each and every jar file Solr loads. Is it "somehow" possible
>>  that your main collection is loading some jar from somewhere that's
>>  different than you expect? 'cause silly ideas like this are all I can
>>  come up with.
>>
>>  Erick
>>
>>  On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
>>  <markus.jel...@openindex.io> wrote:
>>  > Hello Erick,
>>  >
>>  > The custom search handler doesn't interact with SolrIndexSearcher; this
>>  > is really all it does:
>>  >
>>  >   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
>>  >     super.handleRequestBody(req, rsp);
>>  >
>>  >     if (rsp.getToLog().get("hits") instanceof Integer) {
>>  >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
>>  >     }
>>  >     if (rsp.getToLog().get("hits") instanceof Long) {
>>  >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
>>  >     }
>>  >   }
>>  >
>>  > I am not sure this qualifies as one more to go.
>>  >
>>  > Re: compiler warnings on resources, yes! This and tests failing due to
>>  > resource leaks have always warned me when i forgot to release something
>>  > or decrement a reference. But the above method (and the token filters,
>>  > which i really can't disable) are all that is left.
>>  >
>>  > I am quite desperate about this problem, so although i am unwilling to
>>  > disable stuff, i can do it if i must. But i see no reason, yet, to remove
>>  > the search handler or the token filter stuff; i mean, how could those
>>  > leak a SolrIndexSearcher?
>>  >
>>  > Let me know :)
>>  >
>>  > Many thanks!
>>  > Markus
>>  >
>>  > -----Original message-----
>>  >> From:Erick Erickson <erickerick...@gmail.com>
>>  >> Sent: Friday 29th June 2018 18:46
>>  >> To: solr-user <solr-user@lucene.apache.org>
>>  >> Subject: Re: 7.3 appears to leak
>>  >>
>>  >> bq. The only custom stuff left is an extension of SearchHandler that
>>  >> only writes numFound to the response headers.
>>  >>
>>  >> Well, one more to go ;). It's incredibly easy to overlook
>>  >> innocent-seeming calls that increment the underlying reference count
>>  >> of some objects but don't decrement them, usually through a close
>>  >> call. Which isn't necessarily a close if the underlying reference
>>  >> count is still > 0.
>>  >>
>>  >> You may infer that I've been there and done that ;). Sometimes the
>>  >> compiler warnings about "resource leak" can help pinpoint those too.
>>  >>
>>  >> Best,
>>  >> Erick
>>  >>
>>  >> On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma
>>  >> <markus.jel...@openindex.io> wrote:
>>  >> > Hello Yonik,
>>  >> >
>>  >> > I took one node of the 7.2.1 cluster out of the load balancer so it
>>  >> > would only receive shard queries; this way i could kind of 'safely'
>>  >> > disable our custom components one by one, while keeping functionality
>>  >> > in place by letting the other 7.2.1 nodes continue on with the full
>>  >> > configuration.
>>  >> >
>>  >> > I am now at a point where literally all custom components are deleted
>>  >> > or commented out in the config for the node running 7.4. The only
>>  >> > custom stuff left is an extension of SearchHandler that only writes
>>  >> > numFound to the response headers, and all the token filters in our
>>  >> > schema.
>>  >> >
>>  >> > You were right, it was leaking exactly one SolrIndexSearcher instance
>>  >> > on each commit. But, with all our stuff gone, the leak is still there!
>>  >> > I triple checked it! Of course, the bastard is locally still not
>>  >> > reproducible.
>>  >> >
>>  >> > So, what is next? I have no clues left.
>>  >> >
>>  >> > Many, many thanks,
>>  >> > Markus
>>  >> >
>>  >> > -----Original message-----
>>  >> >> From:Markus Jelsma <markus.jel...@openindex.io>
>>  >> >> Sent: Thursday 28th June 2018 23:52
>>  >> >> To: solr-user@lucene.apache.org
>>  >> >> Subject: RE: 7.3 appears to leak
>>  >> >>
>>  >> >> Hello Yonik,
>>  >> >>
>>  >> >> If leaking a whole SolrIndexSearcher would cause this problem, then
>>  >> >> the only custom component that could be the root of all problems is
>>  >> >> our copy/paste-and-enhance version of the elevator component. It is a
>>  >> >> direct copy of the 7.2 source where only things like getAnalyzedQuery,
>>  >> >> the ElevationObj and the loop over the map entries are changed.
>>  >> >>
>>  >> >> There are no changes to code related to the searcher. Other components
>>  >> >> where we get a RefCount of the searcher are used without issues; we
>>  >> >> always decrement the reference after using it. But those components
>>  >> >> are not in use in this collection.
>>  >> >>
>>  >> >> The source has changed a lot with 7.4 but we still use the old code.
>>  >> >> I will investigate the component thoroughly, even revert to the old
>>  >> >> 7.2 vanilla component for a brief period in production for one
>>  >> >> machine. It may not be a problem if i don't let our load balancer
>>  >> >> access it directly, so it only serves shard queries.
>>  >> >>
>>  >> >> I will get back to this topic tomorrow!
>>  >> >>
>>  >> >> Many thanks,
>>  >> >> Markus
>>  >> >>
>>  >> >>
>>  >> >>
>>  >> >> -----Original message-----
>>  >> >> > From:Yonik Seeley <ysee...@gmail.com>
>>  >> >> > Sent: Thursday 28th June 2018 23:30
>>  >> >> > To: solr-user@lucene.apache.org
>>  >> >> > Subject: Re: 7.3 appears to leak
>>  >> >> >
>>  >> >> > > * SortedIntDocSet instances and ConcurrentLRUCache$CacheEntry
>>  >> >> > > instances are both leaked on commit;
>>  >> >> >
>>  >> >> > If these are actually filterCache entries being leaked, it stands to
>>  >> >> > reason that a whole searcher is being leaked somewhere.
>>  >> >> >
>>  >> >> > -Yonik
>>  >> >> >
>>  >> >>
>>  >>
