Do you use EmbeddedSolr in the query server? There is a memory leak that
shows up after a large number of replications.
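
For context, "EmbeddedSolr" here means running Solr in-process through
SolrJ's EmbeddedSolrServer, as opposed to talking to a standalone instance
over HTTP. A rough sketch of the two client setups, using the 1.4-era SolrJ
class names; the solr home path and URL below are placeholders, not anything
from this thread:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class ClientSetup {

      // In-process Solr -- the "EmbeddedSolr" case asked about above.
      static SolrServer embedded() throws Exception {
        System.setProperty("solr.solr.home", "/path/to/solr/home"); // placeholder path
        CoreContainer container = new CoreContainer.Initializer().initialize();
        return new EmbeddedSolrServer(container, ""); // "" = default core
      }

      // Plain HTTP client against a standalone Solr instance.
      static SolrServer overHttp() throws Exception {
        return new CommonsHttpSolrServer("http://slave-host:8983/solr"); // placeholder URL
      }
    }
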
On Wed, Nov 3, 2010 at 8:28 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> Ah, but reading Peter's email message I reference more carefully, it seems
> that Solr already DOES provide an info-level log warning you about
> over-lapping warming, awesome. (But again, I'm pretty sure it does NOT throw
> an exception or return an HTTP error in that condition, based on my and
> others' experience.)
>
>> To check if your Solr environment is suffering from this, turn on INFO
>> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
>> onDeckSearchers=x'.
>
> Sweet, good to know, and I'll definitely add this to my debugging toolbox.
> Peter's listserv message really ought to be a wiki page, I think. Any
> reason for me not to just add it as a new one with title "Commit frequency
> and auto-warming" or something like that? Unless it's already in the wiki
> somewhere I haven't found, assuming the wiki will let an ordinary
> user-created account add a new page.
>
> Jonathan Rochkind wrote:
>>
>> I hadn't looked at the code, am not familiar with Solr code, and can't say
>> what that code does.
>>
>> But I have experienced issues that I _believe_ were caused by too frequent
>> commits causing over-lapping searcher preparation. And I've definitely seen
>> Solr documentation that suggests this is an issue. Let me find it now to
>> see if the experts think these documented suggestions are still correct or
>> not:
>>
>> "On the other hand, autowarming (populating) a new collection could take a
>> lot of time, especially since it uses only one thread and one CPU. If your
>> settings fire off snapinstaller too frequently, then a Solr slave could be
>> in the undesirable condition of handing-off queries to one (old)
>> collection, and, while warming a new collection, a second “new” one could
>> be snapped and begin warming!
>>
>> If we attempted to solve such a situation, we would have to invalidate the
>> first “new” collection in order to use the second one, then when a “third”
>> new collection would be snapped and warmed, we would have to invalidate the
>> “second” new collection, and so on ad infinitum. A completely warmed
>> collection would never make it to full term before it was aborted. This can
>> be prevented with a properly tuned configuration so new collections do not
>> get installed too rapidly."
>>
>> http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs
>>
>> I think I've seen that same advice on another wiki page, not specifically
>> about replication, but just about commit frequency balanced with
>> auto-warming, leading to overlapping warming, leading to spiraling RAM/CPU
>> usage -- but NOT an exception being thrown or HTTP error delivered.
>>
>> I can't find it on the wiki, but here's a listserv post with someone
>> reporting findings that match my understanding:
>> http://osdir.com/ml/solr-user.lucene.apache.org/2010-09/msg00528.html
>>
>> How does this advice square with the code Lance found? Is my
>> understanding of how frequent commits can interact with the time it takes
>> to warm a new collection correct? Appreciate any additional info.
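
An aside on the "too frequent commits" point: on the indexing-client side the
usual mitigation is to batch updates and commit once per batch, so a commit
cannot arrive before the warming triggered by the previous one has finished.
A rough SolrJ sketch with 1.4-era class names; the URL and field names are
placeholders, not from Simon's setup:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchedIndexer {
      public static void main(String[] args) throws Exception {
        SolrServer master = new CommonsHttpSolrServer("http://master-host:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          doc.addField("title", "document " + i);
          batch.add(doc);

          // Send documents in chunks, but do not commit per chunk.
          if (batch.size() == 1000) {
            master.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          master.add(batch);
        }

        // One explicit commit per batch run, so only one new searcher has to
        // warm, instead of one per document or per chunk.
        master.commit();
      }
    }

The same arithmetic applies on the slave: the replication poll interval has
to stay longer than the time warming actually takes.
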
>> Lance Norskog wrote:
>>>
>>> Isn't that what this code does?
>>>
>>>     onDeckSearchers++;
>>>     if (onDeckSearchers < 1) {
>>>       // should never happen... just a sanity check
>>>       log.error(logid+"ERROR!!! onDeckSearchers is " + onDeckSearchers);
>>>       onDeckSearchers=1;  // reset
>>>     } else if (onDeckSearchers > maxWarmingSearchers) {
>>>       onDeckSearchers--;
>>>       String msg="Error opening new searcher. exceeded limit of maxWarmingSearchers="
>>>           + maxWarmingSearchers + ", try again later.";
>>>       log.warn(logid+""+ msg);
>>>       // HTTP 503==service unavailable, or 409==Conflict
>>>       throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE,msg,true);
>>>     } else if (onDeckSearchers > 1) {
>>>       log.info(logid+"PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
>>>     }
>>>
>>> On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
>>>>
>>>> It's definitely a known 'issue' that you can't replicate (or do any other
>>>> kind of index change, including a commit) more frequently than your
>>>> warming queries take to complete, or you'll wind up with something like
>>>> you've seen.
>>>>
>>>> It's in some documentation somewhere I saw, for sure.
>>>>
>>>> The advice to 'just query against the master' is kind of odd, because,
>>>> then... why have a slave at all, if you aren't going to query against it?
>>>> I guess just for backup purposes.
>>>>
>>>> But even with just one Solr, or querying master, if you commit at a rate
>>>> such that commits come before the warming queries can complete, you're
>>>> going to have the same issue.
>>>>
>>>> The only answer I know of is "Don't commit (or replicate) at a faster rate
>>>> than it takes your warming to complete." You can reduce your warming
>>>> queries/operations, or reduce your commit/replicate frequency.
>>>>
>>>> Would be interesting/useful if Solr noticed this going on, and gave you
>>>> some kind of error in the log (or even an exception when started with a
>>>> certain parameter for testing): "Overlapping warming queries, you're
>>>> committing too fast" or something. Because it's easy to make this happen
>>>> without realizing it, and then your Solr does what Simon describes: runs
>>>> out of RAM and/or uses a whole lot of CPU and disk IO.
>>>>
>>>> Lance Norskog wrote:
>>>>>
>>>>> You should query against the indexer. I'm impressed that you got 5s
>>>>> replication to work reliably.
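
Relating this back to the SolrCore snippet above: when the
maxWarmingSearchers limit is exceeded, the update request fails with a 503
"try again later". A hedged sketch of how an indexing client could back off
and retry, assuming the 503 surfaces in SolrJ as a SolrException (depending
on the client version it may instead arrive wrapped in a SolrServerException):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrException;

    public class CommitWithBackoff {

      // Retry a commit that was rejected because too many searchers were
      // already warming; sleeping is what gives the earlier searcher time
      // to finish.
      static void commitWithRetry(SolrServer server) throws Exception {
        int attempts = 0;
        while (true) {
          try {
            server.commit();
            return;
          } catch (SolrException e) {
            // 503 == SERVICE_UNAVAILABLE, the code thrown by the
            // maxWarmingSearchers check quoted above.
            if (e.code() == 503 && ++attempts < 5) {
              Thread.sleep(30 * 1000L);
            } else {
              throw e;
            }
          }
        }
      }

      public static void main(String[] args) throws Exception {
        commitWithRetry(new CommonsHttpSolrServer("http://master-host:8983/solr")); // placeholder URL
      }
    }

Retrying only papers over the symptom, though; the real fix is still what
Jonathan describes: keep the commit/replication interval longer than the
warm-up time.
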
>>>>> On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow <si...@thegestalt.org> wrote:
>>>>>>
>>>>>> We've been trying to get a setup in which a slave replicates from a
>>>>>> master every few seconds (ideally every second but currently we have it
>>>>>> set at every 5s).
>>>>>>
>>>>>> Everything seems to work fine until, periodically, the slave just stops
>>>>>> responding from what looks like it running out of memory:
>>>>>>
>>>>>> org.apache.catalina.core.StandardWrapperValve invoke
>>>>>> SEVERE: Servlet.service() for servlet jsp threw exception
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>
>>>>>> (our monitoring seems to confirm this).
>>>>>>
>>>>>> Looking around, my suspicion is that it takes new Readers longer to warm
>>>>>> than the gap between replications, and thus they just build up until all
>>>>>> memory is consumed (which, I suppose, isn't really memory 'leaking' per
>>>>>> se, more just resource consumption).
>>>>>>
>>>>>> That said, we've tried turning off caching on the slave and that didn't
>>>>>> help either, so it's possible I'm wrong.
>>>>>>
>>>>>> Is there anything we can do about this? I'm reluctant to increase the
>>>>>> heap space since I suspect that will mean that there's just a longer
>>>>>> period between failures. Might Zoie help here? Or should we just query
>>>>>> against the Master?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Simon

--
Lance Norskog
goks...@gmail.com