Hi Shamik,
Please see inline comments/questions.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 24 Oct 2017, at 07:41, shamik <sham...@gmail.com> wrote:
> 
> Thanks Emir and Zisis.
> 
> I added maxRamMB for the filterCache and reduced its size. I could see the
> benefit immediately; the hit ratio went up to 0.97. Here's the configuration:
> 
> <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
> autowarmCount="128" maxRamMB="500" />
> <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
> autowarmCount="128" />
> <documentCache class="solr.LRUCache" size="512" initialSize="512"
> autowarmCount="0" />
[EA] Based on what you mentioned earlier, not all your filters are "cache
friendly", and the hit rate depends on how your clients use them (or maybe on
how many concurrent clients you have) - in other words, this hit ratio could be
a false positive. One explanation is that you previously had a count limit and
now have a memory limit, that the memory limit ignores the count (would need to
check this), and that your cache items are on average smaller than estimated,
so more than 4K of them fit into 500MB - but given your query rate and commit
interval, I am not sure you needed more than 4K in the first place.
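
One quick way to verify which limit is actually in play is to look at the live
cache stats - they show the current number of entries, so you can see whether
the filterCache really holds more than 4K items now. Something like the
following should do (core name is a placeholder; the exact fields vary a bit
between Solr versions):

curl "http://localhost:8983/solr/<core>/admin/mbeans?stats=true&cat=CACHE&wt=json"

That returns lookups, hits, hitratio, inserts, evictions and the current size
for each of the caches.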

> 
> It seemed to be stable for a few days; the cache hit rates and JVM pool
> utilization seemed to be well within the expected range. But the OOM issue
> occurred on one of the nodes as the heap size reached 30gb. The hit ratios for
> the query result cache and document cache at that point were recorded as 0.18
> and 0.65. I'm not sure if the caches caused the memory spike at this point;
> with the filter cache restricted to 500mb, it should be negligible. One thing
> I noticed is that the eviction rate now (with the addition of maxRamMB) is
> staying at 0.
[EA] Did you see evictions before? With 400rq/h and 10min commit intervals, you
get 60-70 requests between two commits. With a 4K cache size, each request would
have to insert more than 60 cache entries for eviction to even start.
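
The rough arithmetic, in case it helps: 400 requests/hour over a 10-minute
window is about 67 requests, and 4096 entries / 67 requests ≈ 61 new cache
entries per request before the count limit would kick in.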

> Index hard commit happens every 10 min; that's when the cache gets flushed.
> Based on the monitoring log, the spike happened on the indexing side, where
> almost 8k docs went into a pending state.
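[EA] Small aside on the flush: caches live per searcher, so they are only thrown
away when a commit opens a new searcher. If your autoCommit looks roughly like
the sketch below (values illustrative; openSearcher assumed true since you do
see the flush), the 10-minute window is also the maximum lifetime of every cache
entry:

<autoCommit>
  <maxTime>600000</maxTime>      <!-- 10 min -->
  <openSearcher>true</openSearcher>
</autoCommit>
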
> 
> From a query performance standpoint, there have been occasional slow queries
> (1sec+), but nothing alarming so far. The same goes for deep paging; I haven't
> seen any evidence pointing to that.
> 
> Based on the hit ratios, I can further scale down the query result and
> document caches, and also change them to FastLRUCache and add maxRamMB. For
> the filter cache, I think this setting should be good enough to work with a
> 30gb heap unless I'm wrong about the maxRamMB concept. I'll have to get a heap
> dump somehow; unfortunately, the whole process (of the node going down) happens
> so quickly that I hardly have any time to run a profiler.
[EA] You did not mention whether you ruled out fieldCache and fieldValueCache.
I don’t have much experience with LTR, but I see that there is another cache
related to it. Do you use it? Could it be the component that consumes the memory?
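
If you set LTR up following the reference guide, you probably have a feature
vector cache declared along these lines (the sizes here are just the guide's
example values; yours may differ):

<cache name="QUERY_DOC_FV"
       class="solr.search.LRUCache"
       size="4096"
       initialSize="2048"
       autowarmCount="4096"
       regenerator="solr.search.NoOpRegenerator" />

Note that it has no maxRamMB and a large autowarmCount, and feature vectors can
be sizeable, so it is worth checking its stats (same admin/mbeans call as above)
and memory footprint as well.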

> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
