Hi Rick,
I had a quick look at the GC logs and didn’t see any obvious issues. You
mentioned that batch processing takes ~20s for 500 documents. With 5-7
indexing threads, that is ~150 documents/s. Are those big documents?
With 200 queries/min (~3-4 queries/s - what sort of queries?) and 5-7 indexing 
threads, you might be overloading 4 cores.
Do you have dedicated ZK nodes? Do you see the same issues with fewer indexing 
threads?
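
As a rough sanity check of the figures above, here is a minimal Python sketch
of the arithmetic. It assumes each indexing thread sends one 500-document
batch after another and that each batch takes about 20 s; the function and
constant names are made up for illustration.

```python
# Back-of-the-envelope load estimate using the numbers from this thread.
BATCH_SIZE = 500          # documents per batch
BATCH_SECONDS = 20.0      # approximate time to process one batch
QUERIES_PER_MINUTE = 200  # query rate quoted above


def docs_per_second(threads: int) -> float:
    """Aggregate indexing rate, assuming every thread sends batches back to back."""
    return threads * BATCH_SIZE / BATCH_SECONDS


def queries_per_second(per_minute: float) -> float:
    """Convert a per-minute request rate to a per-second rate."""
    return per_minute / 60.0


if __name__ == "__main__":
    for threads in (5, 6, 7):
        print(f"{threads} threads: {docs_per_second(threads):.0f} docs/s")
    print(f"queries: {queries_per_second(QUERIES_PER_MINUTE):.1f}/s")
```

If the query rate is closer to the ~2000 requests per minute mentioned in the
original post, the same function gives roughly 33 queries/s.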

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Nov 2017, at 14:25, Rick Dig <teram...@gmail.com> wrote:
> 
> not committing after the batch. made sure we have that turned off.
> maxTime is set to 300000 (300 seconds), openSearcher is set to true.
> 
> 
> On Sat, Nov 4, 2017 at 6:50 PM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
> 
>> Pretty much what Emir has stated. I want to know, when you said:
>> 
>>> all of this runs perfectly ok when indexing isn't happening. as soon as
>>> we start "nrt" indexing one of the follower nodes goes down within 10 to
>>> 20 minutes.
>> 
>> 
>> When you say "NRT" indexing, what is the commit strategy during indexing?
>> With auto-commit set so high, are you committing after each batch? If yes,
>> what's the number?
>> 
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> 
>> On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović <
>> emir.arnauto...@sematext.com> wrote:
>> 
>>> Hi Rick,
>>> Do you see any errors in the logs? Do you have any monitoring tool? Maybe
>>> you can check heap and GC metrics around the time the incident happened.
>>> It is not a large heap, but a major GC could cause a pause long enough to
>>> trigger a snowball effect and end up with the node in recovery state.
>>> What indexing rate do you observe? Why do you have max warming searchers
>>> set to 5 (is that what you meant by autowarmingsearchers?) when you commit
>>> every 5 min? Why did you increase it - have you seen errors with the
>>> default of 2? Maybe you commit after every bulk?
>>> Do you see similar behaviour when you just do indexing without queries?
>>> 
>>> Thanks,
>>> Emir
>>> 
>>> 
>>> 
>>>> On 4 Nov 2017, at 05:15, Rick Dig <teram...@gmail.com> wrote:
>>>> 
>>>> hello all,
>>>> we are trying to run solrcloud 6.6 in a production setting.
>>>> here's our config and issue
>>>> 1) 3 nodes, 1 shard, replication factor 3
>>>> 2) all nodes are 16GB RAM, 4 core
>>>> 3) Our production load is about 2000 requests per minute
>>>> 4) index is fairly small, index size is around 400 MB with 300k documents
>>>> 5) autocommit is currently set to 5 minutes (even though ideally we would
>>>> like a smaller interval).
>>>> 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
>>>> 7) all of this runs perfectly ok when indexing isn't happening. as soon as
>>>> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
>>>> minutes. from this point on the nodes never recover unless we stop
>>>> indexing.  the master usually is the last one to fall.
>>>> 8) there are maybe 5 to 7 processes indexing at the same time with document
>>>> batch sizes of 500.
>>>> 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
>>>> 10) no cpu and / or oom issues that we can see.
>>>> 11) cpu load does go fairly high 15 to 20 at times.
>>>> any help or pointers appreciated
>>>> 
>>>> thanks
>>>> rick
>>> 
>>> 
>> 
