My company has several SolrCloud environments. In our most active cloud we are 
seeing outages related to GC pauses. We have about 10 collections, of which 4 
get a lot of traffic. The SolrCloud consists of 4 nodes, each with 6 
processors and an 11GB heap (25GB physical memory).

I notice that the 4 nodes seem to do their garbage collection at almost the 
same time. That seems strange to me. I would expect them to be more staggered.

This morning we had a GC pause that caused problems. During that time our 
application service was reporting "No live SolrServers available to handle this 
request".

Between 3:55 and 3:56 AM all 4 nodes were experiencing some amount of garbage 
collection pauses; for 2 of the nodes it was minor, for one it was 50%. For 3 
nodes it lasted until 3:57. However, the node with the worst impact didn't 
recover until 4:00 AM.
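For reference, the stop-the-world times can be pulled out of each node's GC log 
and compared by timestamp. A minimal sketch, assuming JDK 8 logs with 
-XX:+PrintGCApplicationStoppedTime enabled; the helper name and the 0.25 s 
threshold are mine, and the line format should be verified against your own 
solr_gc.log:

```shell
# check_gc_pauses: scan a JDK 8 GC log for stop-the-world pauses longer than
# 0.25 s (the MaxGCPauseMillis goal above). Assumes the standard JDK 8 line:
#   <timestamp>: <uptime>: Total time for which application threads were
#   stopped: <seconds> seconds, ...
check_gc_pauses() {
    awk '/Total time for which application threads were stopped/ {
        # field 1 is the timestamp, field 11 the pause duration in seconds
        if ($11 + 0 > 0.25) print $1, $11
    }' "$1"
}
```

Running this against each node's log and lining the timestamps up side by side 
would show whether the pauses really are simultaneous.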

How is it that all 4 nodes were doing GC in lockstep? If they all do GC at the 
same time, it defeats the purpose of having redundant cloud servers.
We switched from CMS to G1GC just this weekend.

At this point in time we also saw that traffic to Solr was not well 
distributed. The application calls Solr using CloudSolrClient, which I thought 
did its own load balancing. We saw 10X more traffic going to one Solr node than 
to all the others; then we saw it start hitting another node. All Solr queries 
come from our application.

During this period of time I saw only one error message in the Solr log:
ERROR (zkConnectionManagerCallback-8-thread-1) [   ] o.a.s.c.ZkController There 
was a problem finding the leader in zk:org.apache.solr.common.SolrException: 
Could not get leader props

We are currently using Solr 7.7.2.

GC tuning:
GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=250 \
-XX:+ParallelRefProcEnabled"
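One thing I'm wondering about the settings above: NewRatio, SurvivorRatio, 
TargetSurvivorRatio and MaxTenuringThreshold are CMS-era flags, and fixing the 
young-generation geometry like this may keep G1 from resizing the young 
generation to meet MaxGCPauseMillis. A pared-down variant to experiment with 
(just a sketch, assuming nothing else depends on those flags) might be:

```shell
# Hypothetical starting point, to be verified against GC logs: drop the
# CMS-era sizing flags and let G1 size the young generation adaptively
# toward the pause-time goal.
GC_TUNE="-XX:+UseG1GC \
-XX:MaxGCPauseMillis=250 \
-XX:+ParallelRefProcEnabled"
```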




