Hi,

We are using Solr 6.1 with 2 shards, and each shard has 1 replica,
i.e. we have 4 server nodes in total (each node is assigned 60 GB of RAM).

Recently we have been observing an issue where a Solr node (any random node)
automatically goes into recovery mode and stops responding.

We have allocated enough memory to Solr (60 GB), and the system also has enough
memory (300 GB).

We analyzed the GC logs and found a GC pause of 29.6583943 seconds when the
problem happened. Can this GC pause make the node unavailable or put it into
recovery mode, or could there be another reason?
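For anyone repeating this analysis, long stop-the-world pauses like this one can be pulled out of a JDK 8 GC log with a small script. This is only a sketch: it assumes the log was written with -XX:+PrintGCApplicationStoppedTime (which produces the "Total time for which application threads were stopped" line quoted below), that each entry sits on one line in the raw file (the copy in this email is wrapped), and that the function name long_pauses and the 5-second threshold are arbitrary, hypothetical choices.

```python
import re
from typing import Iterable, List, Tuple

# Matches the JDK 8 safepoint summary line, e.g.
# "2019-05-13T12:23:44.456+0100: 174668.901: Total time for which application
#  threads were stopped: 29.6583943 seconds, Stopping threads took: 4.3050775 seconds"
STOPPED_RE = re.compile(
    r"^(?P<stamp>\S+): .*Total time for which application threads were stopped: "
    r"(?P<total>[\d.]+) seconds, Stopping threads took: (?P<ttsp>[\d.]+) seconds"
)


def long_pauses(lines: Iterable[str], threshold_s: float) -> List[Tuple[str, float, float]]:
    """Return (wall-clock timestamp, total stopped seconds, time-to-safepoint seconds)
    for every safepoint pause whose total exceeds threshold_s."""
    hits: List[Tuple[str, float, float]] = []
    for line in lines:
        m = STOPPED_RE.search(line)
        if m and float(m.group("total")) > threshold_s:
            hits.append((m.group("stamp"), float(m.group("total")), float(m.group("ttsp"))))
    return hits


if __name__ == "__main__":
    # The pause line from the log in this message, unwrapped onto one line.
    sample = [
        "2019-05-13T12:23:44.456+0100: 174668.901: Total time for which application "
        "threads were stopped: 29.6583943 seconds, Stopping threads took: 4.3050775 seconds",
    ]
    for stamp, total, ttsp in long_pauses(sample, threshold_s=5.0):
        print(f"{stamp}: stopped {total:.2f} s (reaching the safepoint took {ttsp:.2f} s)")
```

Against a real log file the same function can be fed the open file object directly, e.g. `long_pauses(open("solr_gc.log"), 5.0)`, where the file name is hypothetical.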

Please note that we have set zkClientTimeout to 10 minutes (zkClientTimeout=600000)
so that ZooKeeper will not consider this node unavailable during long GC pauses.

Solr GC Logs
==========

{Heap before GC invocations=10940 (full 14):
par new generation   total 17476288K, used 14724911K [0x0000000080000000, 0x0000000580000000, 0x0000000580000000)
  eden space 13981056K, 100% used [0x0000000080000000, 0x00000003d5560000, 0x00000003d5560000)
  from space 3495232K,  21% used [0x00000003d5560000, 0x0000000402bcbdb0, 0x00000004aaab0000)
  to   space 3495232K,   0% used [0x00000004aaab0000, 0x00000004aaab0000, 0x0000000580000000)
concurrent mark-sweep generation total 62914560K, used 27668932K [0x0000000580000000, 0x0000001480000000, 0x0000001480000000)
Metaspace       used 47602K, capacity 48370K, committed 49860K, reserved 51200K
2019-05-13T12:23:19.103+0100: 174643.550: [GC (Allocation Failure) 174643.550: [ParNew
Desired survivor size 3221205808 bytes, new threshold 8 (max 8)
- age   1:   52251504 bytes,   52251504 total
- age   2:  208183784 bytes,  260435288 total
- age   3:  274752960 bytes,  535188248 total
- age   4:   12176528 bytes,  547364776 total
- age   5:    6135968 bytes,  553500744 total
- age   6:    3903152 bytes,  557403896 total
- age   7:   15341896 bytes,  572745792 total
- age   8:    5518880 bytes,  578264672 total
: 14724911K->762845K(17476288K), 24.7822734 secs] 42393844K->28434889K(80390848K), 24.7825687 secs] [Times: user=157.97 sys=25.63, real=24.78 secs]
Heap after GC invocations=10941 (full 14):
par new generation   total 17476288K, used 762845K [0x0000000080000000, 0x0000000580000000, 0x0000000580000000)
  eden space 13981056K,   0% used [0x0000000080000000, 0x0000000080000000, 0x00000003d5560000)
  from space 3495232K,  21% used [0x00000004aaab0000, 0x00000004d93a76a8, 0x0000000580000000)
  to   space 3495232K,   0% used [0x00000003d5560000, 0x00000003d5560000, 0x00000004aaab0000)
concurrent mark-sweep generation total 62914560K, used 27672043K [0x0000000580000000, 0x0000001480000000, 0x0000001480000000)
Metaspace       used 47602K, capacity 48370K, committed 49860K, reserved 51200K
}
2019-05-13T12:23:44.456+0100: 174668.901: Total time for which application threads were stopped: 29.6583943 seconds, Stopping threads took: 4.3050775 seconds


==============================
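As a sanity check on the heap layout, the generation sizes in the "Heap before GC" block can be added up. This is a sketch: the constants are copied by hand from the log, and reading the ParNew "total" as eden plus one survivor space is an assumption that the numbers happen to satisfy (the address ranges in square brackets are used as a cross-check).

```python
# Sizes in KiB, copied from the "Heap before GC" block of the log above.
EDEN_K = 13_981_056
SURVIVOR_K = 3_495_232           # size of each of the "from" and "to" spaces
PARNEW_TOTAL_K = 17_476_288      # ParNew "total" (assumed: eden + one survivor)
CMS_TOTAL_K = 62_914_560         # concurrent mark-sweep (old) generation
HEAP_REPORTED_K = 80_390_848     # capacity in "42393844K->28434889K(80390848K)"

# Address boundaries from the square-bracketed ranges in the same block.
YOUNG_START = 0x0000000080000000
OLD_START = 0x0000000580000000
HEAP_END = 0x0000001480000000

KIB_PER_GIB = 1024 * 1024


def gib(kib: int) -> float:
    """Convert KiB to GiB."""
    return kib / KIB_PER_GIB


# The reported totals are consistent with each other...
assert PARNEW_TOTAL_K == EDEN_K + SURVIVOR_K
assert HEAP_REPORTED_K == PARNEW_TOTAL_K + CMS_TOTAL_K
# ...and with the address ranges (which include both survivor spaces).
assert OLD_START - YOUNG_START == (EDEN_K + 2 * SURVIVOR_K) * 1024
assert HEAP_END - OLD_START == CMS_TOTAL_K * 1024

young_gib = gib(EDEN_K + 2 * SURVIVOR_K)
old_gib = gib(CMS_TOTAL_K)

if __name__ == "__main__":
    print(f"young {young_gib:.1f} GiB, old {old_gib:.1f} GiB, total {young_gib + old_gib:.1f} GiB")
    # prints "young 20.0 GiB, old 60.0 GiB, total 80.0 GiB"
```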



Regards,

Maulin

