Hi Shawn, Erik

> updates should slow down but not deadlock. 
The net effect is the same. As the CLOSE_WAITs increase, jvm ultimately
stops accepting new socket requests, at which point `kill <solrpid>` is the
only option. 

This means if replication handler is invoked which sets the deletion policy,
the threads blocked rises even faster and system fails even faster.

Each solr POST is a blocking call, hence the CLOSE_WAITs. Also the POST gzip
is an json array of 100 json objects (1 json doc = 1 solr doc).

All custom AbstractSolrEventListener listeners were disabled to not process
any post commit events. Those threads are in WAITING state, which is ok.

I then ran /solr/58f449cec94a2c75-core-256/admin/luke at 10:30pm PST

It showed "lastModified: 2018-02-11T04:46:54.540Z" indicating commit blocked
for about 2 hours.
Hard commit is set as 10secs in solrconfig.xml

Other cores are also blocked for a while.

Thread dump and top output are from that condition are at
https://github.com/mohsinbeg/datadump/tree/master/solr58f449cec94a2c75_core_256

netstat CLOSE_WAIT are correlated with DirectUpdateHandler2 /
UpdateRequestProcessor.processAdd() requests.

solr [ /tmp ]$ sudo netstat -ptan | awk '{print $6 " " $7 }' | sort | uniq
-c; TZ=PST8PDT date;
   7728 CLOSE_WAIT -
     65 ESTABLISHED -
      1 FIN_WAIT2 -
      1 Foreign Address
      6 LISTEN -
     36 TIME_WAIT -
      1 established)
Sat Feb 10 22:27:07 PST 2018

http://fastthread.io shows lots 6,700 threads in TIMED_WAIT

https://jstack.review shows 6584 threads with this stack
at java.lang.Object.wait(Native Method)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doStall(ConcurrentMergeScheduler.java:616)
at
org.apache.lucene.index.ConcurrentMergeScheduler.maybeStall(ConcurrentMergeScheduler.java:602)
...
at
org.apache.solr.update.DirectUpdateHandler2.allowDuplicateUpdate(DirectUpdateHandler2.java:276)
...
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
...


Only `top` available on Photon OS is
https://github.com/vmware/photon/blob/1.0/SPECS/procps-ng/procps-ng.spec.
Those screenshots are attached.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Reply via email to