Hi,
We are experiencing intermittent slowness on updates for one of our collections. User operations hang on updates sent to Solr via the SolrJ client. Every time, during the period of slowness, we see something like this in the log of the replica:

[org.apache.solr.update.UpdateHandler] Reordered DBQs detected. Update=add{_version_=1504391336428568576,id= 2392581250002321} DBQs=[DBQ{version=1504391337298886656,q=level_2_id:12345}]

After a while the DBQs pile up and we see the list of DBQs in that message growing. At some point the update time increases from 300 ms to 20 seconds, and then in the leader log I see a read timeout exception, after which recovery is initiated on the replica. At that point all updates become very slow, taking from 20 to 60 seconds, especially updates with deleteByQuery.

We are not sure whether the DBQs are the cause or a symptom. What does not make sense to me is that the slowness is only on the replica side. We suspect that the slow updates on the replica cause the timeout on the leader side, which in turn triggers the recovery.

We would really appreciate any help on this.

Thanks,

Some info:

DBQs are sent as separate update requests from the add requests.
We currently use SolrCloud 4.9.0.
We have ~140 collections on 4 nodes (1, 2, 3, 4). Each collection has a single shard with a leader and one other replica. ~70 collections are on nodes 1 and 2 as leader and replica, and the other collections are on nodes 3 and 4.
On each node there is about 65 GB of index with 25,000,000 documents.

This is our update handler. autoSoftCommit is set to 2 seconds, but there may also be manual soft commits coming from user operations from time to time:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>120000</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>2000</maxTime>
  </autoSoftCommit>
  <updateLog />
</updateHandler>