Hi, I am experiencing an issue where threads are blocking for an extremely long time when I am indexing while deleteByQuery is also running.
Setup info:
- SolrCloud 6.6.0
- Simple 2-node, 1-shard, 2-replica setup
- ~12 million docs in the collection in question
- Nodes have 64 GB RAM, 8 CPUs, spinning disks
- Soft commit interval 10 seconds, hard commit (openSearcher=false) 60 seconds
- Default merge policy settings (which I think is 10/10)

We have a query-heavy, fairly index-heavy use case. Indexing runs constantly throughout the day and can be bursty. The indexing process handles both updates and deletes, can spin up to 15 simultaneous threads, and sends documents to Solr in batches of 3000 (which seems to be the optimal size, by trial and error). I can build the entire collection from scratch this way in under 40 minutes, and indexing is generally very fast (sending a batch of 3000 docs to Solr takes about 3 seconds on average).

The issue: when some threads are adding/updating documents while other threads are issuing deletes (using deleteByQuery), Solr seems to get into a state of extreme blocking on the replica, and some threads take 30+ minutes just to send one batch of 3000 docs. This collection does use child documents (hence the deleteByQuery on _root_). I'm not sure whether that makes a difference; I am trying to reproduce the issue on a collection without child documents. CPU and I/O wait look minimal on both nodes, so I'm not sure what is causing the blocking.
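The concurrent pattern described above (up to 15 threads, batches of 3000, deleteByQuery on _root_ running alongside adds) can be sketched roughly as follows. This is a simplified, hypothetical illustration, not our production indexer: UpdateSink, BatchIndexer and deleteBlockQuery are made-up names, and in a real client the two UpdateSink methods would wrap SolrJ's SolrClient.add(...) and SolrClient.deleteByQuery(...). It assumes document ids contain no query-syntax characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a SolrJ client; roughly corresponds to
// SolrClient.add(...) and SolrClient.deleteByQuery(...).
interface UpdateSink {
    void add(List<Map<String, Object>> batch) throws Exception;
    void deleteByQuery(String query) throws Exception;
}

final class BatchIndexer {
    static final int BATCH_SIZE = 3000; // docs per update request
    static final int MAX_THREADS = 15;  // upper bound on concurrent indexing threads

    // Splits items into consecutive batches of at most `size` elements (copies, not views).
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return batches;
    }

    // Matches a parent document and all of its children, since every document
    // in a block carries the parent's id in _root_.
    // Assumes ids contain no query-syntax characters (otherwise they must be escaped).
    static String deleteBlockQuery(String parentId) {
        return "_root_:" + parentId;
    }

    // Submits add batches and block deletes to the same pool, so adds and
    // deleteByQuery requests run concurrently (the situation where we see the blocking).
    static void run(UpdateSink sink, List<Map<String, Object>> docs, List<String> parentIdsToDelete)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_THREADS);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<Map<String, Object>> batch : partition(docs, BATCH_SIZE)) {
                futures.add(pool.submit(() -> { sink.add(batch); return null; }));
            }
            for (String parentId : parentIdsToDelete) {
                futures.add(pool.submit(() -> { sink.deleteByQuery(deleteBlockQuery(parentId)); return null; }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for the task; rethrows any failure wrapped in ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With 7000 docs, for example, partition produces three requests (3000, 3000 and 1000 docs), and each delete is a separate deleteByQuery request that can land on the replica while those adds are still in flight.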
Here is part of the stack trace from one of the blocked threads on the replica:

qtp592179046-576 (576)
    java.lang.Object@608fe9b5
    org.apache.solr.update.DirectUpdateHandler2.addAndDelete(DirectUpdateHandler2.java:354)
    org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:237)
    org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
    org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
    org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
    org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
    org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
    org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
    org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
    org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
    org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
    org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:122)
    org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
    org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
    org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
    org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:187)
    org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:108)
    org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
    org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
    org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
    org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
    org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)

A cursory search led me to this JIRA: https://issues.apache.org/jira/browse/SOLR-7836, though I'm not sure whether it's related.

Can anyone shed some light on this issue? We don't do deletes very often, but when we do, it brings Solr to its knees, which is causing some big problems.

Thanks,
Chris