If I am understanding you correctly, you think this is caused by the DBQ deleting a document while a document with that same ID is being updated by another thread? I'm not sure that is what is happening here: we only delete docs if they no longer exist in the DB, and we don't reuse IDs, so nothing should be adding/updating a doc whose ID is marked for deletion. I will double check to confirm, though.
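For what it's worth, the stall I'm seeing behaves like adds waiting behind an exclusive section held by the delete. Here is a toy lock-based model of that behavior (purely illustrative, with made-up names; it is not Solr's actual DirectUpdateHandler2 code, just the shape of the blocking I'm describing):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model: a long-running delete-by-query holds an exclusive (write) lock,
// so a concurrent add blocks until the delete finishes. All names here are
// hypothetical stand-ins for the observed addAndDelete stall.
public class DbqBlockingSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void deleteByQuery(long workMillis, CountDownLatch locked) throws InterruptedException {
        lock.writeLock().lock();          // exclusive: no adds may proceed
        locked.countDown();               // signal: the exclusive lock is now held
        try {
            Thread.sleep(workMillis);     // stands in for the slow delete work
        } finally {
            lock.writeLock().unlock();
        }
    }

    void addDoc(int id) {
        lock.readLock().lock();           // adds are mutually compatible
        try {
            // index the document here
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        DbqBlockingSketch handler = new DbqBlockingSketch();
        CountDownLatch dbqLocked = new CountDownLatch(1);

        Thread dbq = new Thread(() -> {
            try {
                handler.deleteByQuery(200, dbqLocked);
            } catch (InterruptedException ignored) {
            }
        });
        dbq.start();
        dbqLocked.await();                // the DBQ definitely holds the lock now

        long start = System.nanoTime();
        handler.addDoc(132);              // blocks until the DBQ releases the lock
        long waitedMillis = (System.nanoTime() - start) / 1_000_000;

        dbq.join();
        System.out.println("add waited ~" + waitedMillis + " ms");
        System.out.println("blocked=" + (waitedMillis >= 100));
    }
}
```

In this model a 200 ms delete makes the add wait roughly 200 ms; scale the delete up and the add's wait scales with it, which matches the minutes-long stalls we see.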
Also, not sure if relevant, but the DBQ itself returns very quickly, in a matter of ms; it's the updates that block for a huge amount of time.

On Tue, Nov 7, 2017 at 11:08 AM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:

> Maybe not a relevant fact on this, but: "addAndDelete" is triggered by
> *reordering of DBQs*; that means there are non-executed DBQs present in the
> updateLog and an add operation is also received. Solr makes sure the DBQs
> are executed first and then the add operation is executed.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Tue, Nov 7, 2017 at 9:19 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Well, consider what happens here.
> >
> > Solr gets a DBQ that includes document 132 and 10,000,000 other docs.
> > Solr gets an add for document 132.
> >
> > The DBQ takes time to execute. If Solr processed the requests in
> > parallel, would 132 be in the index after the delete was over? It would
> > depend on when the DBQ found the doc relative to the add. With this
> > sequence one would expect 132 to be in the index at the end.
> >
> > And it's worse when it comes to distributed indexes. If the updates
> > were sent out in parallel, you could end up in situations where one
> > replica contained 132 and another didn't, depending on the vagaries of
> > thread execution.
> >
> > Now, I didn't write the DBQ code, but that's what I think is happening.
> >
> > Best,
> > Erick
> >
> > On Tue, Nov 7, 2017 at 7:40 AM, Chris Troullis <cptroul...@gmail.com>
> > wrote:
> >
> > > As an update, I have confirmed that it doesn't seem to have anything
> > > to do with child documents, or standard deletes, just deleteByQuery.
> > > If I do a deleteByQuery on any collection while also adding/updating
> > > in separate threads, I am experiencing this blocking behavior on the
> > > non-leader replica.
> > >
> > > Has anyone else experienced this / have any thoughts on what to try?
> > >
> > > On Sun, Nov 5, 2017 at 2:20 PM, Chris Troullis <cptroul...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am experiencing an issue where threads are blocking for an
> > > > extremely long time when I am indexing while deleteByQuery is also
> > > > running.
> > > >
> > > > Setup info:
> > > > - Solr Cloud 6.6.0
> > > > - Simple 2 node, 1 shard, 2 replica setup
> > > > - ~12 million docs in the collection in question
> > > > - Nodes have 64 GB RAM, 8 CPUs, spinning disks
> > > > - Soft commit interval 10 seconds, hard commit (openSearcher false)
> > > >   60 seconds
> > > > - Default merge policy settings (which I think is 10/10)
> > > >
> > > > We have a query-heavy, index-heavy-ish use case. Indexing is
> > > > constantly running throughout the day and can be bursty. The
> > > > indexing process handles both updates and deletes, can spin up to 15
> > > > simultaneous threads, and sends to Solr in batches of 3000 (seems to
> > > > be the optimal number per trial and error).
> > > >
> > > > I can build the entire collection from scratch using this method in
> > > > < 40 mins, and indexing is in general super fast (averages about 3
> > > > seconds to send a batch of 3000 docs to Solr). The issue I am seeing
> > > > is when some threads are adding/updating documents while other
> > > > threads are issuing deletes (using deleteByQuery), Solr seems to get
> > > > into a state of extreme blocking on the replica, which results in
> > > > some threads taking 30+ minutes just to send 1 batch of 3000 docs.
> > > > This collection does use child documents (hence the deleteByQuery on
> > > > _root_), not sure if that makes a difference; I am trying to
> > > > duplicate on a non-child-doc collection.
> > > > CPU/IO wait seems minimal on both nodes, so I'm not sure what is
> > > > causing the blocking.
> > > >
> > > > Here is part of the stack trace on one of the blocked threads on the
> > > > replica:
> > > >
> > > > qtp592179046-576 (576)
> > > > java.lang.Object@608fe9b5
> > > > org.apache.solr.update.DirectUpdateHandler2.addAndDelete(DirectUpdateHandler2.java:354)
> > > > org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:237)
> > > > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
> > > > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
> > > > org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> > > > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
> > > > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
> > > > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
> > > > org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
> > > > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
> > > > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
> > > > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
> > > > org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> > > > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:122)
> > > > org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
> > > > org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> > > > org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
> > > > org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:187)
> > > > org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:108)
> > > > org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
> > > > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
> > > > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> > > > org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> > > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> > > > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> > > >
> > > > A cursory search led me to this JIRA:
> > > > https://issues.apache.org/jira/browse/SOLR-7836, not sure if related
> > > > though.
> > > >
> > > > Can anyone shed some light on this issue? We don't do deletes very
> > > > frequently, but it is bringing Solr to its knees when we do, which
> > > > is causing some big problems.
> > > >
> > > > Thanks,
> > > >
> > > > Chris
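One workaround we are considering, if DBQ reordering really is the culprit: resolve the delete query to concrete IDs first (query for them), then issue plain delete-by-id calls, which shouldn't force adds to wait behind a query-scoped delete. A minimal in-memory sketch of that resolve-then-delete pattern (the ConcurrentHashMap "index" and all names are stand-ins for illustration, not Solr APIs):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Stand-in "index": doc id -> flag marking docs that no longer exist in the DB.
// Resolving the delete query to concrete ids up front turns the delete into
// cheap per-id operations that can interleave with adds instead of blocking them.
public class ResolveThenDelete {
    static final Map<Integer, Boolean> index = new ConcurrentHashMap<>();

    // Step 1: run the "query" once to collect matching ids (analogous to
    // querying for ids only, instead of sending one big deleteByQuery).
    static List<Integer> resolveIds() {
        return index.entrySet().stream()
                .filter(Map.Entry::getValue)      // docs flagged for deletion
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        index.put(1, false);
        index.put(132, true);   // stale doc, should be deleted
        index.put(133, true);   // stale doc, should be deleted

        List<Integer> stale = resolveIds();
        // Step 2: per-id deletes; an add for any other id never waits on these.
        for (int id : stale) {
            index.remove(id);
        }
        index.put(200, false);  // a concurrent add interleaves freely

        System.out.println("remaining=" + index.keySet().stream().sorted()
                .map(String::valueOf).collect(Collectors.joining(",")));
    }
}
```

The trade-off is that the IDs are a snapshot as of the query, so docs matching the query after the resolve step are missed until the next pass; for our "delete docs that no longer exist in the DB" case that seems acceptable.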