I've noticed something weird since implementing the change Shawn suggested, and I wonder if someone can shed some light on it:
Since changing from a delete by query on _root_:.. to querying for ids by _root_: and then calling deleteById (with the ids from the _root_ query), we have started to notice that some facet counts for child document facets do not match the actual query results. For example, a facet shows a count of 10, but clicking on it (which applies an fq with a block join to return parent docs) returns fewer results than the facet count. The two should match, because the facet count uses unique(_root_) and so only counts parents.

I suspect that this may somehow be caused by orphaned child documents since the delete process changed. Does anyone know if changing from a DBQ on _root_ to the aforementioned approach (querying for ids by _root_, then deleting by id) would cause any issues with deleting child documents? Trying it manually, it seems to work fine, but something is going on in some of our test environments.

Thanks,

Chris

On Thu, Nov 9, 2017 at 2:52 PM, Chris Troullis <cptroul...@gmail.com> wrote:

> Thanks Mike, I will experiment with that and see if it does anything for this particular issue.
>
> I implemented Shawn's workaround and the problem has gone away, so that is good at least for the time being.
>
> Do we think that this is something that should be tracked in JIRA for 6.X? Or should I confirm if it is still happening in 7.X before logging anything?
>
> On Wed, Nov 8, 2017 at 6:23 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> I'm not sure this is what's affecting you, but you might try upgrading to Lucene/Solr 7.1; in 7.0 there were big improvements in using multiple threads to resolve deletions:
>> http://blog.mikemccandless.com/2017/07/lucene-gets-concurrent-deletes-and.html
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Nov 7, 2017 at 2:26 PM, Chris Troullis <cptroul...@gmail.com> wrote:
>>
>> > @Erick, I see, thanks for the clarification.
>> >
>> > @Shawn, Good idea for the workaround! I will try that and see if it resolves the issue.
>> >
>> > Thanks,
>> >
>> > Chris
>> >
>> > On Tue, Nov 7, 2017 at 1:09 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> > > bq: you think it is caused by the DBQ deleting a document while a document with that same ID
>> > >
>> > > No. I'm saying that DBQ has no idea _if_ that would be the case so can't carry out the operations in parallel because it _might_ be the case.
>> > >
>> > > Shawn:
>> > >
>> > > IIUC, here's the problem. For deleteById, I can guarantee the sequencing through the same optimistic locking that regular updates use (i.e. the _version_ field). But I'm kind of guessing here.
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > > On Tue, Nov 7, 2017 at 8:51 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>> > > > On 11/5/2017 12:20 PM, Chris Troullis wrote:
>> > > >> The issue I am seeing is when some threads are adding/updating documents while other threads are issuing deletes (using deleteByQuery), solr seems to get into a state of extreme blocking on the replica
>> > > >
>> > > > The deleteByQuery operation cannot coexist very well with other indexing operations. Let me tell you about something I discovered. I think your problem is very similar.
>> > > >
>> > > > Solr 4.0 and later is supposed to be able to handle indexing operations at the same time that the index is being optimized (in Lucene, forceMerge). I have some indexes that take about two hours to optimize, so having indexing stop while that happens is a less than ideal situation. Ongoing indexing is similar in many ways to a merge, enough that it is handled by the same Merge Scheduler that handles an optimize.
>> > > >
>> > > > I could indeed add documents to the index without issues at the same time as an optimize, but when I would try my full indexing cycle while an optimize was underway, I found that all operations stopped until the optimize finished.
>> > > >
>> > > > Ultimately what was determined (I think it was Yonik that figured it out) was that *most* indexing operations can happen during the optimize, *except* for deleteByQuery. The deleteById operation works just fine.
>> > > >
>> > > > I do not understand the low-level reasons for this, but apparently it's not something that can be easily fixed.
>> > > >
>> > > > A workaround is to send the query you plan to use with deleteByQuery as a standard query with a limited fl parameter, to retrieve matching uniqueKey values from the index, then do a deleteById with that list of ID values instead.
>> > > >
>> > > > Thanks,
>> > > > Shawn
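
For reference, the workaround Shawn describes (run the query as a standard search with a limited fl to retrieve the matching uniqueKey values, then delete by ID) could be sketched roughly as follows in Python, using only the standard library. This is a minimal sketch under assumptions and not code from the thread: the core URL http://localhost:8983/solr/mycore, the uniqueKey field name "id", and the helper names http_json, collect_ids and delete_by_ids are all hypothetical. It pages through the results with cursorMark so that every matching ID is collected, not just the first page.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical core URL; adjust to the actual Solr host and collection.
SOLR_URL = "http://localhost:8983/solr/mycore"


def http_json(url, payload=None):
    """Send a GET, or a JSON POST when payload is given, and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_ids(query, fetch=http_json, base_url=SOLR_URL, rows=500):
    """Return every uniqueKey value matching `query`, paging with cursorMark.

    Assumes the uniqueKey field is called "id". Cursor paging requires a sort
    that includes the uniqueKey field.
    """
    ids = []
    cursor = "*"
    while True:
        params = urllib.parse.urlencode({
            "q": query,
            "fl": "id",          # limited fl: only the uniqueKey is needed
            "rows": rows,
            "sort": "id asc",
            "cursorMark": cursor,
            "wt": "json",
        })
        reply = fetch(f"{base_url}/select?{params}")
        ids.extend(doc["id"] for doc in reply["response"]["docs"])
        next_cursor = reply.get("nextCursorMark", cursor)
        if next_cursor == cursor:
            # An unchanged cursor means the result set is exhausted.
            return ids
        cursor = next_cursor


def delete_by_ids(ids, fetch=http_json, base_url=SOLR_URL):
    """Issue a single delete-by-ID update for the given IDs (no commit is sent)."""
    if not ids:
        return None
    return fetch(f"{base_url}/update", {"delete": list(ids)})
```

Against a live Solr instance, usage might look like delete_by_ids(collect_ids("_root_:SOME_PARENT_ID")), followed by a commit. Paging matters here because Solr returns only 10 rows by default; an ID query that leaves rows at its default would miss part of a larger block, which may be worth ruling out when hunting for orphaned child documents.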