[jira] [Commented] (LUCENE-3373) waitForMerges deadlocks if background merge fails

Vigya Sharma (Jira) Sat, 21 May 2022 22:21:07 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540531#comment-17540531 ]


Vigya Sharma commented on LUCENE-3373:
--------------------------------------

I have a hypothesis...

When we call IndexWriter.close(), we invoke 
[mergeScheduler.merge|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2707],
giving it one last chance to process all pending merges. Within the concurrent
merge scheduler, there is rate-limiting logic that [stalls 
threads|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L539-L541]
to adjust for the merge I/O rate. It seems we do not want to stall a 
{{mergeThread}}, so if a mergeThread calls {{maybeStall()}}, we 
[break|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L540]
out of the while loop that assigns threads to pending merges and drains the
pendingMerges queue.
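To make the hypothesis concrete, here is a minimal single-threaded Java sketch of the loop described above. It is a toy model, not Lucene's implementation: the class name, the assumed stall condition (merges are pending and a merge-thread budget, {{maxMergeCount}}, is used up), and the "pretend the oldest merge finished" stand-in for blocking are all assumptions. It only shows how a {{false}} from {{maybeStall()}} for a merge thread ends the loop with merges still queued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the scheduling loop; NOT Lucene's code.
// Names mirror the comment (merge, maybeStall, mergeThreads) for readability.
class StallBreakSketch {
  final Deque<String> pendingMerges = new ArrayDeque<>();
  final Deque<String> runningMerges = new ArrayDeque<>();
  final List<String> finishedMerges = new ArrayList<>();
  final Set<Thread> mergeThreads = new HashSet<>();
  final int maxMergeCount; // assumed thread budget; must be >= 1 in this toy

  StallBreakSketch(int maxMergeCount) {
    this.maxMergeCount = maxMergeCount;
  }

  // Builds a scheduler with `count` pending merges; optionally registers the
  // calling thread as a merge thread to mimic the hypothesized scenario.
  static StallBreakSketch withPending(int maxMergeCount, int count, boolean callerIsMergeThread) {
    StallBreakSketch s = new StallBreakSketch(maxMergeCount);
    for (int i = 1; i <= count; i++) {
      s.pendingMerges.add("merge-" + i);
    }
    if (callerIsMergeThread) {
      s.mergeThreads.add(Thread.currentThread());
    }
    return s;
  }

  // Assumed stall condition: merges are pending and the thread budget is used up.
  // A merge thread is never stalled; it gets `false` back instead.
  boolean maybeStall() {
    while (!pendingMerges.isEmpty() && runningMerges.size() >= maxMergeCount) {
      if (mergeThreads.contains(Thread.currentThread())) {
        return false;
      }
      // Toy stand-in for blocking: pretend the oldest running merge finished.
      finishedMerges.add(runningMerges.removeFirst());
    }
    return true;
  }

  // The loop that drains pendingMerges by handing each merge to a "thread".
  void merge() {
    while (true) {
      if (!maybeStall()) {
        break; // early exit: anything left in pendingMerges is never scheduled
      }
      String next = pendingMerges.poll();
      if (next == null) {
        return; // fully drained
      }
      runningMerges.addLast(next); // stand-in for starting a merge thread
    }
  }
}
```

With {{maxMergeCount = 2}} and five pending merges, calling {{merge()}} from an ordinary thread drains the queue, while calling it from a registered merge thread stops after two merges and leaves three in {{pendingMerges}}.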

Hence, if the IW.shutdown call were picked up by a merge thread, it would 
break out of the loop before scheduling all pending merges. The outer shutdown 
call expects all merges to have been scheduled (or to get scheduled), and keeps 
waiting for the pendingMerges and runningMerges queues to drain. Since the 
whole thing is also 
[gated|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L1315]
to allow only one thread to do this, we miss our only chance to schedule 
threads, which leads to an endless wait.
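The waiting side can be sketched the same way. Again this is a hypothetical toy, not {{IndexWriter}}: {{WaitForMergesSketch}}, {{startNext()}}, {{finish()}}, {{runScenario()}} and the timeout are illustrative names and choices. The wait described above has no deadline; the timeout exists only so the demo can report "would hang" instead of actually hanging.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of the shutdown-side wait; not IndexWriter's code.
class WaitForMergesSketch {
  private final Deque<String> pendingMerges = new ArrayDeque<>();
  private int runningMerges = 0;

  synchronized void addPending(String merge) {
    pendingMerges.add(merge);
  }

  // What a scheduled merge thread would do: move one merge from pending to running.
  synchronized String startNext() {
    String merge = pendingMerges.poll();
    if (merge != null) {
      runningMerges++;
    }
    return merge;
  }

  synchronized void finish() {
    runningMerges--;
    notifyAll(); // wake the waiter so it re-checks both queues
  }

  // Waits until pending and running merges are both drained. The deadline is
  // an addition for the demo; returns false if it expired first.
  synchronized boolean waitForMerges(long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!pendingMerges.isEmpty() || runningMerges > 0) {
      long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
      if (remainingMillis <= 0) {
        return false; // without a deadline, this wait would never end
      }
      wait(remainingMillis);
    }
    return true;
  }

  // Demo: one pending merge; either a worker schedules it or nobody does.
  static boolean runScenario(boolean someoneSchedulesWork, long timeoutMillis) {
    WaitForMergesSketch s = new WaitForMergesSketch();
    s.addPending("merge-1");
    Thread worker = new Thread(() -> {
      if (someoneSchedulesWork && s.startNext() != null) {
        s.finish();
      }
    });
    worker.start();
    try {
      return s.waitForMerges(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

If the single thread allowed to schedule work skips a pending merge, nothing else ever moves it out of {{pendingMerges}}, so {{runScenario(false, ...)}} always times out while {{runScenario(true, ...)}} returns promptly.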

All of this hinges on a MergeThread being the thread that runs 
mergeScheduler.merge, or on the chosen thread somehow exiting without 
scheduling all of the pending merges. Is that possible? I'm not sure how a 
merge thread could get picked for IW.close(). Any ideas for a concurrent test 
that would make this scenario more likely to hit?



 

> waitForMerges deadlocks if background merge fails
> -------------------------------------------------
>
>                 Key: LUCENE-3373
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3373
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 3.0.3
>            Reporter: Tim Smith
>            Priority: Major
>
> waitForMerges can deadlock if a merge fails for ConcurrentMergeScheduler.
> This is because the merge thread dies, but pending merges are still 
> available.
> Normally, the merge thread picks up the next merge once it finishes the 
> previous merge, but in the event of a merge exception, the pending work is 
> not resumed, and waitForMerges won't complete until all pending work is 
> complete.
> I worked around this by overriding doMerge() like so:
> {code}
>   protected final void doMerge(MergePolicy.OneMerge merge) throws IOException {
>     try {
>       super.doMerge(merge);
>     } catch (Throwable exc) {
>       // Just logging the exception and not rethrowing
>       // insert logging code here
>     }
>   }
> {code}
> Here are the rough steps I used to reproduce this issue.
> Override doMerge like so:
> {code}
>   protected final void doMerge(MergePolicy.OneMerge merge) throws IOException {
>     // Simulate a slow merge that then fails
>     try {
>       Thread.sleep(500L);
>     } catch (InterruptedException e) {
>       // ignored for this repro
>     }
>     super.doMerge(merge);
>     throw new IOException("fail");
>   }
> {code}
> Then do the following:
> loop 50 times:
>   addDocument // any doc
>   commit
> waitForMerges // this will sometimes deadlock
> SOLR-2017 may be related to this (the stack trace for the deadlock looked related).
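The failure mode in the original report can also be modeled in a few lines. This hypothetical sketch (not Lucene's {{ConcurrentMergeScheduler}}; {{MergeThreadSketch}}, {{failing}} and {{leftoverAfterThreadExits}} are illustrative names) gives one thread a queue of merges and lets it pick up the next one after each merge finishes. If {{doMerge}} throws and the exception escapes, the thread exits and the remaining merges are orphaned, which is what a waitForMerges-style wait then blocks on; catching the exception, as in the workaround above, keeps the loop going.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

// Hypothetical toy model of a merge thread; not Lucene's ConcurrentMergeScheduler.
class MergeThreadSketch {
  private final Deque<String> pending = new ArrayDeque<>();
  private final Set<String> failing;        // merges whose doMerge() throws
  private final boolean swallowExceptions;  // true = the catch-and-log workaround

  MergeThreadSketch(Set<String> failing, boolean swallowExceptions, String... merges) {
    this.failing = failing;
    this.swallowExceptions = swallowExceptions;
    for (String m : merges) {
      pending.add(m);
    }
  }

  private void doMerge(String merge) throws IOException {
    if (failing.contains(merge)) {
      throw new IOException("fail: " + merge);
    }
  }

  // Thread body: after each merge, pick up the next pending one.
  private void runLoop() {
    String merge;
    while ((merge = pending.poll()) != null) {
      try {
        doMerge(merge);
      } catch (IOException e) {
        if (!swallowExceptions) {
          throw new UncheckedIOException(e); // thread dies; the rest stays pending
        }
        // Workaround: log here and continue with the next merge.
      }
    }
  }

  // Runs the loop on its own thread and reports how many merges were left behind.
  static int leftoverAfterThreadExits(boolean swallowExceptions) {
    MergeThreadSketch s = new MergeThreadSketch(
        Set.of("merge-2"), swallowExceptions, "merge-1", "merge-2", "merge-3", "merge-4");
    Thread t = new Thread(s::runLoop);
    t.setUncaughtExceptionHandler((thread, ex) -> { }); // keep demo output quiet
    t.start();
    try {
      t.join(); // join() also makes the thread's queue updates visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return s.pending.size();
  }
}
```

Without the workaround, the failure on {{merge-2}} kills the thread and leaves {{merge-3}} and {{merge-4}} pending; with it, all four are consumed. Note that swallowing every {{Throwable}}, as the workaround does, also hides errors that arguably should abort the merge.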



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

