[ 
https://issues.apache.org/jira/browse/LUCENE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540526#comment-17540526
 ] 

Vigya Sharma commented on LUCENE-3373:
--------------------------------------

Sorry you ran into this, Thomas. I tried beasting some concurrent IndexWriter 
tests but haven't been able to reproduce it yet.

{quote}i would suggest updating the MergeThread to catch all exceptions and 
allow processing the next merge. right now, any merge failure results in a 
ThreadDeath, which seems rather nasty. should probably just catch the exception 
and log a index trace message
{quote}
We already seem to have a {{catch (Throwable exc)}} that wraps all the logic in 
the thread {{run()}} method 
([code|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L720]).

{{waitForMerges()}} waits for both {{pendingMerges}} and {{runningMerges}} to 
become empty. The {{OneMerge}} object is removed from {{pendingMerges}} before 
{{getNextMerge()}} returns. Similarly, it is removed from {{runningMerges}} when 
{{onMergeFinished()}} is called, which happens in the finally clause of the try 
block wrapping the {{run()}} code.
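
To illustrate that bookkeeping, here is a minimal, self-contained sketch. It is 
a toy model, not Lucene's code: {{ToyMergeScheduler}}, its use of {{Runnable}} 
in place of {{OneMerge}}, the one-thread-per-merge policy, and the timeout on 
{{waitForMerges}} are all assumptions made for the example. It only shows why 
a catch-all plus a finally that always calls {{onMergeFinished()}} should let 
the waiter return even when merges fail.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model only: names echo the comment above, but this is not Lucene code. */
class ToyMergeScheduler {
  private final Deque<Runnable> pendingMerges = new ArrayDeque<>();
  private final List<Runnable> runningMerges = new ArrayList<>();
  private int failures = 0;

  /** Queues a merge and starts one (hypothetical) merge thread for it. */
  synchronized void addMerge(Runnable merge) {
    pendingMerges.add(merge);
    Thread t = new Thread(this::runMergeThread);
    t.setDaemon(true);
    t.start();
  }

  /** Moves a merge from pending to running before handing it to the caller. */
  private synchronized Runnable getNextMerge() {
    Runnable merge = pendingMerges.poll();
    if (merge != null) {
      runningMerges.add(merge);
    }
    return merge;
  }

  /** Drops the merge from the running list and wakes any waiter. */
  private synchronized void onMergeFinished(Runnable merge) {
    runningMerges.remove(merge);
    notifyAll();
  }

  private void runMergeThread() {
    Runnable merge = getNextMerge();
    if (merge == null) {
      return;
    }
    try {
      merge.run();
    } catch (Throwable exc) {
      // Record the failure instead of letting it escape the thread.
      synchronized (this) {
        failures++;
      }
    } finally {
      // Runs whether the merge succeeded or threw.
      onMergeFinished(merge);
    }
  }

  /** Returns true once both collections are empty, false on timeout or interrupt. */
  synchronized boolean waitForMerges(long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!pendingMerges.isEmpty() || !runningMerges.isEmpty()) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false;
      }
      try {
        wait(remaining);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  synchronized int failureCount() {
    return failures;
  }

  public static void main(String[] args) {
    ToyMergeScheduler scheduler = new ToyMergeScheduler();
    for (int i = 0; i < 50; i++) {
      final int n = i;
      // Every other merge fails, loosely mimicking the repro in the description.
      scheduler.addMerge(() -> {
        if (n % 2 == 0) {
          throw new RuntimeException("fail " + n);
        }
      });
    }
    System.out.println("drained: " + scheduler.waitForMerges(10_000L));
    System.out.println("failures: " + scheduler.failureCount());
  }
}
```

In this model a stall would require a merge to leave {{pendingMerges}} or 
{{runningMerges}} without the finally block running, which is why the real 
cause is more likely elsewhere.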

It has been a while since the root cause and fix in this issue were suggested. 
From the code, I don't see why a background thread dying in 
ConcurrentMergeScheduler would lead to an endless stall in {{waitForMerges()}}. 
The issue in {{run()}} has probably been addressed since then. 
But since you still ran into the problem, there is probably some underlying 
deadlock case lurking in IndexWriter shutdown.

(It is possible that 
[LUCENE-10583|https://issues.apache.org/jira/browse/LUCENE-10583] is an 
independent issue. I'll mention it here to keep the two linked, since 
LUCENE-10583 mentions the problem you've seen recently).

> waitForMerges deadlocks if background merge fails
> -------------------------------------------------
>
>                 Key: LUCENE-3373
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3373
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 3.0.3
>            Reporter: Tim Smith
>            Priority: Major
>
> waitForMerges can deadlock if a merge fails for ConcurrentMergeScheduler
> this is because the merge thread will die, but pending merges are still 
> available
> normally, the merge thread will pick up the next merge once it finishes the 
> previous merge, but in the event of a merge exception, the pending work is 
> not resumed, but waitForMerges won't complete until all pending work is 
> complete
> i worked around this by overriding doMerge() like so:
> {code}
>   protected final void doMerge(MergePolicy.OneMerge merge) throws IOException {
>     try {
>       super.doMerge(merge);
>     } catch (Throwable exc) {
>       // Just logging the exception and not rethrowing
>       // insert logging code here
>     }
>   }
> {code}
> Here's the rough steps i used to reproduce this issue:
> override doMerge like so
> {code}
>   protected final void doMerge(MergePolicy.OneMerge merge) throws IOException {
>     try {Thread.sleep(500L);} catch (InterruptedException e) { }
>     super.doMerge(merge);
>     throw new IOException("fail");
>   }
> {code}
> then, if you do the following:
> loop 50 times:
>   addDocument // any doc
>   commit
> waitForMerges // This will deadlock sometimes
> SOLR-2017 may be related to this (stack trace for deadlock looked related)


