[
https://issues.apache.org/jira/browse/LUCENE-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089085#comment-17089085
]
Simon Willnauer commented on LUCENE-9337:
-----------------------------------------
Here is a PR: https://github.com/apache/lucene-solr/pull/1443/
> CMS might miss to pickup pending merges when maxMergeCount changes while
> merges are running
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-9337
> URL: https://issues.apache.org/jira/browse/LUCENE-9337
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Simon Willnauer
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We found a test hanging in IW#forceMerge on Elastic's CI, in an
> innocent-looking test:
> {noformat}
> 14:52:06 [junit4] 2> at [email protected]/java.lang.Object.wait(Native Method)
> 14:52:06 [junit4] 2> at app//org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4722)
> 14:52:06 [junit4] 2> at app//org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:2034)
> 14:52:06 [junit4] 2> at app//org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1960)
> 14:52:06 [junit4] 2> at app//org.apache.lucene.index.RandomIndexWriter.forceMerge(RandomIndexWriter.java:500)
> 14:52:06 [junit4] 2> at app//org.apache.lucene.index.BaseDocValuesFormatTestCase.doTestNumericsVsStoredFields(BaseDocValuesFormatTestCase.java:1301)
> 14:52:06 [junit4] 2> at app//org.apache.lucene.index.BaseDocValuesFormatTestCase.doTestNumericsVsStoredFields(BaseDocValuesFormatTestCase.java:1258)
> 14:52:06 [junit4] 2> at app//org.apache.lucene.index.BaseDocValuesFormatTestCase.testZeroOrMin(BaseDocValuesFormatTestCase.java:2423)
> {noformat}
> After spending quite some time trying to reproduce this without any luck, I
> reviewed all the code involved again to look for possible threading issues.
> What I found is that if maxMergeCount is changed on CMS while merges are
> running, and the forceMerge is kicked off at the same time the running
> merges return, we might fail to pick up the final pending merges, which
> causes the forceMerge to hang. I was able to build a test case that is very
> likely to fail on every run without the fix. While I don't think this is a
> critical bug, given how unlikely it is to happen in practice, if it does
> happen it's basically a deadlock unless the IW sees some other change that
> kicks off a merge.
> Let me walk through the issue. Let's say we have 1 pending merge and 2 merge
> threads running on CMS. The forceMerge is already waiting for merges to
> finish. Once the first merge thread finishes, we check whether we need to
> stall it
> [here|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.5.1/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L580]
> but since it's a merge thread we return
> [here|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.5.1/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L596]
> and don't pick up another merge
> [here|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.5.1/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L526].
>
> Now the second running merge thread checks the condition
> [here|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.5.1/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L580]
> while the first one is finishing up. But before the first thread can
> actually update the internal data structures
> [here|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.5.1/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java#L688]
> it releases the CMS lock, so the stall method's count of running threads is
> off. This causes the second thread to step out of the maybeStall method as
> well, without picking up the pending merge.
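
The interleaving described above can be replayed deterministically in a toy model. This is only a sketch and not the Lucene code: ToyScheduler, registeredThreads, tryPickUp and deregister are hypothetical names, the stall check is reduced to the one condition that matters here (a merge is pending and the number of still-registered merge threads has reached maxMergeCount), and maxMergeCount is assumed to be 2 at the moment both threads finish.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MissedMergeSketch {

    /** Toy stand-in for a merge scheduler (hypothetical, not Lucene's ConcurrentMergeScheduler). */
    static final class ToyScheduler {
        final int maxMergeCount;
        int registeredThreads; // merge threads still counted, including ones that are finishing
        final Deque<String> pending = new ArrayDeque<>();

        ToyScheduler(int maxMergeCount, int registeredThreads) {
            this.maxMergeCount = maxMergeCount;
            this.registeredThreads = registeredThreads;
        }

        /**
         * A finishing merge thread looks for more work. It is never stalled:
         * if the stall condition holds, it steps out without taking a merge.
         */
        boolean tryPickUp() {
            boolean stallCondition = !pending.isEmpty() && registeredThreads >= maxMergeCount;
            if (stallCondition) {
                return false;
            }
            return pending.poll() != null;
        }

        /** The finishing thread removes itself from the bookkeeping. */
        void deregister() {
            registeredThreads--;
        }
    }

    /** Replays two finishing merge threads and returns how many merges stay pending. */
    static int strandedMerges(boolean secondChecksBeforeFirstDeregisters) {
        ToyScheduler cms = new ToyScheduler(2, 2);
        cms.pending.add("final merge");

        cms.tryPickUp(); // first thread: 2 >= 2, so it steps out
        if (secondChecksBeforeFirstDeregisters) {
            cms.tryPickUp();  // second thread still sees 2 registered threads and steps out too
            cms.deregister(); // first thread
            cms.deregister(); // second thread
        } else {
            cms.deregister(); // first thread is gone before the second one checks
            cms.tryPickUp();  // 1 < 2, so the second thread takes the pending merge
            cms.deregister();
        }
        return cms.pending.size();
    }

    public static void main(String[] args) {
        System.out.println("racy interleaving, stranded merges: " + strandedMerges(true));
        System.out.println("ordered interleaving, stranded merges: " + strandedMerges(false));
    }
}
```

In the racy replay one merge stays pending with no registered thread left to take it, which is the state in which a waiting forceMerge would never be woken up. In the ordered replay the second thread sees the corrected count and takes the merge.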