aoli-al opened a new issue, #13552:
URL: https://github.com/apache/lucene/issues/13552

   ### Description
   
   I saw a flaky test, 
`TestIndexWriterWithThreads#testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce`
 caused by concurrency issues recently: 
   
   ```
   MockDirectoryWrapper: cannot close: there are still 30 open files: 
{_1.tvd=1, _0_Lucene99_0.pos=1, _1.nvd=1, _0.nvd=1, _2.fdt=1, _2.tvx=1, 
_0_Lucene99_0.tim=1, _0_Lucene99_0.tip=1, _2.fdx=1, _2_Lucene99_0.doc=1, 
_2.nvd=1, _2.tvd=1, _0_Lucene90_0.dvd=1, _1.fdx=1, _1_Lucene99_0.doc=1, 
_1.fdt=1, _1.tvx=1, _2_Lucene99_0.pos=1, _1_Lucene90_0.dvd=1, 
_0_Lucene99_0.doc=1, _0.fdx=1, _2_Lucene99_0.tip=1, _0.fdt=1, _0.tvx=1, 
_2_Lucene99_0.tim=1, _0.tvd=1, _1_Lucene99_0.pos=1, _2_Lucene90_0.dvd=1, 
_1_Lucene99_0.tip=1, _1_Lucene99_0.tim=1}
   java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are 
still 30 open files: {_1.tvd=1, _0_Lucene99_0.pos=1, _1.nvd=1, _0.nvd=1, 
_2.fdt=1, _2.tvx=1, _0_Lucene99_0.tim=1, _0_Lucene99_0.tip=1, _2.fdx=1, 
_2_Lucene99_0.doc=1, _2.nvd=1, _2.tvd=1, _0_Lucene90_0.dvd=1, _1.fdx=1, 
_1_Lucene99_0.doc=1, _1.fdt=1, _1.tvx=1, _2_Lucene99_0.pos=1, 
_1_Lucene90_0.dvd=1, _0_Lucene99_0.doc=1, _0.fdx=1, _2_Lucene99_0.tip=1, 
_0.fdt=1, _0.tvx=1, _2_Lucene99_0.tim=1, _0.tvd=1, _1_Lucene99_0.pos=1, 
_2_Lucene90_0.dvd=1, _1_Lucene99_0.tip=1, _1_Lucene99_0.tim=1}
        at 
__randomizedtesting.SeedInfo.seed([4D88B1144F0B21A3:1E330D289AB0246]:0)
        at 
org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:876)
        at 
org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:349)
        at 
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce(TestIndexWriterWithThreads.java:502)
        at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at 
org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at 
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at 
org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at 
org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at 
org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at 
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at 
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at 
org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at 
org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at 
org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:1583)
   Caused by: java.lang.RuntimeException: unclosed IndexInput: _2_Lucene99_0.tim
        at 
org.apache.lucene.tests.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:783)
        at 
org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:835)
        at 
org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.<init>(Lucene90BlockTreeTermsReader.java:136)
        at 
org.apache.lucene.codecs.lucene99.Lucene99PostingsFormat.fieldsProducer(Lucene99PostingsFormat.java:428)
        at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:328)
        at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:390)
        at 
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:98)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
        at 
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179)
        at 
org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:286)
        at 
org.apache.lucene.index.IndexWriter.openSegmentStates(IndexWriter.java:6481)
        at org.apache.lucene.index.IndexWriter.forceApply(IndexWriter.java:6255)
        at org.apache.lucene.index.IndexWriter.tryApply(IndexWriter.java:6189)
        at 
org.apache.lucene.index.IndexWriter.lambda$publishFrozenUpdates$12(IndexWriter.java:2814)
        at 
org.apache.lucene.index.IndexWriter$EventQueue.processEventsInternal(IndexWriter.java:336)
        at 
org.apache.lucene.index.IndexWriter$EventQueue.processEvents(IndexWriter.java:325)
        at 
org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:6032)
        at 
org.apache.lucene.index.IndexWriter.maybeProcessEvents(IndexWriter.java:6025)
        at 
org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1552)
        at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843)
        at 
org.apache.lucene.index.TestIndexWriterWithThreads$IndexerThread.run(TestIndexWriterWithThreads.java:98)
   
   
   unclosed IndexInput: _2_Lucene99_0.tim
   java.lang.RuntimeException: unclosed IndexInput: _2_Lucene99_0.tim
        at 
org.apache.lucene.tests.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:783)
        at 
org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:835)
        at 
org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.<init>(Lucene90BlockTreeTermsReader.java:136)
        at 
org.apache.lucene.codecs.lucene99.Lucene99PostingsFormat.fieldsProducer(Lucene99PostingsFormat.java:428)
        at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:328)
        at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:390)
        at 
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:98)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
        at 
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179)
        at 
org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:286)
        at 
org.apache.lucene.index.IndexWriter.openSegmentStates(IndexWriter.java:6481)
        at org.apache.lucene.index.IndexWriter.forceApply(IndexWriter.java:6255)
        at org.apache.lucene.index.IndexWriter.tryApply(IndexWriter.java:6189)
        at 
org.apache.lucene.index.IndexWriter.lambda$publishFrozenUpdates$12(IndexWriter.java:2814)
        at 
org.apache.lucene.index.IndexWriter$EventQueue.processEventsInternal(IndexWriter.java:336)
        at 
org.apache.lucene.index.IndexWriter$EventQueue.processEvents(IndexWriter.java:325)
        at 
org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:6032)
        at 
org.apache.lucene.index.IndexWriter.maybeProcessEvents(IndexWriter.java:6025)
        at 
org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1552)
        at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843)
        at 
org.apache.lucene.index.TestIndexWriterWithThreads$IndexerThread.run(TestIndexWriterWithThreads.java:98)
   
   NOTE: reproduce with: gradlew test --tests 
TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce 
-Dtests.seed=4D88B1144F0B21A3 -Dtests.locale=gv-Latn-IM 
-Dtests.timezone=America/Noronha -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   NOTE: test params are: 
codec=LZ4WithPresetCompressingStoredFieldsData(storedFieldsFormat=Lucene90CompressingStoredFieldsFormat(compressionMode=BEST_SPEED,
 chunkSize=24, maxDocsPerChunk=9, blockShift=7), 
termVectorsFormat=Lucene90CompressingTermVectorsFormat(compressionMode=BEST_SPEED,
 chunkSize=24, maxDocsPerChunk=9, blockSize=7)), 
sim=Asserting(RandomSimilarity(queryNorm=true): {field=IB LL-D2}), 
locale=gv-Latn-IM, timezone=America/Noronha
   NOTE: Mac OS X 14.5 aarch64/Oracle Corporation 21.0.2 
(64-bit)/cpus=1,threads=1,free=229604664,total=257425408
   NOTE: All tests run in this JVM: [TestIndexWriterWithThreads]
   ```
   
   I could not identify the root cause of the failure, but luckily, I found a 
way to make the failure more deterministic. The test is still flaky, and you 
may run it multiple times to see the failure.
   
   ```
   diff --git a/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java 
b/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
   index 44d8bee8460..ace128a6cf8 100644
   --- a/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
   +++ b/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
   @@ -1560,6 +1560,9 @@ public class IndexWriter
            if (infoStream.isEnabled("IW")) {
              infoStream.message("IW", "hit exception updating document");
            }
   +        try {
   +          Thread.sleep(1000);
   +        } catch (Exception e) {}
            maybeCloseOnTragicEvent();
          }
        }
   @@ -2498,6 +2501,9 @@ public class IndexWriter
    
          docWriter.close(); // mark it as closed first to prevent subsequent 
indexing actions/flushes
          assert !Thread.holdsLock(this) : "IndexWriter lock should never be 
hold when aborting";
   +      try {
   +        Thread.sleep(5000);
   +      } catch (Exception e) {}
          docWriter.abort(); // don't sync on IW here
          docWriter.flushControl.waitForFlush(); // wait for all concurrently 
running flushes
          publishFlushedSegments(
   @@ -4944,6 +4950,10 @@ public class IndexWriter
        assert merge.registerDone;
        assert merge.maxNumSegments == UNBOUNDED_MAX_MERGE_SEGMENTS || 
merge.maxNumSegments > 0;
    
   +    try {
   +      Thread.sleep(1000);
   +    } catch (Exception e) {
   +    }
        if (tragedy.get() != null) {
          throw new IllegalStateException(
              "this writer hit an unrecoverable error; cannot merge", 
tragedy.get());
   diff --git 
a/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterWithThreads.java 
b/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterWithThreads.java
   index 65dc37b50e5..b90d93a117d 100644
   --- 
a/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterWithThreads.java
   +++ 
b/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterWithThreads.java
   @@ -270,9 +270,9 @@ public class TestIndexWriterWithThreads extends 
LuceneTestCase {
      // failure to trigger an IOException
      public void _testMultipleThreadsFailure(MockDirectoryWrapper.Failure 
failure) throws Exception {
    
   -    int NUM_THREADS = 3;
   +    int NUM_THREADS = 1;
    
   -    for (int iter = 0; iter < 2; iter++) {
   +    for (int iter = 0; iter < 1; iter++) {
          if (VERBOSE) {
            System.out.println("TEST: iter=" + iter);
          }
   @@ -296,6 +296,9 @@ public class TestIndexWriterWithThreads extends 
LuceneTestCase {
          }
          syncStart.await();
    
   +      try {
   +        Thread.sleep(500);
   +      } catch (Exception e) {}
          dir.failOn(failure);
          failure.setDoFail();
   ```
   
   
   Some observations:
   - the failure only happens if Merge threads are created, which requires 
`dir.failOn(failure);` to be executed after IndexWriter thread calls 
`updateDocuments`. 
   - The merge threads should call the `IndexWriter.shouldClose` method. This 
requires the writer thread to wait a bit before calling 
`maybeCloseOnTragicEvent` at 
https://github.com/apache/lucene/blob/3304b60c9cc2fd23531263ea9d95dc3e92c0b2ce/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L1563
   - This allows the merge thread to call `maybeCloseOnTragicEvent();` while 
the IW threads continue.
   -  Finally, the testing thread closes the `MockDirectoryWrapper` while the 
Merge thread is still inside `maybeCloseOnTragicEvent`.
   
   
   This may be related to #10112, #10930, and #4890
   
   ### Version and environment details
   
   My lucene commit: 33a4c1d8ef999902dacedde9c7f04a3c7e2e78c9
   JDK version: openjdk 21.0.3 2024-04-16
   OpenJDK Runtime Environment Homebrew (build 21.0.3)
   OpenJDK 64-Bit Server VM Homebrew (build 21.0.3, mixed mode, sharing)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to