[
https://issues.apache.org/jira/browse/LUCENE-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Simon Willnauer resolved LUCENE-9477.
-------------------------------------
Fix Version/s: 8.7, master (9.0)
Resolution: Fixed
> IndexWriter might leave broken segments file behind on exception during
> rollback
> --------------------------------------------------------------------------------
>
> Key: LUCENE-9477
> URL: https://issues.apache.org/jira/browse/LUCENE-9477
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Simon Willnauer
> Priority: Major
> Fix For: master (9.0), 8.7
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Mike ran some beasty tests while I was working on LUCENE-8962. One test
> caused some headaches because it also fails on master, though only rarely:
> {noformat}
> org.apache.lucene.index.TestIndexWriterOnVMError > testUnknownError FAILED
> org.apache.lucene.index.CorruptIndexException: Unexpected file read error
> while reading index.
> (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((clone of)
> ByteBuffersIndexInput (file=pending_segments_2, buffers=258 bytes, block size: 1, blocks: 1, position: 0))))
> at __randomizedtesting.SeedInfo.seed([587A104EFE0C57E1:B32CCFCEFC8BC1D1]:0)
> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:300)
> at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:521)
> at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:301)
> at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:836)
> at org.apache.lucene.index.TestIndexWriterOnVMError.doTest(TestIndexWriterOnVMError.java:89)
> at org.apache.lucene.index.TestIndexWriterOnVMError.testUnknownError(TestIndexWriterOnVMError.java:251)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.io.FileNotFoundException: _0.si in dir=ByteBuffersDirectory@1bae3fe1 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@38275f41
> at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:748)
> at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
> at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1044)
> at org.apache.lucene.codecs.lucene86.Lucene86SegmentInfoFormat.read(Lucene86SegmentInfoFormat.java:91)
> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:364)
> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:298)
> ... 41 more
> ....
> 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterOnVMError
> -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1
> -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true
> -Dtests.linedocsfile=/l/simon/lucene/test-framework/src/resources/org/apache/lucene/util/2000mb.txt.gz
> -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true
> -Dtests.file.encoding=UTF-8
> 2> NOTE: leaving temporary files on disk at:
> /l/simon/lucene/core/build/tmp/tests-tmp/lucene.index.TestIndexWriterOnVMError_587A104EFE0C57E1-003
> 2> NOTE: test params are: codec=Asserting(Lucene86):
> {text_payloads=BlockTreeOrds(blocksize=128),
> text_vectors=PostingsFormat(name=Asserting),
> text1=PostingsFormat(name=Asserting), id=BlockTreeOrds(blocksize=128)},
> docValues:{dv3=DocValuesFormat(name=Lucene80), dv2=DocValuesFormat(name=Asserting),
> dv5=DocValuesFormat(name=Lucene80), dv=DocValuesFormat(name=Asserting),
> dv4=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=696,
> maxMBSortInHeap=6.040673619645681, sim=Asserting(RandomSimilarity(queryNorm=false):
> {text_payloads=IB SPL-DZ(0.3), text_vectors=DFR I(ne)L3(800.0),
> text1=org.apache.lucene.search.similarities.BooleanSimilarity@6f4329a1}),
> locale=zh-CN, timezone=SystemV/MST7MDT
> 2> NOTE: Linux 5.5.6-arch1-1 amd64/Oracle Corporation 11.0.6
> (64-bit)/cpus=128,threads=1,free=241525696,total=268435456
> 2> NOTE: All tests run in this JVM: [TestIndexWriterOnVMError]
> {noformat}
> The test also reproduces on master without the huge line docs file, using this command:
> {noformat}
> ant test -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError
> -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true
> -Dtests.badapples=true -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT
> -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> {noformat}
> The reason is that we fail to delete the already-renamed pending segments
> file when the metadata sync on the directory fails. The subsequent rollback
> also crashes while trying to delete unreferenced files, so subsequent
> CheckIndex calls fail with FileNotFoundException because the commit was
> written but not fully removed.
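
The failure mode described above can be illustrated with a minimal sketch using plain java.nio.file calls. This is not Lucene's actual code or the committed fix: the class PendingCommitSketch, the MetadataSync interface, and the finishCommit method are hypothetical names introduced only for illustration. The idea shown is that once the pending file has been renamed, a failing metadata sync has to clean up the file under its final name, otherwise a half-finished commit stays visible to later readers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; none of these names come from Lucene.
final class PendingCommitSketch {

  /** Stand-in for the directory metadata sync, which may fail. */
  interface MetadataSync {
    void sync(Path dir) throws IOException;
  }

  /** Renames the pending file to its final name, then syncs directory metadata. */
  static void finishCommit(Path dir, String pendingName, String finalName, MetadataSync sync)
      throws IOException {
    Path pending = dir.resolve(pendingName);
    Path committed = dir.resolve(finalName);
    Files.move(pending, committed, StandardCopyOption.ATOMIC_MOVE);
    try {
      sync.sync(dir);
    } catch (Throwable t) {
      // The rename already happened, so the file to remove is the one
      // under the final name, not the pending name.
      try {
        Files.deleteIfExists(committed);
      } catch (Throwable suppressed) {
        t.addSuppressed(suppressed);
      }
      throw t;
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("pending-commit-sketch");
    Files.write(dir.resolve("pending_segments_2"), new byte[] {1, 2, 3});
    try {
      // Simulate the metadata sync failing after the rename succeeded.
      finishCommit(dir, "pending_segments_2", "segments_2", d -> {
        throw new IOException("simulated metadata sync failure");
      });
    } catch (IOException expected) {
      System.out.println("commit failed: " + expected.getMessage());
    }
    System.out.println("segments_2 left behind: " + Files.exists(dir.resolve("segments_2")));
  }
}
```

Catching Throwable rather than IOException in the sketch mirrors the kind of error the failing test injects (TestIndexWriterOnVMError); whether the real fix takes the same shape is an assumption.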