[ https://issues.apache.org/jira/browse/LUCENE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308510#comment-17308510 ]
Robert Muir commented on LUCENE-9867:
-------------------------------------
[~sqshq] the general problem is that files are getting deleted that should not
be. A number of things could cause this: stale directory metadata from the
filesystem, two writers open on the same index at the same time, etc.
Is XFS accessed via a qemu disk image? Or via some other feature such as
virtio-fs?
Is there a chance of the two indexing patterns overlapping with each other at
the same time? Any special LockFactory configuration?
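To illustrate why the LockFactory question matters, here is a minimal sketch of the kind of guard that keeps two writers off one index directory. It uses only the JDK and is not Lucene's code: the class name WriteLockSketch and the method lockTwice are hypothetical, and the file name write.lock is borrowed from Lucene's usual lock file name purely for illustration. The sketch takes an exclusive OS-level lock on a file in the directory; a second attempt from the same JVM is refused. Whether such locks are honored can depend on the filesystem layer underneath, and a lock factory that skips locking entirely would give no protection at all.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteLockSketch {

    /** Tries to take an exclusive lock; returns null if the lock is already held. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            // Returns null when another process holds an overlapping lock.
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // Thrown when this JVM already holds an overlapping lock.
            return null;
        }
    }

    /**
     * Simulates two writers opening the same directory.
     * Returns {firstGotLock, secondGotLock}.
     */
    static boolean[] lockTwice(Path indexDir) throws IOException {
        // Hypothetical lock file name, chosen to mirror Lucene's write.lock.
        Path lockFile = indexDir.resolve("write.lock");
        try (FileChannel first = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock a = tryAcquire(first);
            FileLock b = tryAcquire(second);
            boolean[] result = {a != null, b != null};
            // Release explicitly; closing the channels would also release them.
            if (b != null) b.release();
            if (a != null) a.release();
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        boolean[] got = lockTwice(dir);
        System.out.println("first writer locked: " + got[0]);
        System.out.println("second writer locked: " + got[1]);
    }
}
```

If both attempts ever succeeded against the same directory, two writers could each delete files the other still references, which matches the symptom described above.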
> CorruptIndexException after failed segment merge caused by No space left on
> device
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-9867
> URL: https://issues.apache.org/jira/browse/LUCENE-9867
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/store
> Affects Versions: 8.5
> Reporter: Alexander L
> Priority: Major
>
> Failed segment merge caused by "No space left on device" can't be recovered
> and Lucene fails with CorruptIndexException after restart. The expectation is
> that Lucene will be able to restart automatically without manual intervention.
> We have 2 indexing patterns:
> * Create and commit an empty index, then start a long initial indexing process
> (might take hours), and perform a second commit at the end
> * Using an existing index, add no more than 4k documents and commit after that
> Right now we don't have evidence to suggest which pattern caused this issue,
> but we definitely witnessed a similar situation for the second pattern,
> although it was a bit different - caused by {{OutOfMemoryError: Java Heap
> Space}}, with missing {{_q.cfe}} file which produced only
> {{NoSuchFileException}}, not {{CorruptIndexException}}. Please let me know if
> we need a separate ticket for that.
> Lucene version: 8.5.0
> Java version: OpenJDK 11
> OS: CentOS Linux 7
> Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
> Virtualization: kvm
> Filesystem: xfs
> Failed merge stacktrace:
> {code:java}
> 2021-02-02T08:51:51.679+0000
> org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
> Caused by: java.io.IOException: No space left on device
> at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
> at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
> at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
> at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
> at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
> at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
> at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
> at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
> at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
> at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
> at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
> at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
> at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
> at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
> at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
> at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
> at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
> at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
> at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
> at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
> at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
> {code}
> Followed by failed startup:
> {code:java}
> 2021-02-02T08:52:07.926+0000
> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/5f91aa0b07ce4d5e7beffaa2/segments_578fu")))
> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
> Caused by: java.nio.file.NoSuchFileException: /data/5f91aa0b07ce4d5e7beffaa2/_6lfem.si
> at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
> at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
> at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
> at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
> at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
> at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
> at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:353)
> at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
> ... 33 common frames omitted
> {code}