[ 
https://issues.apache.org/jira/browse/LUCENE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander L updated LUCENE-9867:
--------------------------------
    Description: 
A segment merge that fails with "No space left on device" cannot be recovered 
from: after a restart, Lucene fails with {{CorruptIndexException}}. We expect 
Lucene to be able to restart automatically, without manual intervention.

We have two indexing patterns:
 * Create and commit an empty index, then start a long initial indexing process 
(which might take hours) and perform a second commit at the end
 * Using an existing index, add no more than 4k documents and then commit

Right now we have no evidence to suggest which pattern caused this issue, but 
we definitely witnessed a similar situation with the second pattern. That case 
was a bit different: it was caused by {{OutOfMemoryError: Java heap space}}, 
and the missing {{_q.cfe}} file produced only a {{FileNotFoundException}}, not 
a {{CorruptIndexException}}. Please let me know if we need a separate ticket 
for that.

Lucene version: 8.5.0
 Java version: OpenJDK 11

OS: CentOS Linux 7
 Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
 Virtualization: kvm
 Filesystem: xfs

Failed merge stacktrace:
{code:java}
2021-02-02T08:51:51.679+0000
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
Caused by: java.io.IOException: No space left on device
        at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
        at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
        at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
        at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
        at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
        at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
        at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
        at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
        at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
        at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
        at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
        at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
        at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
        at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
        at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
        at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
{code}
 Followed by failed startup:
{code:java}
2021-02-02T08:52:07.926+0000
org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/5f91aa0b07ce4d5e7beffaa2/segments_578fu")))
        at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
Caused by: java.nio.file.NoSuchFileException: /data/5f91aa0b07ce4d5e7beffaa2/_6lfem.si
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
        at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
        at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
        at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:353)
        at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
        ... 33 common frames omitted
{code}
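
For anyone triaging a directory in this state, a minimal sketch of a file-name 
level check follows. It relies on Lucene encoding commit generations and 
segment names in base 36, and it only looks at file names: it does not parse 
{{segments_N}} and is no substitute for Lucene's own {{CheckIndex}}. The class 
{{SegmentFileCheck}} and its helper methods are hypothetical names made up for 
this illustration, not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not a Lucene API: inspects file names only.
public class SegmentFileCheck {

    private static final String COMMIT_PREFIX = "segments_";

    /** Decodes a base-36 suffix such as the "578fu" in "segments_578fu". */
    static long decodeBase36(String suffix) {
        return Long.parseLong(suffix, Character.MAX_RADIX);
    }

    /** Returns the highest commit generation in indexDir, or -1 if there is no segments_N file. */
    static long latestCommitGeneration(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.map(p -> p.getFileName().toString())
                    .filter(name -> name.startsWith(COMMIT_PREFIX))
                    .mapToLong(name -> decodeBase36(name.substring(COMMIT_PREFIX.length())))
                    .max()
                    .orElse(-1L);
        }
    }

    /** True if the segment info file (e.g. "_6lfem.si") is present in indexDir. */
    static boolean hasSegmentInfo(Path indexDir, String segmentName) {
        return Files.exists(indexDir.resolve(segmentName + ".si"));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SegmentFileCheck <indexDir> <segmentName>");
            return;
        }
        Path indexDir = Path.of(args[0]);
        System.out.println("latest commit generation: " + latestCommitGeneration(indexDir));
        System.out.println(args[1] + ".si present: " + hasSegmentInfo(indexDir, args[1]));
    }
}
```

Run against the directory from the trace above with segment name {{_6lfem}}, 
this would report whether the {{.si}} file named in the 
{{NoSuchFileException}} is really absent. For reference, 
{{decodeBase36("578fu")}} evaluates to 8735610.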

  was:
Failed segment merge caused by "No space left on device" can't be recovered and 
Lucene fails with CorruptIndexException after restart. The expectation is that 
Lucene will be able to restart automatically without manual intervention.

We have 2 indexing patterns:
 * Create and commit an empty index, then start long initial indexing process 
(might take hours), perform a second commit in the end
 * Using existing index, add no more than 4k documents and commit after that

Seems like the first pattern might cause more problems, but we definitely 
witnessed a similar situation for the second pattern, although it was a bit 
different - caused by {{OutOfMemoryError: Java Heap Space}}, with missing 
{{_q.cfe}} file which produced only {{FileNotFoundException}}, not 
{{CorruptIndexException}}. Please let me know if we need a separate ticket for 
that.

Lucene version: 8.5.0
 Java version: OpenJDK 11

OS: CentOS Linux 7
 Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
 Virtualization: kvm
 Filesystem: xfs



> CorruptIndexException after failed segment merge caused by No space left on 
> device
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-9867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9867
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>    Affects Versions: 8.5
>            Reporter: Alexander L
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
