[ https://issues.apache.org/jira/browse/LUCENE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander L updated LUCENE-9867:
--------------------------------
    Description:
A failed segment merge caused by "No space left on device" can't be recovered and Lucene fails with CorruptIndexException after restart. The expectation is that Lucene will be able to restart automatically without manual intervention.
We have 2 indexing patterns:
* Create and commit an empty index, then start long initial indexing process (might take hours), perform a second commit in the end
* Using existing index, add no more than 4k documents and commit after that
Lucene version: 8.5.0
Java version: OpenJDK 11
OS: CentOS Linux 7
Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
Virtualization: kvm
Filesystem: xfs
Failed merge stacktrace:
{code:java}
2021-02-02T08:51:51.679+0000 org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
Caused by: java.io.IOException: No space left on device
    at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
    at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
    at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
    at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
    at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
    at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
    at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
    at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
    at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
    at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
    at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
    at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
    at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
{code}
Followed by failed startup:
{code:java}
2021-02-02T08:52:07.926+0000 org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/5f91aa0b07ce4d5e7beffaa2/segments_578fu")))
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
Caused by: java.nio.file.NoSuchFileException: /data/5f91aa0b07ce4d5e7beffaa2/_6lfem.si
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
    at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:353)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
    ... 33 common frames omitted
{code}

  was:
A failed segment merge caused by "No space left on device" can't be recovered and Lucene fails with CorruptIndexException after restart. The expectation is that Lucene will be able to restart automatically without manual intervention.
We have 2 indexing patterns:
* Create and commit an empty index, then start long initial indexing process (might take hours), perform a second commit in the end
* Using existing index, add no more than 4k documents and commit after that
Lucene version: 8.5.0
Java version: OpenJDK 11
OS: CentOS Linux 7
Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
Virtualization: kvm
Filesystem: xfs
Failed merge stacktrace:
{code:java}
2021-02-02T08:51:51.679+0000 org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
Caused by: java.io.IOException: No space left on device
    at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
    at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
    at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
    at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
    at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
    at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
    at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
    at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
    at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
    at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
    at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
    at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
    at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
{code}
Followed by failed startup:
{code:java}
2021-02-02T08:51:51.679+0000 org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
Caused by: java.io.IOException: No space left on device
    at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
    at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
    at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
    at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
    at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
    at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
    at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
    at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
    at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
    at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
    at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
    at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
    at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
{code}

> CorruptIndexException after failed segment merge caused by No space left on device
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-9867
> URL: https://issues.apache.org/jira/browse/LUCENE-9867
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/store
> Affects Versions: 8.5
> Reporter: Alexander L
> Priority: Major
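The merge failure in this report starts with the disk filling up. As an illustration only, and not something taken from the report or from Lucene itself, an application could check free space on the index volume before starting the long initial indexing run. The sketch below uses only the JDK; the class name `DiskSpaceGuard`, the method `hasHeadroom`, and the threshold are hypothetical and would have to be chosen by the application.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Lucene: an advisory free-space check. */
final class DiskSpaceGuard {
    private DiskSpaceGuard() {}

    /**
     * Returns true if the file store that holds indexDir currently reports
     * at least minFreeBytes of usable space. The threshold is an assumption
     * supplied by the caller; this method does not estimate merge overhead.
     */
    static boolean hasHeadroom(Path indexDir, long minFreeBytes) throws IOException {
        FileStore store = Files.getFileStore(indexDir);
        return store.getUsableSpace() >= minFreeBytes;
    }
}
```

A caller might invoke `DiskSpaceGuard.hasHeadroom(Paths.get("/data/..."), threshold)` before opening the writer and skip or postpone indexing when it returns false. Such a check is advisory at best: space can still run out while a merge is in progress, and it does not change the recovery behaviour described in this issue.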