[ https://issues.apache.org/jira/browse/LUCENE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308363#comment-17308363 ]

Alexander L edited comment on LUCENE-9867 at 3/25/21, 4:56 AM:
---------------------------------------------------------------

Hi [~rcmuir], thank you for the quick response!

Regarding the first indexing pattern: it was only our guess that it is the more likely cause of this problem, since we don't perform the second commit until the entire large chunk of documents has been added. Unfortunately, at the moment the logs give no evidence of which pattern caused the stacktrace above. You are probably right that it was pattern #2 as well. I'll add this detail to the description.

And I found the stacktrace for the OOM-related issue. You are correct: it was the {{NoSuchFileException}} (I've fixed that in the description):
{code:java}
2021-03-10T09:54:07.596+0000 ERROR Failing due to unexpected error.
java.lang.OutOfMemoryError: Java heap space

* restart *

2021-03-10T09:54:19.942+0000
java.nio.file.NoSuchFileException: /data/602d9464e850d13d46b6ee3f/_q.cfe
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
        at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
        at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.readEntries(Lucene50CompoundReader.java:106)
        at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:70)
        at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:70)
        at org.apache.lucene.index.IndexWriter.readFieldInfos(IndexWriter.java:976)
        at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:993)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:878)
{code}
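
For readers following along, the two indexing patterns discussed above could be sketched roughly as follows. This is a minimal sketch, not the reporter's actual code: it assumes lucene-core and lucene-analyzers-common on the classpath, and the index path is hypothetical.
{code:java}
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class IndexingPatterns {

    // Pattern #1: commit an empty index, bulk-add for hours, commit once at the end.
    static void initialLoad(Iterable<Document> docs) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get("/data/index"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.commit();                  // first commit: empty index
            for (Document doc : docs) {
                writer.addDocument(doc);      // no intermediate commits
            }
            writer.commit();                  // second commit, potentially hours later
        }
    }

    // Pattern #2: reopen an existing index, add at most ~4k documents, then commit.
    static void incrementalBatch(Iterable<Document> batch) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get("/data/index"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (Document doc : batch) {
                writer.addDocument(doc);
            }
            writer.commit();
        }
    }
}
{code}
Between commits, newly flushed and merged segment files exist on disk but are not yet referenced by a durable commit point, which is why a crash mid-merge can leave the last commit pointing at files in an unexpected state.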


> CorruptIndexException after failed segment merge caused by No space left on device
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-9867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9867
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>    Affects Versions: 8.5
>            Reporter: Alexander L
>            Priority: Major
>
> A failed segment merge caused by "No space left on device" can't be recovered, and Lucene fails with a CorruptIndexException after restart. The expectation is that Lucene should be able to restart automatically, without manual intervention.
> We have two indexing patterns:
>  * Create and commit an empty index, then start a long initial indexing process (which might take hours) and perform a second commit at the end
>  * Using an existing index, add no more than 4k documents and commit after that
> Right now we don't have evidence to suggest which pattern caused this issue, but we definitely witnessed a similar situation with the second pattern, although it was slightly different: it was caused by {{OutOfMemoryError: Java heap space}}, with a missing {{_q.cfe}} file that produced only a {{NoSuchFileException}}, not a {{CorruptIndexException}}. Please let me know if we need a separate ticket for that.
> Lucene version: 8.5.0
> Java version: OpenJDK 11
> OS: CentOS Linux 7
> Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
> Virtualization: kvm
> Filesystem: xfs
> Failed merge stacktrace:
> {code:java}
> 2021-02-02T08:51:51.679+0000
> org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
>       at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
>       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
> Caused by: java.io.IOException: No space left on device
>       at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>       at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
>       at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
>       at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
>       at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
>       at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
>       at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
>       at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
>       at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
>       at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
>       at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
>       at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
>       at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
>       at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
>       at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
>       at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
>       at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
>       at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
>       at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
>       at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
>       at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
>       at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
>       at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
>       at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
>       at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
>       at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
>       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
> {code}
> Followed by a failed startup:
> {code:java}
> 2021-02-02T08:52:07.926+0000
> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/5f91aa0b07ce4d5e7beffaa2/segments_578fu")))
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
>       at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
> Caused by: java.nio.file.NoSuchFileException: /data/5f91aa0b07ce4d5e7beffaa2/_6lfem.si
>       at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>       at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>       at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
>       at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
>       at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
>       at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
>       at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
>       at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
>       at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:353)
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
>       ... 33 common frames omitted
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
