Corrupted Index with custom directory

Geoff Cooney Fri, 22 Jan 2016 13:09:12 -0800

Hi,

We have a requirement to pre-encrypt an index we are building before it
hits disk.  We are doing this with a wrapper around MMapDirectory that
wraps the input/output streams (I know the general recommendation is to
encrypt the filesystem instead, but that option was explicitly rejected by
our security group).


The issue we've been running into is that once the indexes get beyond "very
small", we start seeing corruption issues on some but not all queries.
Running CheckIndex reports any segment with compound=false as corrupt (we
don't know if this is the actual root cause).  While the exact error
messages differ, it is always an issue with the .doc segment file.
Also of interest, this doesn't seem to block segment merges at all:
segments reported as corrupt later get merged successfully.

We have not yet been able to reliably reproduce the corruption issue in a
simple isolated test.  I'm wondering if anyone has any tips on places to
look or tests to run that might help isolate the issue?
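
To make the question concrete, a minimal sketch of the kind of isolated
random-access check in question might look like the following.  It is
illustrative only and is not our actual code: the position-keyed XOR is an
assumed stand-in for the real cipher, and PositionalCipherCheck, transform,
encrypt, readAt and countMismatches are all hypothetical names.  It uses no
Lucene classes.  It encrypts a random buffer, then reads single bytes back at
arbitrary offsets (always including the first and last byte) and compares
them with the plaintext.

```java
import java.util.Random;

// Hypothetical, self-contained sketch; not the real DecryptingMMapIndexInput.
final class PositionalCipherCheck {

    // Assumed stand-in cipher: XOR each byte with a keystream byte derived
    // from its absolute position, so any offset can be decoded on its own.
    public static byte transform(byte b, long pos, long key) {
        long x = ((pos + 1) * 0x9E3779B97F4A7C15L) ^ key;
        x ^= x >>> 29;
        return (byte) (b ^ (byte) x);
    }

    // Encrypts the whole buffer, byte i using position i.
    public static byte[] encrypt(byte[] plain, long key) {
        byte[] out = new byte[plain.length];
        for (int i = 0; i < plain.length; i++) {
            out[i] = transform(plain[i], i, key);
        }
        return out;
    }

    // Decrypts one byte at an absolute position, like seek + readByte.
    public static byte readAt(byte[] cipher, long pos, long key) {
        if (pos < 0 || pos >= cipher.length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        return transform(cipher[(int) pos], pos, key);
    }

    // Returns how many probed positions decrypt to the wrong byte.
    public static int countMismatches(int length, int probes, long seed) {
        Random rnd = new Random(seed);
        byte[] plain = new byte[length];
        rnd.nextBytes(plain);
        long key = rnd.nextLong();
        byte[] cipher = encrypt(plain, key);

        int bad = 0;
        // Always probe both ends of the buffer.
        if (readAt(cipher, 0, key) != plain[0]) bad++;
        if (readAt(cipher, length - 1, key) != plain[length - 1]) bad++;
        // Then probe random offsets in between.
        for (int i = 0; i < probes; i++) {
            int pos = rnd.nextInt(length);
            if (readAt(cipher, pos, key) != plain[pos]) bad++;
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(1 << 20, 10_000, 42L));
    }
}
```

The same pattern (write, then seek to arbitrary offsets including the final
byte and compare) is what we would want to run against the real wrapper.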

One corrupted segment:

  2 of 12: name=_7h08 maxDoc=41769
    version=5.3.1
    id=8f4kteokbievldjcgq7ly7cj3
    codec=Lucene53
    compound=false
    numFiles=10
    size (MB)=8.666
    diagnostics = {os=Linux, java.vendor=Oracle Corporation, java.version=1.8.0_65, java.vm.version=25.65-b01, lucene.version=5.3.1, mergeMaxNumSegments=-1, os.arch=amd64, java.runtime.version=1.8.0_65-b17, source=merge, mergeFactor=10, os.version=2.6.32-573.12.1.el6.x86_64, timestamp=1453492996850}
    no deletions
    test: open reader.........OK [took 0.034 sec]
    test: check integrity.....OK [took 1.200 sec]
    test: check live docs.....OK [took 0.000 sec]
    test: field infos.........OK [24 fields] [took 0.000 sec]
    test: field norms.........OK [5 fields] [took 0.034 sec]
    test: terms, freq, prox...ERROR: java.io.EOFException: at the end of the file
java.io.EOFException: at the end of the file
    at org.apache.lucene.store.DecryptingMMapIndexInput.ensureNotEOF(DecryptingMMapIndexInput.java:236)
    at org.apache.lucene.store.DecryptingMMapIndexInput.readByte(DecryptingMMapIndexInput.java:216)
    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader.readVIntBlock(Lucene50PostingsReader.java:132)
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockPostingsEnum.refillDocs(Lucene50PostingsReader.java:619)
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockPostingsEnum.advance(Lucene50PostingsReader.java:716)
    at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1411)
    at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1666)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:700)
    at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)
    test: stored fields.......OK [125307 total field count; avg 3.0 fields per doc] [took 0.332 sec]
    test: term vectors........OK [0 total term vector count; avg 0.0 term/freq vector fields per doc] [took 0.000 sec]
    test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] [took 0.000 sec]
FAILED
    WARNING: exorciseIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:720)
    at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)

Thanks for any help!

Cheers,
Geoff
