Hi,

We have a requirement to pre-encrypt an index we are building before it hits disk. We are doing this with a wrapper around MMapDirectory that wraps the input/output streams. (I know the general recommendation is to encrypt the filesystem instead, but that option was explicitly rejected by our security group.)

The issue we've been running into is that once the indexes get beyond "very small", we start seeing corruption issues on some but not all queries. Running CheckIndex reports any segments that are not compound (compound=false) as corrupt (I don't know if this is the actual root cause). While the exact error messages differ, it is always an issue with the .doc segment file. Also of interest: this doesn't seem to block segment merges at all, as segments reported as corrupt later get merged successfully.

We have not yet been able to reliably reproduce the corruption issue in a simple isolated test. I'm wondering if anyone has any tips on places to look or tests to run that might help isolate the issue?

One corrupted segment:

  2 of 12: name=_7h08 maxDoc=41769
    version=5.3.1
    id=8f4kteokbievldjcgq7ly7cj3
    codec=Lucene53
    compound=false
    numFiles=10
    size (MB)=8.666
    diagnostics = {os=Linux, java.vendor=Oracle Corporation, java.version=1.8.0_65, java.vm.version=25.65-b01, lucene.version=5.3.1, mergeMaxNumSegments=-1, os.arch=amd64, java.runtime.version=1.8.0_65-b17, source=merge, mergeFactor=10, os.version=2.6.32-573.12.1.el6.x86_64, timestamp=1453492996850}
    no deletions
    test: open reader.........OK [took 0.034 sec]
    test: check integrity.....OK [took 1.200 sec]
    test: check live docs.....OK [took 0.000 sec]
    test: field infos.........OK [24 fields] [took 0.000 sec]
    test: field norms.........OK [5 fields] [took 0.034 sec]
    test: terms, freq, prox...ERROR: java.io.EOFException: at the end of the file
java.io.EOFException: at the end of the file
    at org.apache.lucene.store.DecryptingMMapIndexInput.ensureNotEOF(DecryptingMMapIndexInput.java:236)
    at org.apache.lucene.store.DecryptingMMapIndexInput.readByte(DecryptingMMapIndexInput.java:216)
    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader.readVIntBlock(Lucene50PostingsReader.java:132)
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockPostingsEnum.refillDocs(Lucene50PostingsReader.java:619)
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockPostingsEnum.advance(Lucene50PostingsReader.java:716)
    at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1411)
    at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1666)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:700)
    at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)
    test: stored fields.......OK [125307 total field count; avg 3.0 fields per doc] [took 0.332 sec]
    test: term vectors........OK [0 total term vector count; avg 0.0 term/freq vector fields per doc] [took 0.000 sec]
    test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] [took 0.000 sec]
FAILED
    WARNING: exorciseIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:720)
    at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)

Thanks for any help!

Cheers,
Geoff