I've pretty much ruled out system/hardware issues: the AWS instance has been
rebooted, and indexing to a core on a brand-new, empty disk/file system fails
in the same way with a CorruptIndexException. I can generally get indexing to
complete by significantly dialing down the number of indexer scripts running
concurrently, but the duration goes up proportionately.
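For what it's worth, the throttling is nothing exotic - the driver just caps
how many worker processes are posting batches to /update/json at once. A
stripped-down sketch of it follows; the batch source, document shape, and
Solr URL are simplified stand-ins for what the real script does:

    # Minimal sketch of the indexer driver; build_batches() and the URL
    # are illustrative stand-ins for the production versions.
    from multiprocessing import Pool

    import scorched

    SOLR_URL = "http://localhost:8983/solr/build0324"

    def build_batches():
        # The real script streams batches of dicts from our document store.
        yield [{"id": "doc-1", "title": "example"}]

    def index_batch(docs):
        # Each worker opens its own connection and posts to /update/json;
        # updates accumulate server-side until an autocommit fires.
        si = scorched.SolrInterface(SOLR_URL)
        si.add(docs)

    def main(num_workers=2):  # was 6 when the EOF errors appeared
        with Pool(processes=num_workers) as pool:
            pool.map(index_batch, build_batches())

    if __name__ == "__main__":
        main()

Going from 6 workers down to 2 is what gets a run to finish, at very roughly
triple the wall-clock time.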
-Simon

On Thu, Apr 27, 2017 at 9:26 AM, simon <mtnes...@gmail.com> wrote:

> Nope ... huge file system (600GB) only 50% full, and a complete index
> would be 80GB max.
>
> On Wed, Apr 26, 2017 at 4:04 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Disk space issue? Lucene requires at least as much free disk space as
>> your index size. Note that the disk-full issue will be transient, IOW
>> if you look now and have free space it still may have been all used up
>> but had some space reclaimed.
>>
>> Best,
>> Erick
>>
>> On Wed, Apr 26, 2017 at 12:02 PM, simon <mtnes...@gmail.com> wrote:
>> > Reposting this as the problem described is happening again and there
>> > were no responses to the original email. Anyone?
>> > ----------------------------
>> > I'm seeing an odd error during indexing for which I can't find any
>> > reason.
>> >
>> > The relevant Solr log entry:
>> >
>> > 2017-03-24 19:09:35.363 ERROR (commitScheduler-30-thread-1) [
>> > x:build0324] o.a.s.u.CommitTracker auto commit
>> > error...:java.io.EOFException: read past EOF:
>> > MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>> >   at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>> >   ...
>> >   Suppressed: org.apache.lucene.index.CorruptIndexException: checksum
>> > status indeterminate: remaining=0, please run checkindex for more details
>> > (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")))
>> >   at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:451)
>> >   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:140)
>> >
>> > followed within a few seconds by
>> >
>> > 2017-03-24 19:09:56.402 ERROR (commitScheduler-31-thread-1) [
>> > x:build0324] o.a.s.u.CommitTracker auto commit
>> > error...:org.apache.solr.common.SolrException: Error opening new searcher
>> >   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1820)
>> >   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1931)
>> >   ...
>> > Caused by: java.io.EOFException: read past EOF:
>> > MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>> >   at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>> >
>> > This error was repeated a few times as the indexing continued and
>> > further autocommits were triggered.
>> >
>> > I stopped the indexing process, made a backup snapshot of the index,
>> > restarted indexing at a checkpoint, and everything then completed
>> > without further incident.
>> >
>> > I ran checkIndex on the saved snapshot and it reported no errors
>> > whatsoever. Operations on the complete index (including an optimize
>> > and several query scripts) have all been error-free.
>> >
>> > Some background:
>> > Solr information from the beginning of the checkIndex output:
>> > -------
>> > Opening index @ /indexes/solrindexes/build0324.bad/index
>> >
>> > Segments file=segments_9s numSegments=105 version=6.3.0
>> > id=7m1ldieoje0m6sljp7xocbz9l userData={commitTimeMSec=1490400514324}
>> >   1 of 105: name=_be maxDoc=1227144
>> >     version=6.3.0
>> >     id=7m1ldieoje0m6sljp7xocburb
>> >     codec=Lucene62
>> >     compound=false
>> >     numFiles=14
>> >     size (MB)=4,926.186
>> >     diagnostics = {os=Linux, java.vendor=Oracle Corporation,
>> > java.version=1.8.0_45, java.vm.version=25.45-b02, lucene.version=6.3.0,
>> > mergeMaxNumSegments=-1, os.arch=amd64, java.runtime.version=1.8.0_45-b13,
>> > source=merge, mergeFactor=19, os.version=3.10.0-229.1.2.el7.x86_64,
>> > timestamp=1490380905920}
>> >     no deletions
>> >     test: open reader.........OK [took 0.176 sec]
>> >     test: check integrity.....OK [took 37.399 sec]
>> >     test: check live docs.....OK [took 0.000 sec]
>> >     test: field infos.........OK [49 fields] [took 0.000 sec]
>> >     test: field norms.........OK [17 fields] [took 0.030 sec]
>> >     test: terms, freq, prox...OK [14568108 terms; 612537186 terms/docs
>> > pairs; 801208966 tokens] [took 30.005 sec]
>> >     test: stored fields.......OK [150164874 total field count; avg 122.4
>> > fields per doc] [took 35.321 sec]
>> >     test: term vectors........OK [4804967 total term vector count; avg
>> > 3.9 term/freq vector fields per doc] [took 55.857 sec]
>> >     test: docvalues...........OK [4 docvalues fields; 0 BINARY; 1
>> > NUMERIC; 2 SORTED; 0 SORTED_NUMERIC; 1 SORTED_SET] [took 0.954 sec]
>> >     test: points..............OK [0 fields, 0 points] [took 0.000 sec]
>> > -----
>> >
>> > The indexing process is a Python script (using the scorched Python
>> > client) which spawns multiple instances of itself, in this case 6, so
>> > there are definitely concurrent calls to /update/json.
>> >
>> > Solrconfig and the schema have not been changed for several months,
>> > during which time many ingests have been done, and the documents which
>> > were being indexed at the time of the error have been indexed before
>> > without problems, so I don't think it's a data issue.
>> >
>> > I saw the same error occur earlier in the day, and decided at that
>> > time to delete the core and restart the Solr instance.
>> >
>> > The server is an Amazon instance running CentOS 7. I checked the
>> > system logs and didn't see any evidence of hardware errors.
>> >
>> > I'm puzzled as to why this would start happening out of the blue, and
>> > I can't find any particularly relevant posts to this forum or
>> > Stackexchange. Anyone have an idea what's going on?
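P.S. In case anyone wants to reproduce the verification step mentioned
above: I re-ran the stock Lucene CheckIndex tool against the snapshot from a
small Python wrapper, roughly like the sketch below. The jar and index paths
are from my setup (and the jar location is illustrative), so adjust as
needed.

    # Minimal sketch: drive Lucene's stock CheckIndex tool against the
    # saved snapshot. Paths are from my environment and are illustrative.
    import subprocess

    LUCENE_CORE_JAR = "/opt/solr/server/solr-webapp/webapp/WEB-INF/lib/lucene-core-6.3.0.jar"
    INDEX_DIR = "/indexes/solrindexes/build0324.bad/index"

    subprocess.run(
        ["java", "-cp", LUCENE_CORE_JAR,
         "org.apache.lucene.index.CheckIndex", INDEX_DIR],
        check=True,  # raise if CheckIndex exits non-zero
    )

It walked all 105 segments and came back clean, which is exactly what makes
the transient EOF during autocommit so puzzling.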