Simon, after hearing about the weird time issue in EC2, I am going to ask if you have a real server handy for testing. No, I have no hard facts; this is just a suggestion.
And I have no beef with AWS; they have served me really well for other servers.

Cheers -- Rick

On May 4, 2017 10:49:25 AM EDT, simon <mtnes...@gmail.com> wrote:
> I've pretty much ruled out system/hardware issues - the AWS instance has
> been rebooted, and indexing to a core on a new and empty disk/file system
> fails in the same way with a CorruptIndexException.
> I can generally get indexing to complete by significantly dialing down the
> number of indexer scripts running concurrently, but the duration goes up
> proportionately.
>
> -Simon
>
> On Thu, Apr 27, 2017 at 9:26 AM, simon <mtnes...@gmail.com> wrote:
>
>> Nope ... huge file system (600 GB) only 50% full, and a complete index
>> would be 80 GB max.
>>
>> On Wed, Apr 26, 2017 at 4:04 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>>> Disk space issue? Lucene requires at least as much free disk space as
>>> your index size. Note that a disk-full condition can be transient; IOW,
>>> even if you see free space now, the disk may have filled up earlier and
>>> since had some space reclaimed.
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Apr 26, 2017 at 12:02 PM, simon <mtnes...@gmail.com> wrote:
>>> > Reposting this, as the problem described below is happening again and
>>> > there were no responses to the original email. Anyone?
>>> > ----------------------------
>>> > I'm seeing an odd error during indexing for which I can't find any
>>> > reason.
>>> >
>>> > The relevant Solr log entry:
>>> >
>>> > 2017-03-24 19:09:35.363 ERROR (commitScheduler-30-thread-1) [
>>> > x:build0324] o.a.s.u.CommitTracker auto commit
>>> > error...:java.io.EOFException: read past EOF:
>>> > MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>>> >   at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>>> >   ...
>>> >   Suppressed: org.apache.lucene.index.CorruptIndexException: checksum
>>> > status indeterminate: remaining=0, please run checkindex for more details
>>> > (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")))
>>> >   at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:451)
>>> >   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:140)
>>> >
>>> > followed within a few seconds by
>>> >
>>> > 2017-03-24 19:09:56.402 ERROR (commitScheduler-31-thread-1) [
>>> > x:build0324] o.a.s.u.CommitTracker auto commit
>>> > error...:org.apache.solr.common.SolrException: Error opening new searcher
>>> >   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1820)
>>> >   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1931)
>>> >   ...
>>> > Caused by: java.io.EOFException: read past EOF:
>>> > MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>>> >   at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>>> >
>>> > This error was repeated a few times as the indexing continued and
>>> > further autocommits were triggered.
>>> >
>>> > I stopped the indexing process, made a backup snapshot of the index,
>>> > restarted indexing at a checkpoint, and everything then completed
>>> > without further incident.
>>> >
>>> > I ran CheckIndex on the saved snapshot and it reported no errors
>>> > whatsoever. Operations on the complete index (including an optimize
>>> > and several query scripts) have all been error-free.
>>> >
>>> > Some background:
>>> >
>>> > Solr information from the beginning of the CheckIndex output:
>>> > -------
>>> > Opening index @ /indexes/solrindexes/build0324.bad/index
>>> >
>>> > Segments file=segments_9s numSegments=105 version=6.3.0
>>> > id=7m1ldieoje0m6sljp7xocbz9l userData={commitTimeMSec=1490400514324}
>>> >   1 of 105: name=_be maxDoc=1227144
>>> >     version=6.3.0
>>> >     id=7m1ldieoje0m6sljp7xocburb
>>> >     codec=Lucene62
>>> >     compound=false
>>> >     numFiles=14
>>> >     size (MB)=4,926.186
>>> >     diagnostics = {os=Linux, java.vendor=Oracle Corporation,
>>> > java.version=1.8.0_45, java.vm.version=25.45-b02, lucene.version=6.3.0,
>>> > mergeMaxNumSegments=-1, os.arch=amd64, java.runtime.version=1.8.0_45-b13,
>>> > source=merge, mergeFactor=19, os.version=3.10.0-229.1.2.el7.x86_64,
>>> > timestamp=1490380905920}
>>> >     no deletions
>>> >     test: open reader.........OK [took 0.176 sec]
>>> >     test: check integrity.....OK [took 37.399 sec]
>>> >     test: check live docs.....OK [took 0.000 sec]
>>> >     test: field infos.........OK [49 fields] [took 0.000 sec]
>>> >     test: field norms.........OK [17 fields] [took 0.030 sec]
>>> >     test: terms, freq, prox...OK [14568108 terms; 612537186 terms/docs pairs; 801208966 tokens] [took 30.005 sec]
>>> >     test: stored fields.......OK [150164874 total field count; avg 122.4 fields per doc] [took 35.321 sec]
>>> >     test: term vectors........OK [4804967 total term vector count; avg 3.9 term/freq vector fields per doc] [took 55.857 sec]
>>> >     test: docvalues...........OK [4 docvalues fields; 0 BINARY; 1 NUMERIC; 2 SORTED; 0 SORTED_NUMERIC; 1 SORTED_SET] [took 0.954 sec]
>>> >     test: points..............OK [0 fields, 0 points] [took 0.000 sec]
>>> > -----
>>> >
>>> > The indexing process is a Python script (using the scorched Python
>>> > client) which spawns multiple instances of itself, in this case 6, so
>>> > there are definitely concurrent calls (to /update/json).
>>> >
>>> > Solrconfig and the schema have not been changed for several months,
>>> > during which time many ingests have been done, and the documents which
>>> > were being indexed at the time of the error have been indexed before
>>> > without problems, so I don't think it's a data issue.
>>> >
>>> > I saw the same error occur earlier in the day, and decided at that time
>>> > to delete the core and restart the Solr instance.
>>> >
>>> > The server is an Amazon instance running CentOS 7. I checked the system
>>> > logs and didn't see any evidence of hardware errors.
>>> >
>>> > I'm puzzled as to why this would start happening out of the blue, and I
>>> > can't find any particularly relevant posts to this forum or Stack
>>> > Exchange. Anyone have an idea what's going on?

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
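
The CheckIndex run on the saved snapshot mentioned above can be scripted. Below is a minimal sketch that shells out to Lucene's CheckIndex in its default read-only mode (no -exorcise); the lucene-core jar location is an assumption based on a stock Solr 6.3.0 install, and the index path is the snapshot directory from the quoted output.

    # Minimal sketch: run Lucene CheckIndex (read-only) on a saved snapshot.
    # The jar path below is an ASSUMED location for a stock Solr 6.3.0 install;
    # the jar version must match the version that wrote the index (6.3.0 here).
    import subprocess

    LUCENE_CORE_JAR = "/opt/solr/server/solr-webapp/webapp/WEB-INF/lib/lucene-core-6.3.0.jar"  # assumed path
    INDEX_DIR = "/indexes/solrindexes/build0324.bad/index"

    proc = subprocess.run(
        ["java", "-cp", LUCENE_CORE_JAR, "org.apache.lucene.index.CheckIndex", INDEX_DIR],
        capture_output=True,
        text=True,
    )
    print(proc.stdout)
    if proc.returncode != 0:
        # CheckIndex exits non-zero when it finds problems with the index.
        print(proc.stderr)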
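
The indexer described above uses the scorched client, which isn't shown in the thread; the sketch below is only a rough approximation of that setup using plain requests against the same /update/json handler, with the worker count and batch size as the knobs for dialing concurrency down. The host/port, the reuse of the build0324 core name, and the placeholder documents/fields are all assumptions for illustration.

    # Rough sketch of a throttled concurrent indexer posting JSON batches to
    # Solr's /update/json handler. Assumptions: plain `requests` instead of the
    # scorched client, Solr at localhost:8983, core name build0324, and
    # made-up documents purely for illustration.
    import json
    from multiprocessing import Pool

    import requests

    UPDATE_URL = "http://localhost:8983/solr/build0324/update/json"  # assumed host/port
    NUM_WORKERS = 2    # dialed down from 6 to reduce concurrent update pressure
    BATCH_SIZE = 500

    def post_batch(docs):
        """Send one batch of documents as a JSON array to the update handler."""
        resp = requests.post(
            UPDATE_URL,
            data=json.dumps(docs),
            headers={"Content-Type": "application/json"},
            timeout=300,
        )
        resp.raise_for_status()
        return len(docs)

    def batches(docs, size):
        for i in range(0, len(docs), size):
            yield docs[i:i + size]

    if __name__ == "__main__":
        # Placeholder documents; the real script builds these from its source data.
        docs = [{"id": str(i), "title_t": "document %d" % i} for i in range(10000)]

        with Pool(NUM_WORKERS) as pool:
            for sent in pool.imap_unordered(post_batch, batches(docs, BATCH_SIZE)):
                print("indexed %d docs" % sent)

        # Let autoCommit in solrconfig.xml handle commits, or issue one explicitly:
        requests.post(UPDATE_URL, data=json.dumps({"commit": {}}),
                      headers={"Content-Type": "application/json"}, timeout=300)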