Simon, after hearing about the weird time issue in EC2, I am going to ask if you have a real server handy for testing. No, I have no hard facts; this is just a suggestion.
And I have no beef with AWS; they have served me really well for other servers.

Cheers -- Rick

On May 4, 2017 10:49:25 AM EDT, simon <mtnes...@gmail.com> wrote:
> I've pretty much ruled out system/hardware issues - the AWS instance has
> been rebooted, and indexing to a core on a new and empty disk/file system
> fails in the same way with a CorruptIndexException.
> I can generally get indexing to complete by significantly dialing down the
> number of indexer scripts running concurrently, but the duration goes up
> proportionately.
>
> -Simon
>
> On Thu, Apr 27, 2017 at 9:26 AM, simon <mtnes...@gmail.com> wrote:
>
>> Nope ... huge file system (600 GB) only 50% full, and a complete index
>> would be 80 GB max.
>>
>> On Wed, Apr 26, 2017 at 4:04 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>>> Disk space issue? Lucene requires at least as much free disk space as
>>> your index size. Note that a disk-full condition can be transient; IOW,
>>> even if you see free space now, the disk may have filled up earlier and
>>> since had some space reclaimed.
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Apr 26, 2017 at 12:02 PM, simon <mtnes...@gmail.com> wrote:
>>> > Reposting this, as the problem described below is happening again and
>>> > there were no responses to the original email. Anyone?
>>> > ----------------------------
>>> > I'm seeing an odd error during indexing for which I can't find any
>>> > reason.
>>> >
>>> > The relevant Solr log entry:
>>> >
>>> > 2017-03-24 19:09:35.363 ERROR (commitScheduler-30-thread-1) [
>>> > x:build0324] o.a.s.u.CommitTracker auto commit
>>> > error...:java.io.EOFException: read past EOF:
>>> > MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>>> >   at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>>> >   ...
>>> >   Suppressed: org.apache.lucene.index.CorruptIndexException: checksum
>>> > status indeterminate: remaining=0, please run checkindex for more details
>>> > (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")))
>>> >   at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:451)
>>> >   at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:140)
>>> >
>>> > followed within a few seconds by
>>> >
>>> > 2017-03-24 19:09:56.402 ERROR (commitScheduler-31-thread-1) [
>>> > x:build0324] o.a.s.u.CommitTracker auto commit
>>> > error...:org.apache.solr.common.SolrException: Error opening new searcher
>>> >   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1820)
>>> >   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1931)
>>> >   ...
>>> > Caused by: java.io.EOFException: read past EOF:
>>> > MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>>> >   at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>>> >
>>> > This error was repeated a few times as the indexing continued and
>>> > further autocommits were triggered.
>>> >
>>> > I stopped the indexing process, made a backup snapshot of the index,
>>> > restarted indexing at a checkpoint, and everything then completed
>>> > without further incident.
>>> >
>>> > I ran CheckIndex on the saved snapshot and it reported no errors
>>> > whatsoever. Operations on the complete index (including an optimize
>>> > and several query scripts) have all been error-free.
>>> >
>>> > Some background:
>>> >
>>> > Solr information from the beginning of the CheckIndex output:
>>> > -------
>>> > Opening index @ /indexes/solrindexes/build0324.bad/index
>>> >
>>> > Segments file=segments_9s numSegments=105 version=6.3.0
>>> > id=7m1ldieoje0m6sljp7xocbz9l userData={commitTimeMSec=1490400514324}
>>> >   1 of 105: name=_be maxDoc=1227144
>>> >     version=6.3.0
>>> >     id=7m1ldieoje0m6sljp7xocburb
>>> >     codec=Lucene62
>>> >     compound=false
>>> >     numFiles=14
>>> >     size (MB)=4,926.186
>>> >     diagnostics = {os=Linux, java.vendor=Oracle Corporation,
>>> > java.version=1.8.0_45, java.vm.version=25.45-b02, lucene.version=6.3.0,
>>> > mergeMaxNumSegments=-1, os.arch=amd64, java.runtime.version=1.8.0_45-b13,
>>> > source=merge, mergeFactor=19, os.version=3.10.0-229.1.2.el7.x86_64,
>>> > timestamp=1490380905920}
>>> >     no deletions
>>> >     test: open reader.........OK [took 0.176 sec]
>>> >     test: check integrity.....OK [took 37.399 sec]
>>> >     test: check live docs.....OK [took 0.000 sec]
>>> >     test: field infos.........OK [49 fields] [took 0.000 sec]
>>> >     test: field norms.........OK [17 fields] [took 0.030 sec]
>>> >     test: terms, freq, prox...OK [14568108 terms; 612537186 terms/docs pairs; 801208966 tokens] [took 30.005 sec]
>>> >     test: stored fields.......OK [150164874 total field count; avg 122.4 fields per doc] [took 35.321 sec]
>>> >     test: term vectors........OK [4804967 total term vector count; avg 3.9 term/freq vector fields per doc] [took 55.857 sec]
>>> >     test: docvalues...........OK [4 docvalues fields; 0 BINARY; 1 NUMERIC; 2 SORTED; 0 SORTED_NUMERIC; 1 SORTED_SET] [took 0.954 sec]
>>> >     test: points..............OK [0 fields, 0 points] [took 0.000 sec]
>>> > -----
>>> >
>>> > The indexing process is a Python script (using the scorched Python
>>> > client) which spawns multiple instances of itself, in this case 6, so
>>> > there are definitely concurrent calls (to /update/json).
>>> >
>>> > Solrconfig and the schema have not been changed for several months,
>>> > during which time many ingests have been done, and the documents which
>>> > were being indexed at the time of the error have been indexed before
>>> > without problems, so I don't think it's a data issue.
>>> >
>>> > I saw the same error occur earlier in the day, and decided at that time
>>> > to delete the core and restart the Solr instance.
>>> >
>>> > The server is an Amazon instance running CentOS 7. I checked the system
>>> > logs and didn't see any evidence of hardware errors.
>>> >
>>> > I'm puzzled as to why this would start happening out of the blue, and I
>>> > can't find any particularly relevant posts to this forum or Stack
>>> > Exchange. Anyone have an idea what's going on?

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
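
The CheckIndex run on the saved snapshot mentioned above can be scripted. Below is a minimal sketch that shells out to Lucene's CheckIndex in its default read-only mode (no -exorcise); the lucene-core jar location is an assumption based on a stock Solr 6.3.0 install, and the index path is the snapshot directory from the quoted output.

    # Minimal sketch: run Lucene CheckIndex (read-only) on a saved snapshot.
    # The jar path below is an ASSUMED location for a stock Solr 6.3.0 install;
    # the jar version must match the version that wrote the index (6.3.0 here).
    import subprocess

    LUCENE_CORE_JAR = "/opt/solr/server/solr-webapp/webapp/WEB-INF/lib/lucene-core-6.3.0.jar"  # assumed path
    INDEX_DIR = "/indexes/solrindexes/build0324.bad/index"

    proc = subprocess.run(
        ["java", "-cp", LUCENE_CORE_JAR, "org.apache.lucene.index.CheckIndex", INDEX_DIR],
        capture_output=True,
        text=True,
    )
    print(proc.stdout)
    if proc.returncode != 0:
        # CheckIndex exits non-zero when it finds problems with the index.
        print(proc.stderr)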
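
The indexer described above uses the scorched client, which isn't shown in the thread; the sketch below is only a rough approximation of that setup using plain requests against the same /update/json handler, with the worker count and batch size as the knobs for dialing concurrency down. The host/port, the reuse of the build0324 core name, and the placeholder documents/fields are all assumptions for illustration.

    # Rough sketch of a throttled concurrent indexer posting JSON batches to
    # Solr's /update/json handler. Assumptions: plain `requests` instead of the
    # scorched client, Solr at localhost:8983, core name build0324, and
    # made-up documents purely for illustration.
    import json
    from multiprocessing import Pool

    import requests

    UPDATE_URL = "http://localhost:8983/solr/build0324/update/json"  # assumed host/port
    NUM_WORKERS = 2    # dialed down from 6 to reduce concurrent update pressure
    BATCH_SIZE = 500

    def post_batch(docs):
        """Send one batch of documents as a JSON array to the update handler."""
        resp = requests.post(
            UPDATE_URL,
            data=json.dumps(docs),
            headers={"Content-Type": "application/json"},
            timeout=300,
        )
        resp.raise_for_status()
        return len(docs)

    def batches(docs, size):
        for i in range(0, len(docs), size):
            yield docs[i:i + size]

    if __name__ == "__main__":
        # Placeholder documents; the real script builds these from its source data.
        docs = [{"id": str(i), "title_t": "document %d" % i} for i in range(10000)]

        with Pool(NUM_WORKERS) as pool:
            for sent in pool.imap_unordered(post_batch, batches(docs, BATCH_SIZE)):
                print("indexed %d docs" % sent)

        # Let autoCommit in solrconfig.xml handle commits, or issue one explicitly:
        requests.post(UPDATE_URL, data=json.dumps({"commit": {}}),
                      headers={"Content-Type": "application/json"}, timeout=300)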