rahulgoswami commented on issue #12356:
URL: https://github.com/apache/lucene/issues/12356#issuecomment-1583204847

   Based on the idea that @romseygeek proposed above, I tried the change below
   in readByte(long pos). It works sometimes, but when testing on a 5 million+
   dataset I get a CorruptIndexException some time into the indexing. I'm not
   sure why that would happen, and it only happens after making this change.
   Thoughts?
   
     @Override
     public final byte readByte(long pos) throws IOException {
       long index = pos - bufferStart;
       if (index < 0 || index >= buffer.limit()) {
         if (index < 0) {
           bufferStart = Math.min(pos, (bufferStart - bufferSize) < 0 ? 0 : (bufferStart - bufferSize));
         } else {
           bufferStart = pos;
         }
         buffer.limit(0);  // trigger refill() on read
         seekInternal(pos);
         refill();
         index = 0;
       }
       return buffer.get((int) index);
     }
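
   A standalone sketch of the window arithmetic, in case it helps narrow this
   down. WindowedReader is a hypothetical stand-in written for illustration,
   not Lucene's BufferedIndexInput, and the resetIndexToZero flag exists only
   to compare the two behaviours. It assumes the refill loads the buffer
   starting at bufferStart; under that assumption, once bufferStart is moved
   back past pos, the byte for pos sits at pos - bufferStart rather than at
   index 0. Whether this is what causes the CorruptIndexException is not
   confirmed.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a buffered random-access reader (not Lucene code). */
final class WindowedReader {
  private final byte[] file;          // pretend on-disk contents
  private final int bufferSize;       // maximum window length
  private byte[] buffer = new byte[0];
  private long bufferStart = 0;       // file offset of buffer[0]

  WindowedReader(byte[] file, int bufferSize) {
    this.file = file;
    this.bufferSize = bufferSize;
  }

  /** Loads up to bufferSize bytes of the file, starting at newStart. */
  private void refill(long newStart) {
    int end = (int) Math.min(file.length, newStart + bufferSize);
    buffer = Arrays.copyOfRange(file, (int) newStart, end);
    bufferStart = newStart;
  }

  /**
   * Random-access read. On a backward miss the window is slid back by one
   * buffer (same min/max arithmetic as the change above); on a forward miss
   * it starts at pos. resetIndexToZero selects how the in-buffer index is
   * computed after the refill, purely for comparison.
   */
  byte readByte(long pos, boolean resetIndexToZero) {
    if (pos < 0 || pos >= file.length) {
      throw new IndexOutOfBoundsException("pos=" + pos);
    }
    long index = pos - bufferStart;
    if (index < 0 || index >= buffer.length) {
      long newStart = index < 0
          ? Math.min(pos, Math.max(0, bufferStart - bufferSize))
          : pos;
      refill(newStart);
      // buffer[0] now holds the byte at newStart, which may be before pos.
      index = resetIndexToZero ? 0 : pos - bufferStart;
    }
    return buffer[(int) index];
  }
}
```

   With a file whose byte at offset i equals i and a buffer size of 8, reading
   offset 20 and then offset 18 slides the window back to offset 12. With the
   index reset to 0 the second read returns the byte at offset 12; with the
   index recomputed as pos - bufferStart it returns the byte at offset 18.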
   
   
   Exception:
   2023-06-08 03:14:58.040 ERROR (qtp391506011-24) [   x:techproducts] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Server error writing document id c55ea05bded8d478d1942535f30c5791001 to the index => org.apache.solr.common.SolrException: Server error writing document id c55ea05bded8d478d1942535f30c5791001 to the index
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:246)
   org.apache.solr.common.SolrException: Server error writing document id c55ea05bded8d478d1942535f30c5791001 to the index
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:246) ~[?:?]
   .....
   Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:886) ~[?:?]
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:900) ~[?:?]
        at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1477) ~[?:?]
        at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1473) ~[?:?]
        at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:973) ~[?:?]
        at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:342) ~[?:?]
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:294) ~[?:?]
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241) ~[?:?]
        ... 68 more
   Caused by: org.apache.lucene.index.CorruptIndexException: invalid state: base=0, docID=93773 (resource=SimpleFSIndexInput(path="C:\Work\Solr\solr-8.11.1\example\techproducts\solr\techproducts\data\index\_13xx.fdt"))
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.copyChunks(CompressingStoredFieldsWriter.java:560) ~[?:?]
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:633) ~[?:?]
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228) ~[?:?]
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105) ~[?:?]
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4788) ~[?:?]
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4392) ~[?:?]
        at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:201) ~[?:?]
        at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5951) ~[?:?]
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626) ~[?:?]
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) ~[?:?]



