easyice commented on PR #12557:
URL: https://github.com/apache/lucene/pull/12557#issuecomment-1731767546

   Update:
   
   When `softUpdateDocument` is called for a segment that already has some
   deleted docs, it iterates over all of the deleted docs through
   `ReadersAndUpdates#MergedDocValues#onDiskDocValues`. It currently has to
   iterate over them twice: the first pass is in
   `Lucene90DocValuesConsumer#writeValues`, which computes gcd, min, and max;
   the second pass is in `IndexedDISI#writeBitSet`. The first pass is wasted
   work for soft deletes, so we can remove it, which speeds up updates by
   about 53%.
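
To make the two passes concrete, here is a minimal, self-contained sketch. It is not the Lucene code and not necessarily how this PR removes the pass: the class `SoftDeletePassSketch` and all of its methods are hypothetical, and it assumes every soft-delete value is the same non-empty constant (the benchmark below always writes `1`). Under that assumption the statistics are known up front, so only the bitset pass needs to touch the docs.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical illustration only; none of these names exist in Lucene. */
final class SoftDeletePassSketch {

  /** Statistics a generic numeric doc-values writer gathers before encoding. */
  static final class Stats {
    final long min;
    final long max;
    final long gcd;
    final int count;

    Stats(long min, long max, long gcd, int count) {
      this.min = min;
      this.max = max;
      this.gcd = gcd;
      this.count = count;
    }
  }

  /** Euclid's algorithm on absolute values; gcd(0, x) == |x|. */
  static long gcd(long a, long b) {
    a = Math.abs(a);
    b = Math.abs(b);
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  /** Generic path: a full scan over every value (the "first iteration"). */
  static Stats scanStats(long[] values) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long g = 0;
    for (long v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
      g = gcd(g, v - values[0]); // gcd of the deltas from the first value
    }
    return new Stats(min, max, g, values.length);
  }

  /** Shortcut: if every value is the same known constant, no scan is needed. */
  static Stats constantStats(long constant, int count) {
    return new Stats(constant, constant, 0, count);
  }

  /** The "second iteration": record which doc IDs carry a value. */
  static BitSet docsWithValue(int[] docIds) {
    BitSet bits = new BitSet();
    for (int doc : docIds) {
      bits.set(doc);
    }
    return bits;
  }

  public static void main(String[] args) {
    int[] softDeleted = {0, 2, 4, 6, 8};
    long[] values = new long[softDeleted.length];
    Arrays.fill(values, 1L); // every soft-deleted doc carries the value 1

    Stats scanned = scanStats(values);
    Stats shortcut = constantStats(1L, values.length);
    boolean same =
        scanned.min == shortcut.min
            && scanned.max == shortcut.max
            && scanned.gcd == shortcut.gcd
            && scanned.count == shortcut.count;
    System.out.println("stats match without scanning: " + same);
    System.out.println("docs with value: " + docsWithValue(softDeleted));
  }
}
```

The sketch only shows why the statistics pass carries no information when the field is a constant marker; the real change has to make the same decision inside the doc-values writing path.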
   
   Benchmark code:
   ```
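   import java.io.IOException;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.Field;
   import org.apache.lucene.document.NumericDocValuesField;
   import org.apache.lucene.document.StringField;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.index.Term;
   import org.apache.lucene.store.ByteBuffersDirectory;
   import org.apache.lucene.store.Directory;
   import org.apache.lucene.util.IOUtils;

   // Wrapper class added so the snippet compiles; the name is arbitrary.
   public class SoftUpdateBenchmark {

     public static void main(final String[] args) throws Exception {
       // Run the benchmark five times and report the fastest run.
       long min = Long.MAX_VALUE;
       for (int i = 0; i < 5; i++) {
         min = Math.min(doWrite(), min);
       }
       System.out.println("BEST:" + min);
     }

     static long doWrite() throws IOException {
       Directory dir = new ByteBuffersDirectory();
       IndexWriter writer =
           new IndexWriter(
               dir,
               new IndexWriterConfig(null)
                   .setSoftDeletesField("_soft_deletes")
                   .setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH));
       int maxDoc = 4096 * 100;

       // Index maxDoc documents, committing every 5000 docs.
       for (int i = 0; i < maxDoc; i++) {
         Document doc = new Document();
         doc.add(new StringField("id", String.valueOf(i), Field.Store.NO));

         writer.addDocument(doc);
         if (i > 0 && i % 5000 == 0) {
           writer.commit();
         }
       }

       System.out.println("start update");
       long t0 = System.currentTimeMillis();

       // Soft-update every second document, committing every 100 ids so the
       // updates keep hitting segments that already contain deleted docs.
       for (int i = 0; i < maxDoc; i += 2) {
         Document doc = new Document();
         writer.softUpdateDocument(
             new Term("id", String.valueOf(i)),
             doc,
             new NumericDocValuesField("_soft_deletes", 1));
         if (i > 0 && i % 100 == 0) {
           writer.commit();
         }
       }
       long tookMs = System.currentTimeMillis() - t0;
       System.out.println("update took:" + tookMs);

       IOUtils.close(writer, dir);
       return tookMs;
     }
   }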
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org