easyice commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1731767546
Update: when `softUpdateDocument` is called for a segment that already has some deleted docs, it iterates over all the deleted docs using `ReadersAndUpdates#MergedDocValues#onDiskDocValues`. However, it has to iterate over them twice: the first pass is in `Lucene90DocValuesConsumer#writeValues`, which computes the gcd, min, and max; the second pass is in `IndexedDISI#writeBitSet`. This is wasteful. For soft deletes we can skip the first pass, which speeds up updates by about 53%.

Benchmark code:

```
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.IOUtils;

// Wrapper class name is arbitrary; the original snippet showed only the two methods.
public class SoftUpdateBenchmark {
  public static void main(final String[] args) throws Exception {
    // Run the benchmark five times and report the best (lowest) time.
    long min = Long.MAX_VALUE;
    for (int i = 0; i < 5; i++) {
      min = Math.min(doWrite(), min);
    }
    System.out.println("BEST:" + min);
  }

  static long doWrite() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    IndexWriter writer =
        new IndexWriter(
            dir,
            new IndexWriterConfig(null)
                .setSoftDeletesField("_soft_deletes")
                .setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH));
    int maxDoc = 4096 * 100;
    // Index maxDoc documents, committing every 5000 docs to create multiple segments.
    for (int i = 0; i < maxDoc; i++) {
      Document doc = new Document();
      doc.add(new StringField("id", String.valueOf(i), Field.Store.NO));
      writer.addDocument(doc);
      if (i > 0 && i % 5000 == 0) {
        writer.commit();
      }
    }
    System.out.println("start update");
    long t0 = System.currentTimeMillis();
    // Soft-update every second document, committing frequently so that each
    // update batch hits segments that already contain soft-deleted docs.
    for (int i = 0; i < maxDoc; i += 2) {
      Document doc = new Document();
      writer.softUpdateDocument(
          new Term("id", String.valueOf(i)), doc, new NumericDocValuesField("_soft_deletes", 1));
      if (i > 0 && i % 100 == 0) {
        writer.commit();
      }
    }
    long tookMs = System.currentTimeMillis() - t0;
    System.out.println("update took:" + tookMs);
    IOUtils.close(writer, dir);
    return tookMs;
  }
}
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
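The double iteration described in the comment can be illustrated with a small standalone sketch. This is not Lucene's code: the class `SoftDeletePassSketch`, its methods, and the `visited` counter are hypothetical names used only to show the idea, under the assumption that every soft-deleted doc carries the same value (1, as in the benchmark), so the min/max/gcd statistics gathered by the first pass add no information.

```java
import java.util.Arrays;
import java.util.BitSet;

public class SoftDeletePassSketch {
    // Counts how many entries the passes below have visited (illustration only).
    static long visited = 0;

    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Pass 1 of the generic path: scan every value to find min, max,
    // and the gcd of the deltas from the first value. Returns {min, max, gcd}.
    static long[] computeStats(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long g = 0;
        for (long v : values) {
            visited++;
            min = Math.min(min, v);
            max = Math.max(max, v);
            g = gcd(g, v - values[0]);
        }
        return new long[] {min, max, g};
    }

    // Pass 2: mark which doc IDs carry a value.
    static BitSet writeDocsWithField(int[] docs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : docs) {
            visited++;
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        int maxDoc = 16;
        int[] docs = {0, 2, 4, 6, 8, 10, 12, 14}; // soft-deleted doc IDs
        long[] values = new long[docs.length];
        Arrays.fill(values, 1L); // every soft-deleted doc has the value 1

        // Generic path: statistics pass followed by the bitset pass.
        visited = 0;
        long[] stats = computeStats(values);
        BitSet twoPass = writeDocsWithField(docs, maxDoc);
        long twoPassVisits = visited;

        // Soft-delete shortcut: the values are known to be constant,
        // so only the bitset pass is needed.
        visited = 0;
        BitSet onePass = writeDocsWithField(docs, maxDoc);
        long onePassVisits = visited;

        System.out.println("min=" + stats[0] + " max=" + stats[1] + " gcd=" + stats[2]);
        System.out.println("same bitset: " + twoPass.equals(onePass));
        System.out.println("visits: two-pass=" + twoPassVisits + ", one-pass=" + onePassVisits);
    }
}
```

Both paths produce the same set of docs with a value; the shortcut simply avoids walking the entries a second time. How the actual patch detects and handles the soft-delete case is defined by the PR itself, not by this sketch.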
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org