gf2121 opened a new issue, #14521: URL: https://github.com/apache/lucene/issues/14521
### Description

Soft deletes consume a lot of CPU when flushing doc-values updates or when calculating `numsToDelete` in `SoftDeletesRetentionMergePolicy`. I was looking for ways to speed up these operations. The new `DocIdSetIterator#intoBitset` interface seems to offer a good approach:

- [ ] Implement `intoBitset` for `IndexedDISI`.
- [ ] Implement `intoBitset` for `MergedDocValues` and `SingleValueNumericDocValuesFieldUpdates#iterator`, and call `intoBitset` in `IndexedDISI#writeBitSet`.
- [ ] Count `numsToDelete` with `intoBitset`, `Bits#applyMask` and `popCnt`.

Another optimization I'm looking at is to expose the fact that soft-delete fields always hold a single value. That would let us skip computing the min/max/gcd.

My current API design idea is pretty simple, but I'm not sure it's good enough:

```java
public abstract class NumericDocValues extends DocValuesIterator {

  /**
   * If the impl knows all docs have the same value, return the value, otherwise null.
   */
  public abstract Long singleValue();

  // ... existing methods ...
}
```
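The idea behind the third checklist item can be shown with a minimal, self-contained sketch. It compares a per-document loop with a word-at-a-time "mask, then popcount" count. Several assumptions apply. Plain `long[]` words stand in for Lucene's `FixedBitSet`, and a bitwise AND stands in for the `Bits#applyMask` step. `SoftDeleteCount`, `countPerDoc` and `countBulk` are hypothetical names. Finally, "docs to delete" is taken to mean docs that are soft-deleted and still live. This is an illustration only, not Lucene's implementation.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not Lucene code): count docs that are both soft-deleted
 * and still live. Bit (doc & 63) of word (doc >> 6) represents doc; long[] words
 * stand in for FixedBitSet. Bits at positions >= maxDoc must be zero.
 */
public class SoftDeleteCount {

  /** Baseline: test two bits per document, one branch per doc. */
  static int countPerDoc(long[] softDeleted, long[] live, int maxDoc) {
    int count = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      // Java masks the shift count to 6 bits, so 1L << doc == 1L << (doc & 63).
      boolean soft = (softDeleted[doc >> 6] & (1L << doc)) != 0;
      boolean alive = (live[doc >> 6] & (1L << doc)) != 0;
      if (soft && alive) {
        count++;
      }
    }
    return count;
  }

  /** Bulk: AND 64 docs at a time (the "mask" step), then popcount each word. */
  static int countBulk(long[] softDeleted, long[] live, int maxDoc) {
    int numWords = (maxDoc + 63) >>> 6;
    int count = 0;
    for (int i = 0; i < numWords; i++) {
      count += Long.bitCount(softDeleted[i] & live[i]);
    }
    return count;
  }

  public static void main(String[] args) {
    int maxDoc = 1000;
    long[] soft = new long[(maxDoc + 63) >>> 6];
    long[] live = new long[(maxDoc + 63) >>> 6];
    Random random = new Random(42);
    for (int doc = 0; doc < maxDoc; doc++) {
      if (random.nextInt(10) == 0) {
        soft[doc >> 6] |= 1L << doc; // roughly 10% soft-deleted
      }
      if (random.nextInt(20) != 0) {
        live[doc >> 6] |= 1L << doc; // roughly 95% live
      }
    }
    // Both strategies must agree; the bulk one touches each 64-doc word once.
    System.out.println(countPerDoc(soft, live, maxDoc) + " " + countBulk(soft, live, maxDoc));
  }
}
```

The bulk version avoids a branch per document and processes 64 documents per iteration. This is the kind of saving the proposal hopes to get once the soft-delete iterator can fill a bitset directly through `intoBitset`.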