gf2121 opened a new issue, #14521:
URL: https://github.com/apache/lucene/issues/14521

   ### Description
   
   Soft deletes consume a lot of CPU when flushing doc-values updates or 
calculating `numsToDelete` in `SoftDeletesRetentionMergePolicy`. I have been 
looking for ways to speed up these operations. The new 
`DocIdSetIterator#intoBitSet` method seems to provide a good approach, along 
the following lines:
   
   - [ ] Implement `intoBitSet` for `IndexedDISI`.
   
   - [ ] Implement `intoBitSet` for `MergedDocValues` and 
`SingleValueNumericDocValuesFieldUpdates#iterator`, and call `intoBitSet` in 
`IndexedDISI#writeBitSet`.
   - [ ] Count `numsToDelete` with `intoBitSet`, `Bits#applyMask` and `popCnt`.
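
To make the last item concrete, here is a minimal, self-contained sketch of the counting idea. It is only an illustration: it uses `java.util.BitSet` in place of Lucene's `FixedBitSet`, and `DocCursor`, `countToDelete` and the window size of 4096 are hypothetical names and values chosen for the example, not Lucene code. The cursor's `intoBitSet(upTo, dest, offset)` imitates the shape of the iterator method, `and` stands in for `Bits#applyMask`, and `cardinality()` stands in for the pop count.

```java
import java.util.BitSet;

public final class SoftDeleteCountSketch {

  /** Hypothetical cursor over sorted doc IDs that carry a soft-delete value. */
  static final class DocCursor {
    private final int[] docs;
    private int pos = 0;

    DocCursor(int[] sortedDocs) {
      this.docs = sortedDocs;
    }

    /** Sets bit (doc - offset) for every remaining doc below upTo, then stops. */
    void intoBitSet(int upTo, BitSet dest, int offset) {
      while (pos < docs.length && docs[pos] < upTo) {
        dest.set(docs[pos] - offset);
        pos++;
      }
    }
  }

  /** Counts docs that have a soft-delete value and are still live, one window at a time. */
  static int countToDelete(int[] softDeletedDocs, BitSet liveDocs, int maxDoc) {
    final int window = 4096; // assumed window size, for illustration only
    DocCursor cursor = new DocCursor(softDeletedDocs);
    BitSet scratch = new BitSet(window);
    int count = 0;
    for (int offset = 0; offset < maxDoc; offset += window) {
      int upTo = Math.min(maxDoc, offset + window);
      scratch.clear();
      // Bulk-load the soft-deleted docs of this window.
      cursor.intoBitSet(upTo, scratch, offset);
      // Mask out docs that are no longer live (stand-in for Bits#applyMask).
      scratch.and(liveDocs.get(offset, upTo));
      // Pop count of what remains.
      count += scratch.cardinality();
    }
    return count;
  }
}
```

The point of the sketch is that the per-document `nextDoc()` plus `liveDocs.get(doc)` loop is replaced by three bulk operations per window.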
   
   
   
   Another optimization I'm considering is to expose the fact that soft-delete 
fields always hold a single value, so that we can skip the work of computing 
the min/max/gcd. My current idea for the API is pretty simple, but I'm not 
sure it's good enough.
   
   ```
   public abstract class NumericDocValues extends DocValuesIterator {
   
     /**
      * If the impl knows all docs have the same value, return the value,
      * otherwise null.
      */
     public abstract Long singleValue();
   
     ...
   
   }
   ```
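
As a rough illustration of how a consumer might use such a hint, the sketch below short-circuits the statistics pass when `singleValue()` is non-null. Everything here is hypothetical: `Values` is a stand-in interface rather than `NumericDocValues`, and `stats` is an invented helper. It assumes a non-empty input and that `value - min` does not overflow.

```java
public final class SingleValueSketch {

  /** Hypothetical stand-in for a numeric doc-values source. */
  interface Values {
    /** The value shared by all docs, or null if unknown or not constant. */
    Long singleValue();

    /** All values, in doc order (assumed non-empty). */
    long[] all();
  }

  /** Euclid's algorithm on non-negative inputs; gcd(0, x) == x. */
  static long gcd(long a, long b) {
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  /** Returns {min, max, gcd of (value - min)}. */
  static long[] stats(Values values) {
    Long single = values.singleValue();
    if (single != null) {
      // Fast path: no scan needed, min == max and every delta is 0.
      return new long[] {single, single, 0L};
    }
    long[] all = values.all();
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long v : all) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    long g = 0;
    for (long v : all) {
      g = gcd(g, v - min); // assumes no overflow in v - min
    }
    return new long[] {min, max, g};
  }
}
```

Returning a boxed `Long` keeps the API small, at the cost of a null check at each call site; whether that trade-off is acceptable is the open question above.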
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


