gf2121 opened a new pull request, #13217:
URL: https://github.com/apache/lucene/pull/13217

   This PR proposes a new way to do numeric dynamic pruning, with the following 
changes:
   
   * Instead of sampling and estimating the point count to decide whether to 
build the competitive iterator, this patch finds the threshold value directly. 
That is, we find the value that is 'N docs away from' the top value, relying on 
[the fact that the top value should be final in 
LeafComparators](https://github.com/apache/lucene/blob/99b9636fd8c383c80d06c8815cfdb49b1b77dcdb/lucene/core/src/java/org/apache/lucene/search/FieldComparator.java#L75).
 
   
   
![image](https://github.com/apache/lucene/assets/52390227/b7125122-194a-4167-a821-57c47479915d)
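
To make the threshold idea concrete, here is a minimal sketch for an ascending sort. It is not the code in this patch: a sorted `long[]` stands in for the point tree, and the names `ThresholdSketch` and `findThreshold` are hypothetical. It returns the value of the N-th doc at or after the top value, so that at least N docs fall in the range from the top value to the threshold.

```java
/** Hypothetical illustration only; a sorted long[] stands in for the point tree. */
final class ThresholdSketch {

  /**
   * Returns the value that is n docs away from topValue in ascending order,
   * or Long.MAX_VALUE if fewer than n values are >= topValue.
   */
  static long findThreshold(long[] sortedValues, long topValue, int n) {
    if (n <= 0) {
      throw new IllegalArgumentException("n must be positive");
    }
    // Binary search for the first index whose value is >= topValue.
    int lo = 0;
    int hi = sortedValues.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sortedValues[mid] < topValue) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    // The n-th value counted from that position is the threshold.
    long idx = (long) lo + n - 1;
    return idx < sortedValues.length ? sortedValues[(int) idx] : Long.MAX_VALUE;
  }
}
```

Because the top value does not change within a leaf, a threshold computed this way stays valid for the whole leaf, which is what removes the need for sampling.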
   
   * Instead of building the competitive iterator and rebuilding it as it gets 
smaller, this patch builds the competitive iterator as a disjunction 
of small DISIs. Each small DISI tracks its most competitive value and is 
discarded once that value is no longer competitive, similar to what 
we did in `TermOrdValComparator`. This lets us intersect the tree only once 
and update the competitive iterator more frequently.
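
The disjunction of small DISIs can be sketched roughly as follows. This is a simplified stand-in, not the patch itself: `DisjunctionSketch`, `Block`, and `updateBottom` are hypothetical names, each small DISI is modelled as a sorted `int[]` of doc IDs, an ascending sort is assumed, and a linear scan replaces the priority queue a real disjunction would likely use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Lucene code. */
final class DisjunctionSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** One small iterator: ascending doc IDs plus the best (smallest) value among them. */
  static final class Block {
    final int[] docs;
    final long mostCompetitiveValue;
    private int pos = 0;

    Block(int[] docs, long mostCompetitiveValue) {
      this.docs = docs;
      this.mostCompetitiveValue = mostCompetitiveValue;
    }

    /** Moves to the first doc >= target and returns it, or NO_MORE_DOCS. */
    int advance(int target) {
      while (pos < docs.length && docs[pos] < target) {
        pos++;
      }
      return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
  }

  private final List<Block> blocks = new ArrayList<>();

  void add(Block block) {
    blocks.add(block);
  }

  /** Discards blocks whose best value is worse than the current bottom (assumed ascending sort). */
  void updateBottom(long bottom) {
    blocks.removeIf(b -> b.mostCompetitiveValue > bottom);
  }

  /** Returns the smallest doc >= target across the remaining blocks. */
  int advance(int target) {
    int min = NO_MORE_DOCS;
    for (Block b : blocks) {
      min = Math.min(min, b.advance(target));
    }
    return min;
  }

  int size() {
    return blocks.size();
  }
}
```

Tightening the bound only removes blocks from the list, so nothing has to be rebuilt from the tree when the bottom value improves.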
   
   #### Some minor points
   
   * For simplicity, I changed the byte-encoded values to comparable long 
values, e.g. `maxValueAsBytes` -> `maxValueAsLong`.
   
   * This PR still relies on the stale method `public static long 
estimatePointCount(IntersectVisitor visitor, PointTree pointTree, long 
upperBound) throws IOException`. Working with the new boolean API seems a bit 
challenging; I'll dig into it further, which is why this PR is still a draft.
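
As background for the `maxValueAsBytes` -> `maxValueAsLong` point, the sketch below shows why a sortable byte encoding and a plain `long` can be swapped without changing order. It assumes the encoding used by Lucene's `NumericUtils.longToSortableBytes` (big-endian with the sign bit flipped); whether the patch converts values exactly this way is not stated here, and the class and method names are hypothetical.

```java
/** Hypothetical illustration only; mirrors the sign-flipped big-endian long encoding. */
final class SortableLongSketch {

  /** Encodes v so that unsigned byte-wise order matches signed long order. */
  static byte[] toSortableBytes(long v) {
    long flipped = v ^ 0x8000000000000000L; // flip the sign bit
    byte[] out = new byte[Long.BYTES];
    for (int i = 0; i < Long.BYTES; i++) {
      out[i] = (byte) (flipped >>> (56 - 8 * i)); // big-endian
    }
    return out;
  }

  /** Decodes bytes produced by toSortableBytes back into the original long. */
  static long fromSortableBytes(byte[] encoded) {
    long v = 0L;
    for (int i = 0; i < Long.BYTES; i++) {
      v = (v << 8) | (encoded[i] & 0xFFL);
    }
    return v ^ 0x8000000000000000L; // undo the sign-bit flip
  }
}
```

With the values decoded once, comparisons become plain `long` comparisons instead of byte-array comparisons.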
   
   Here are the results on wikimedium10m (the baseline includes 
https://github.com/apache/lucene/pull/13199):
   
   ```
                    Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
              TermDTSort         176.84   (5.0%)                    303.31   (6.3%)   71.5% (  57% -   87%)     0.000
   HighTermDayOfYearSort         454.72   (3.2%)                    791.09   (7.9%)   74.0% (  60% -   87%)     0.000
   ```
   
   
   

