gf2121 opened a new pull request, #13217: URL: https://github.com/apache/lucene/pull/13217
This PR proposes a new way to do numeric dynamic pruning, with the following changes:

* Instead of sampling and estimating the point count to decide whether to build the competitive iterator, this patch finds the threshold value. That is, we find the value that is 'N docs away from' the top value, taking advantage of [the fact that the top value should be final in LeafComparators](https://github.com/apache/lucene/blob/99b9636fd8c383c80d06c8815cfdb49b1b77dcdb/lucene/core/src/java/org/apache/lucene/search/FieldComparator.java#L75).
* Instead of building and rebuilding the competitive iterator whenever it gets smaller, this patch builds the competitive iterator as a disjunction of small DISIs. Each small DISI tracks its most competitive value and is discarded once that value is no longer competitive, like what we did in `TermOrdValComparator`. This lets us intersect the tree only once and update the competitive iterator more frequently.

#### Some minor points

* For simplicity, I changed the byte-encoded values to comparable long values, e.g. `maxValueAsBytes` -> `maxValueAsLong`.
* This PR still works with the older method `public static long estimatePointCount(IntersectVisitor visitor, PointTree pointTree, long upperBound) throws IOException`. It seems a bit challenging to work with the new boolean API; I'll dig more, and this is why this is a draft.

Here is a result based on wikimedium10m (baseline contains https://github.com/apache/lucene/pull/13199):

```
                 Task  QPS baseline  StdDev  QPS my_modified_version  StdDev        Pct diff        p-value
           TermDTSort        176.84  (5.0%)                   303.31  (6.3%)   71.5% ( 57% -  87%)   0.000
HighTermDayOfYearSort        454.72  (3.2%)                   791.09  (7.9%)   74.0% ( 60% -  87%)   0.000
```
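To make the "disjunction of small DISIs" idea concrete, here is a minimal sketch in plain Java. It is not the code in this PR and does not use Lucene's classes: `CompetitiveDisjunctionSketch`, `SmallBlock`, `updateBottom` and `advance` are hypothetical names, each block is modeled as a sorted `int[]` of doc IDs, and it assumes an ascending sort where smaller values are more competitive and ties with the bottom value are not competitive.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: hypothetical names, not Lucene's actual API. */
class CompetitiveDisjunctionSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** A small doc-ID iterator that remembers the most competitive value it contains. */
  static final class SmallBlock {
    final int[] docs;    // doc IDs in increasing order
    final long minValue; // most competitive (smallest) value of any doc in this block
    int idx = 0;

    SmallBlock(int[] docs, long minValue) {
      this.docs = docs;
      this.minValue = minValue;
    }

    int doc() {
      return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    void advance(int target) {
      while (idx < docs.length && docs[idx] < target) {
        idx++;
      }
    }
  }

  // Blocks that are still competitive, ordered by their current doc ID.
  private final PriorityQueue<SmallBlock> byDoc =
      new PriorityQueue<>(Comparator.comparingInt(SmallBlock::doc));

  CompetitiveDisjunctionSketch(List<SmallBlock> blocks) {
    byDoc.addAll(blocks);
  }

  /** Discards every block whose best value can no longer beat the collector's bottom value. */
  void updateBottom(long bottomValue) {
    // Linear scan for brevity; a real implementation would likely keep blocks sorted by value.
    byDoc.removeIf(block -> block.minValue >= bottomValue);
  }

  /** Returns the first doc ID >= target among the remaining blocks, or NO_MORE_DOCS. */
  int advance(int target) {
    while (!byDoc.isEmpty() && byDoc.peek().doc() < target) {
      SmallBlock top = byDoc.poll(); // re-insert after moving so the heap order stays valid
      top.advance(target);
      if (top.doc() != NO_MORE_DOCS) {
        byDoc.add(top);
      }
    }
    return byDoc.isEmpty() ? NO_MORE_DOCS : byDoc.peek().doc();
  }
}
```

In this sketch the blocks are built once, which corresponds to intersecting the tree a single time. Each call to `updateBottom` only drops whole blocks, so the iterator can be tightened every time the bottom value improves without being rebuilt.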