jainankitk commented on PR #14439: URL: https://github.com/apache/lucene/pull/14439#issuecomment-2784239065
@stefanvodita / @jpountz - Would love to get your thoughts on this optimization and on how we can leverage it in Lucene. In a nutshell, it solves the following problem: given a sorted, non-overlapping set of intervals (histogram buckets are one example), it collects the matching document count for each interval in a single traversal of the `PointsTree` index. It skips over a `leafBlock` completely unless the values in that `leafBlock` overlap with more than one interval. This ensures that the `# leafBlocks` actually traversed is bounded by the `# buckets`, while the remaining `leafBlocks` are collected in bulk. Hence it can collect the doc counts very efficiently, especially when `# documents / # buckets` is high.
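To make the idea concrete, here is a minimal sketch of the traversal described above. It is not the code from the PR and it does not use Lucene's API: `Node`, `bucketOf`, `collect`, `count`, and `leavesScanned` are hypothetical names, the tree is a simple in-memory binary tree over a sorted, non-empty `long[]`, and buckets are assumed to be half-open intervals `[lo[i], hi[i])` given in sorted order. A subtree whose value range falls inside a single bucket is counted in bulk, a subtree that touches no bucket is skipped, and only leaves that straddle a bucket boundary have their values inspected one by one.

```java
import java.util.Arrays;

class BucketCountSketch {
    /** Hypothetical tree node (not Lucene's API): value range, doc count, children or leaf values. */
    static final class Node {
        final long min, max;      // inclusive range of values under this node
        final int count;          // number of values under this node
        final long[] leafValues;  // non-null only for leaves
        final Node left, right;

        Node(long[] sorted, int from, int to, int leafSize) {
            this.min = sorted[from];
            this.max = sorted[to - 1];
            this.count = to - from;
            if (count <= leafSize) {
                this.leafValues = Arrays.copyOfRange(sorted, from, to);
                this.left = null;
                this.right = null;
            } else {
                int mid = (from + to) >>> 1;
                this.leafValues = null;
                this.left = new Node(sorted, from, mid, leafSize);
                this.right = new Node(sorted, mid, to, leafSize);
            }
        }
    }

    /** Number of leaves whose values were inspected individually in the last count() call. */
    static int leavesScanned;

    /** Index of the bucket [lo[i], hi[i]) containing v, or -1 if v falls in no bucket. */
    static int bucketOf(long v, long[] lo, long[] hi) {
        for (int i = 0; i < lo.length; i++) {
            if (v >= lo[i] && v < hi[i]) return i;
        }
        return -1;
    }

    static void collect(Node n, long[] lo, long[] hi, long[] counts) {
        int bMin = bucketOf(n.min, lo, hi);
        int bMax = bucketOf(n.max, lo, hi);
        if (bMin >= 0 && bMin == bMax) {
            counts[bMin] += n.count;  // whole subtree lies in one bucket: bulk count
            return;
        }
        boolean overlaps = false;
        for (int i = 0; i < lo.length; i++) {
            if (n.max >= lo[i] && n.min < hi[i]) overlaps = true;
        }
        if (!overlaps) return;        // subtree touches no bucket: skip it entirely
        if (n.leafValues != null) {   // leaf straddling a bucket boundary: scan its values
            leavesScanned++;
            for (long v : n.leafValues) {
                int b = bucketOf(v, lo, hi);
                if (b >= 0) counts[b]++;
            }
        } else {
            collect(n.left, lo, hi, counts);
            collect(n.right, lo, hi, counts);
        }
    }

    /** Builds the sketch tree over sortedValues (must be non-empty) and returns per-bucket counts. */
    static long[] count(long[] sortedValues, int leafSize, long[] lo, long[] hi) {
        leavesScanned = 0;
        long[] counts = new long[lo.length];
        collect(new Node(sortedValues, 0, sortedValues.length, leafSize), lo, hi, counts);
        return counts;
    }

    public static void main(String[] args) {
        long[] values = new long[1000];
        for (int i = 0; i < values.length; i++) values[i] = i;
        long[] counts = count(values, 8, new long[] {0, 100, 300}, new long[] {100, 250, 1000});
        System.out.println(Arrays.toString(counts) + ", leaves scanned: " + leavesScanned);
    }
}
```

In this sketch a leaf is scanned only when a bucket boundary falls strictly inside its value range, so the number of scanned leaves grows with the number of bucket boundaries and not with the number of documents. The linear `bucketOf` lookup is kept for readability; a real implementation would more likely advance through the sorted buckets in step with the traversal.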