Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

via GitHub Mon, 07 Apr 2025 11:42:12 -0700


jainankitk commented on PR #14439:
URL: https://github.com/apache/lucene/pull/14439#issuecomment-2784239065


   @stefanvodita / @jpountz - I would love to get your thoughts on this 
optimization and on how we can leverage it in Lucene. In a nutshell, it solves 
the following problem:
   
   Given a sorted set of non-overlapping intervals (histogram buckets are one 
example), it collects the matching document counts in a single traversal of 
the `PointsTree` index, skipping over the `leafBlocks` completely unless the 
values in a `leafBlock` overlap with more than one interval. This ensures that 
the number of `leafBlocks` actually traversed is bounded by the number of 
buckets, while the remaining `leafBlocks` are collected in bulk. Hence it can 
collect the doc counts very efficiently, especially when the ratio 
`# documents / # buckets` is high.
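
   To make the idea concrete, here is a minimal, self-contained sketch. It is 
not the code in this PR and does not use any Lucene API. It assumes a flat 
array of non-empty leaves instead of a real tree, and contiguous buckets 
defined by sorted boundaries, where bucket `i` covers 
`[bounds[i], bounds[i + 1])`. The names `HistogramSketch`, `Leaf`, `Result`, 
`bucketOf` and `collect` are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

final class HistogramSketch {
    /** Hypothetical stand-in for one leaf block: its values, sorted ascending, non-empty. */
    static final class Leaf {
        final long[] values;
        Leaf(long... values) { this.values = values; }
        long min() { return values[0]; }
        long max() { return values[values.length - 1]; }
    }

    /** Per-bucket counts plus the number of leaves that were scanned value by value. */
    static final class Result {
        final long[] counts;
        final int leavesScanned;
        Result(long[] counts, int leavesScanned) {
            this.counts = counts;
            this.leavesScanned = leavesScanned;
        }
    }

    /** Returns the bucket containing value, or -1 if it lies outside every bucket. */
    static int bucketOf(long value, long[] bounds) {
        int pos = Arrays.binarySearch(bounds, value);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the bucket is insertionPoint - 1.
        int idx = pos >= 0 ? pos : -pos - 2;
        return (idx < 0 || idx >= bounds.length - 1) ? -1 : idx;
    }

    static Result collect(Leaf[] leaves, long[] bounds) {
        long[] counts = new long[bounds.length - 1];
        int scanned = 0;
        for (Leaf leaf : leaves) {
            int lo = bucketOf(leaf.min(), bounds);
            int hi = bucketOf(leaf.max(), bounds);
            if (lo >= 0 && lo == hi) {
                // The whole leaf falls inside one bucket: count it in bulk.
                counts[lo] += leaf.values.length;
            } else if (leaf.max() < bounds[0] || leaf.min() >= bounds[bounds.length - 1]) {
                // The whole leaf lies outside all buckets: skip it.
            } else {
                // The leaf straddles a bucket boundary: inspect each value.
                scanned++;
                for (long v : leaf.values) {
                    int b = bucketOf(v, bounds);
                    if (b >= 0) counts[b]++;
                }
            }
        }
        return new Result(counts, scanned);
    }

    public static void main(String[] args) {
        long[] bounds = {0, 10, 20, 30};
        Leaf[] leaves = {
            new Leaf(1, 2, 3), new Leaf(4, 9), new Leaf(9, 12, 15), new Leaf(21, 25, 29)
        };
        Result r = collect(leaves, bounds);
        // Prints: [6, 2, 3] scanned=1
        System.out.println(Arrays.toString(r.counts) + " scanned=" + r.leavesScanned);
    }
}
```

   In this toy input only the leaf `{9, 12, 15}` crosses a bucket boundary, so 
it is the only one scanned value by value. In a real tree the same min/max 
check could presumably be applied to inner nodes as well, so that whole 
subtrees are counted in bulk.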


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


