jpountz opened a new pull request, #14204:
URL: https://github.com/apache/lucene/pull/14204

   This is inspired by a paper from Tencent in which the authors describe how 
they speed up so-called "histogram queries" by sorting the index by timestamp 
and translating the range of values covered by each histogram bucket into a 
range of doc IDs. This way, at collection time, they no longer need to look up 
values and can compute the histogram purely from the collected doc IDs.
   
   YU, Muzhi, LIN, Zhaoxiang, SUN, Jinan, et al. TencentCLS: the cloud log 
service with high query performances. Proceedings of the VLDB Endowment, 2022, 
vol. 15, no. 12, p. 3472-3482.
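   A minimal plain-Java sketch of the approach described above. It assumes a 
single segment whose docs are sorted by the histogram field (index sort), so 
that `values[docID]` is non-decreasing. The class and method names are 
hypothetical and no Lucene APIs are used. Each bucket boundary is 
binary-searched once to turn bucket value ranges into doc ID ranges, after 
which collected doc IDs (which arrive in increasing order) are counted without 
reading any values. A naive version that looks up every collected doc's value 
is included for comparison.

```java
import java.util.Arrays;

/**
 * Hypothetical plain-Java model of one segment sorted by the histogram field:
 * values[docID] is non-decreasing in docID. No Lucene APIs are used.
 */
final class SortedHistogramSketch {

  /** Returns the first doc ID whose value is >= key, or values.length if none. */
  static int lowerBound(long[] values, long key) {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /**
   * Bucket b covers values [origin + b * interval, origin + (b + 1) * interval).
   * collectedDocs must be in increasing order, as a collector would see them.
   */
  static long[] histogram(
      int[] collectedDocs, long[] sortedValues, long origin, long interval, int numBuckets) {
    // Translate bucket boundaries into doc ID boundaries: numBuckets + 1 binary searches.
    int[] bucketStart = new int[numBuckets + 1];
    for (int b = 0; b <= numBuckets; b++) {
      bucketStart[b] = lowerBound(sortedValues, origin + b * interval);
    }
    // Count collected docs using doc IDs only: no per-doc value lookups.
    long[] counts = new long[numBuckets];
    int bucket = 0;
    for (int doc : collectedDocs) {
      if (doc < bucketStart[0]) {
        continue; // value is below origin
      }
      while (bucket < numBuckets && doc >= bucketStart[bucket + 1]) {
        bucket++;
      }
      if (bucket == numBuckets) {
        break; // this doc and all later ones are past the last bucket
      }
      counts[bucket]++;
    }
    return counts;
  }

  /** Naive version: looks up the value of every collected doc. */
  static long[] naiveHistogram(
      int[] collectedDocs, long[] values, long origin, long interval, int numBuckets) {
    long[] counts = new long[numBuckets];
    for (int doc : collectedDocs) {
      long v = values[doc];
      if (v < origin) {
        continue;
      }
      long bucket = (v - origin) / interval;
      if (bucket < numBuckets) {
        counts[(int) bucket]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    long[] values = {1, 2, 2, 5, 7, 8, 12, 15}; // doc IDs 0..7, sorted by value
    int[] collected = {0, 2, 3, 5, 6, 7};       // docs matched by the query
    System.out.println(Arrays.toString(histogram(collected, values, 0, 5, 4)));
    System.out.println(Arrays.toString(naiveHistogram(collected, values, 0, 5, 4)));
  }
}
```

   With the data in `main`, both methods print `[2, 2, 1, 1]`. The difference 
is where values are read: the sorted version reads them only during the 
`numBuckets + 1` binary searches, not once per collected doc.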
   
   Instead of binary-searching the doc ID space to translate histogram buckets 
into ranges of doc IDs, the new collector manager uses the recently introduced 
support for sparse indexing. On the geonames dataset, computing a histogram of 
the elevation field runs ~2-3x faster with this optimization than with the 
naive implementation, which looks up the value of every collected doc.
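   The sketch below is a simplified, hypothetical model of how a sparse index 
can replace the binary search. It is not Lucene's implementation and does not 
use Lucene's skip-index API. Doc IDs are grouped into blocks of a hypothetical 
`BLOCK_SIZE`, and each block records the min and max value of its docs. Since 
bucketing is monotonic in the value, if a block's min and max fall into the 
same bucket then every value in the block does too, so its docs are counted 
without a value lookup. Only blocks that straddle a bucket boundary fall back 
to reading per-doc values. Sorting the index is what makes most blocks fall 
into a single bucket.

```java
import java.util.Arrays;

/**
 * Hypothetical model of a sparse (skip) index: per-block min/max values over
 * fixed-size blocks of doc IDs. BLOCK_SIZE and all names are assumptions.
 */
final class SkipIndexHistogramSketch {

  static final int BLOCK_SIZE = 4; // hypothetical block size

  final long[] values;   // per-doc values, read only for blocks that span buckets
  final long[] blockMin; // min value of each block
  final long[] blockMax; // max value of each block
  int valueLookups;      // number of per-doc values actually read

  SkipIndexHistogramSketch(long[] values) {
    this.values = values;
    int numBlocks = (values.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
    blockMin = new long[numBlocks];
    blockMax = new long[numBlocks];
    for (int blk = 0; blk < numBlocks; blk++) {
      long min = Long.MAX_VALUE;
      long max = Long.MIN_VALUE;
      int end = Math.min(values.length, (blk + 1) * BLOCK_SIZE);
      for (int doc = blk * BLOCK_SIZE; doc < end; doc++) {
        min = Math.min(min, values[doc]);
        max = Math.max(max, values[doc]);
      }
      blockMin[blk] = min;
      blockMax[blk] = max;
    }
  }

  /** Bucket index of a value; floorDiv keeps values below origin negative. */
  static long bucketOf(long value, long origin, long interval) {
    return Math.floorDiv(value - origin, interval);
  }

  long[] histogram(int[] collectedDocs, long origin, long interval, int numBuckets) {
    long[] counts = new long[numBuckets];
    for (int doc : collectedDocs) {
      int blk = doc / BLOCK_SIZE;
      long minBucket = bucketOf(blockMin[blk], origin, interval);
      long maxBucket = bucketOf(blockMax[blk], origin, interval);
      long bucket;
      if (minBucket == maxBucket) {
        bucket = minBucket; // whole block lies in one bucket: no value lookup
      } else {
        valueLookups++;     // block spans a boundary: read this doc's value
        bucket = bucketOf(values[doc], origin, interval);
      }
      if (bucket >= 0 && bucket < numBuckets) {
        counts[(int) bucket]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    long[] values = {1, 2, 2, 5, 7, 8, 12, 15}; // doc IDs 0..7, sorted by value
    SkipIndexHistogramSketch sketch = new SkipIndexHistogramSketch(values);
    long[] counts = sketch.histogram(new int[] {0, 2, 3, 5, 6, 7}, 0, 10, 2);
    System.out.println(Arrays.toString(counts) + " lookups=" + sketch.valueLookups);
  }
}
```

   In `main`, the first block (values 1 to 5) lies entirely in bucket 0 and 
needs no lookups, while the second block (values 7 to 15) straddles the 
boundary at 10, so the program prints `[4, 2] lookups=3`.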


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
