Re: [I] Explore within-block skipping for postings [lucene]

via GitHub Tue, 31 Oct 2023 05:52:31 -0700


jpountz commented on issue #12486:
URL: https://github.com/apache/lucene/issues/12486#issuecomment-1787160140


   > Lucene is also doing the "accumulate docid deltas into the absolute docid" 
too in this loop, but I guess Tantivy does this separately somehow?
   
   I believe Tantivy does the same, except that it uses SIMD to accumulate 
docid deltas into absolute docids. (If it did not accumulate the deltas 
up-front, it could not run a branchless binary search later on.) I looked into 
whether we could do the same with Panama, but last time I checked it did not 
offer a way to use `_mm_slli_si128`, which prevents us from writing a faster, 
vectorized prefix sum: 
https://github.com/jpountz/vectorized-prefix-sum.
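
   To make the two steps concrete, here is a minimal scalar sketch in Java: a 
prefix sum that turns docid deltas into absolute docids, followed by a binary 
search written without data-dependent `if` branches. The names 
`DocIdBlockSketch`, `prefixSum` and `firstGreaterOrEqual` are hypothetical and 
are not Lucene or Tantivy APIs. Whether the JIT compiles the ternary to a 
conditional move is not guaranteed, so treat this as an illustration of the 
shape of the code, not as a measured optimization.

```java
final class DocIdBlockSketch {

  /** Scalar prefix sum: replaces each delta with the absolute docid, in place. */
  static void prefixSum(int[] deltas, int base) {
    int sum = base; // docid preceding the block (assumption: caller tracks it)
    for (int i = 0; i < deltas.length; i++) {
      sum += deltas[i];
      deltas[i] = sum;
    }
  }

  /**
   * Returns the index of the first docid >= target in a sorted block,
   * or docIds.length if every docid is smaller than target.
   */
  static int firstGreaterOrEqual(int[] docIds, int target) {
    int lo = 0;
    int len = docIds.length;
    while (len > 1) {
      int half = len >>> 1;
      // The number of iterations depends only on the block length, and the
      // comparison feeds an addition instead of steering control flow.
      lo += (docIds[lo + half - 1] < target) ? half : 0;
      len -= half;
    }
    return lo + ((len == 1 && docIds[lo] < target) ? 1 : 0);
  }
}
```

   The search needs absolute docids as input, which is why the deltas have to 
be accumulated before it can run.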
   
   For reference, there is also this old @mkhludnev idea about encoding dense 
postings lists as bitsets, which would naturally help with skipping: #6116 
(could we do it on a per-block basis?). More generally, some encodings, such 
as Elias-Fano, are better at skipping within blocks.
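
   As a rough illustration of why a bitset helps with skipping, here is a 
sketch of a per-block bitset in Java. It is not the design proposed in #6116; 
`BitSetBlockSketch`, `encode` and `advance` are hypothetical names, and the 
layout (one `long[]` per block, offsets relative to a block base docid) is an 
assumption made for the example. `encode` expects a non-empty, sorted block 
whose docids are all >= `base`.

```java
final class BitSetBlockSketch {

  /** Sets one bit per docid, at offset (docid - base). Block must be non-empty. */
  static long[] encode(int base, int[] sortedDocIds) {
    int last = sortedDocIds[sortedDocIds.length - 1];
    long[] bits = new long[((last - base) >>> 6) + 1];
    for (int doc : sortedDocIds) {
      int off = doc - base;
      bits[off >>> 6] |= 1L << off; // long shifts use only the low 6 bits of off
    }
    return bits;
  }

  /** Returns the first docid >= target in the block, or -1 if there is none. */
  static int advance(long[] bits, int base, int target) {
    int off = Math.max(target - base, 0);
    int word = off >>> 6;
    if (word >= bits.length) {
      return -1;
    }
    // Jump straight to the word holding target and mask off the lower bits.
    long w = bits[word] & (-1L << off);
    while (w == 0) {
      if (++word == bits.length) {
        return -1;
      }
      w = bits[word];
    }
    return base + (word << 6) + Long.numberOfTrailingZeros(w);
  }
}
```

   Skipping to a target is a word lookup plus a trailing-zeros count, with no 
need to decode or sum the docids that come before it.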


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

