jpountz commented on issue #15189: URL: https://github.com/apache/lucene/issues/15189#issuecomment-3312393135
Hey @Tony-X, Lucene actually started with long[], see https://github.com/apache/lucene/blob/693bb6920df9f86bb2af45af8ba2ad8dc7d608af/lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene99/ForUtil.java#L25-L26 and related classes (ForDeltaUtil notably). For reference, the main appeal of storing postings into a long was to speed up prefix sums, as this allows summing up two ints in a single instruction. Ideally we'd run a SIMD prefix sum, but Java's vector API doesn't allow vectorizing as prefix sum efficiently as documented at https://en.algorithmica.org/hpc/algorithms/prefix/. I have multiple attempts at https://github.com/jpountz/vectorized-prefix-sum if you're curious. Lucene later switched to SIMD operations to advance within a block as well (https://github.com/apache/lucene/blob/693bb6920df9f86bb2af45af8ba2ad8dc7d608af/lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java#L1099) and this case works less well when decoding into a long[] than into an int[] since there are 2x fewer lanes per vector. Macro-benchmarks actually suggested that switching to int[] would perform faster (the speedup when advancing into a block is bigger than the decoding slowdown), so we did. (https://github.com/apache/lucene/pull/13968) More recently (Lucene 10.2), Lucene started storing dense postings blocks as bit sets, which gave a noticeable speedup as decoding consists of just reading a few longs, while advancing relies on efficient nextSetBit operations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
