jpountz commented on issue #15189: URL: https://github.com/apache/lucene/issues/15189#issuecomment-3312393135
Hey @Tony-X, Lucene actually started with long[], see https://github.com/apache/lucene/blob/693bb6920df9f86bb2af45af8ba2ad8dc7d608af/lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene99/ForUtil.java#L25-L26 and related classes (ForDeltaUtil notably). For reference, the main appeal of storing postings into a long was to speed up prefix sums, as this allows summing up two ints in a single instruction. Ideally we'd run a SIMD prefix sum, but Java's vector API doesn't allow vectorizing as prefix sum efficiently as documented at https://en.algorithmica.org/hpc/algorithms/prefix/. I have multiple attempts at https://github.com/jpountz/vectorized-prefix-sum if you're curious. Lucene later switched to SIMD operations to advance within a block as well (https://github.com/apache/lucene/blob/693bb6920df9f86bb2af45af8ba2ad8dc7d608af/lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java#L1099) and this case works less well when decoding into a long[] than into an int[] since there are 2x fewer lanes per vector. Macro-benchmarks actually suggested that switching to int[] would perform faster (the speedup when advancing into a block is bigger than the decoding slowdown), so we did. (https://github.com/apache/lucene/pull/13968) More recently (Lucene 10.2), Lucene started storing dense postings blocks as bit sets, which gave a noticeable speedup as decoding consists of just reading a few longs, while advancing relies on efficient nextSetBit operations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
