jpountz commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1611459853
Thanks for looking into this! For reference, I've been separately looking into whether we could vectorize prefix sums, which are one bottleneck of postings decoding today: we managed to get auto-vectorization to help with bit unpacking, but not with computing a prefix sum of doc IDs: https://github.com/jpountz/vectorized-prefix-sum. Unfortunately, Java's vector API doesn't offer a way to shift across lanes efficiently (e.g. `IntVector#unslice` doesn't compile to [vpslldq](https://www.felixcloutier.com/x86/pslldq) with SPECIES_128), even though that is the traditional way to vectorize a prefix sum. I've looked into other approaches, but they're all slower than a scalar prefix sum. @ChrisHegarty looked into it and had a few interesting observations, in particular that vectorized prefix sums suddenly become several times faster when bounds checks are disabled: https://github.com/jpountz/vectorized-prefix-sum/issues. So there might be room for improvement in the JDK that would eventually let us take advantage of vectorization for prefix sums?
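
For readers who haven't seen the shift-based approach, here is a minimal sketch of the idea, not the code from the linked repository. It uses plain `int[]` arrays to emulate a 4-lane (128-bit) vector rather than the incubating Vector API, so it runs without extra JVM flags. The class name `PrefixSumSketch` and the helper `shiftLanes` are hypothetical and exist only for illustration; `shiftLanes` stands in for the cross-lane shift that one would hope `IntVector#unslice` compiles down to.

```java
import java.util.Arrays;

public class PrefixSumSketch {
    // Emulates a 128-bit vector holding four 32-bit ints (assumption for illustration).
    static final int LANES = 4;

    /** Scalar baseline: every element depends on the one before it. */
    public static void scalarPrefixSum(int[] deltas) {
        for (int i = 1; i < deltas.length; i++) {
            deltas[i] += deltas[i - 1];
        }
    }

    /** Hypothetical helper: moves lanes toward higher indices by n, filling with zeros. */
    static int[] shiftLanes(int[] v, int n) {
        int[] out = new int[LANES];
        for (int i = n; i < LANES; i++) {
            out[i] = v[i - n];
        }
        return out;
    }

    /** Shift-and-add prefix sum, one LANES-wide block at a time. */
    public static void shiftAndAddPrefixSum(int[] deltas) {
        if (deltas.length % LANES != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + LANES);
        }
        int carry = 0; // running total of all previous blocks
        int[] v = new int[LANES];
        for (int base = 0; base < deltas.length; base += LANES) {
            System.arraycopy(deltas, base, v, 0, LANES);
            // log2(LANES) rounds: shift by 1, then by 2, adding after each shift.
            for (int shift = 1; shift < LANES; shift <<= 1) {
                int[] shifted = shiftLanes(v, shift);
                for (int i = 0; i < LANES; i++) {
                    v[i] += shifted[i];
                }
            }
            // Add the carry from earlier blocks and store the result.
            for (int i = 0; i < LANES; i++) {
                deltas[base + i] = v[i] + carry;
            }
            carry = deltas[base + LANES - 1];
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5, 9, 2, 6};
        int[] b = a.clone();
        scalarPrefixSum(a);
        shiftAndAddPrefixSum(b);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.toString(b));
    }
}
```

With real SIMD, each shift and each lane-wise add would be a single instruction, so a block of four values takes two shift/add rounds instead of a chain of dependent scalar adds. That only pays off if the shift itself is cheap, which is the part that is missing today.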