jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2652267868
> #current bpv=24 gets vectorized on the shift loop, but not for the remainder loop.

This is an interesting observation. I wonder if a small refactoring could help it get auto-vectorized? E.g. what if we applied the `0xFF` mask to `scratch` in the shift loop rather than the remainder loop? Or if we split the remainder loop into 3 loops, one for each group of 8 bits that contributes to the value?

Sorry for pushing, but if we could get auto-vectorization to do the right thing, then this would automatically benefit all users, not only those who enable the vector module.
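
To make the two suggestions above concrete, here is a minimal, self-contained sketch. It is not the code from the PR: the class name `Bpv24Sketch`, the method names, the block size of 128 values, and the bit layout (high 24 bits of each packed int hold one value, low bytes are gathered at a stride of 32 to rebuild the last 32 values) are all assumptions made for illustration. It only shows the shape of each refactoring and checks that the variants decode identically; it makes no claim about which one C2 actually auto-vectorizes.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch; names and bit layout are assumptions, not Lucene's actual code. */
final class Bpv24Sketch {
  static final int BLOCK = 128;          // values per block (assumed)
  static final int PACKED = 96;          // 128 values * 24 bits / 32 bits per int
  static final int REM = BLOCK - PACKED; // 32 values rebuilt from the low bytes

  /**
   * Assumed layout: the high 24 bits of packed[i] hold value i; the low byte of
   * packed[k * REM + j] holds byte k (most significant first) of value PACKED + j.
   */
  static int[] encode(int[] values) {
    int[] packed = new int[PACKED];
    for (int i = 0; i < PACKED; i++) {
      packed[i] = values[i] << 8;
    }
    for (int j = 0; j < REM; j++) {
      int v = values[PACKED + j];
      packed[j] |= (v >>> 16) & 0xFF;
      packed[REM + j] |= (v >>> 8) & 0xFF;
      packed[2 * REM + j] |= v & 0xFF;
    }
    return packed;
  }

  /** Baseline shape: the 0xFF mask is applied inside the remainder loop. */
  static int[] decodeMaskInRemainder(int[] packed) {
    int[] values = new int[BLOCK];
    int[] scratch = new int[PACKED];
    for (int i = 0; i < PACKED; i++) { // shift loop
      values[i] = packed[i] >>> 8;
      scratch[i] = packed[i];
    }
    for (int j = 0; j < REM; j++) { // remainder loop
      values[PACKED + j] =
          ((scratch[j] & 0xFF) << 16)
              | ((scratch[REM + j] & 0xFF) << 8)
              | (scratch[2 * REM + j] & 0xFF);
    }
    return values;
  }

  /** First suggestion: apply the 0xFF mask to scratch in the shift loop instead. */
  static int[] decodeMaskInShift(int[] packed) {
    int[] values = new int[BLOCK];
    int[] scratch = new int[PACKED];
    for (int i = 0; i < PACKED; i++) { // shift loop, now also masking
      values[i] = packed[i] >>> 8;
      scratch[i] = packed[i] & 0xFF;
    }
    for (int j = 0; j < REM; j++) { // remainder loop without masks
      values[PACKED + j] = (scratch[j] << 16) | (scratch[REM + j] << 8) | scratch[2 * REM + j];
    }
    return values;
  }

  /** Second suggestion: one remainder loop per contributed group of 8 bits. */
  static int[] decodeSplitRemainder(int[] packed) {
    int[] values = new int[BLOCK];
    int[] scratch = new int[PACKED];
    for (int i = 0; i < PACKED; i++) {
      values[i] = packed[i] >>> 8;
      scratch[i] = packed[i] & 0xFF;
    }
    for (int j = 0; j < REM; j++) {
      values[PACKED + j] = scratch[j] << 16; // bits 16..23
    }
    for (int j = 0; j < REM; j++) {
      values[PACKED + j] |= scratch[REM + j] << 8; // bits 8..15
    }
    for (int j = 0; j < REM; j++) {
      values[PACKED + j] |= scratch[2 * REM + j]; // bits 0..7
    }
    return values;
  }

  /** Round-trips random 24-bit values through every variant and compares the results. */
  static boolean variantsAgree(long seed) {
    Random random = new Random(seed);
    int[] values = new int[BLOCK];
    for (int i = 0; i < BLOCK; i++) {
      values[i] = random.nextInt(1 << 24);
    }
    int[] packed = encode(values);
    return Arrays.equals(values, decodeMaskInRemainder(packed))
        && Arrays.equals(values, decodeMaskInShift(packed))
        && Arrays.equals(values, decodeSplitRemainder(packed));
  }

  public static void main(String[] args) {
    System.out.println(variantsAgree(42L));
  }
}
```

Whether either variant is actually picked up by the JIT's auto-vectorizer would still need to be confirmed, for example with a JMH benchmark and a look at the generated assembly.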