rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787409152
> I see the scalar code vectorize, but not optimally for the target CPU - e.g. `vfmadd231ss %xmm9,%xmm10,%xmm4` on my Rocket Lake. Whereas the Vector API compilation emits instructions that use wider registers, e.g. `vfmadd231ps %zmm6,%zmm2,%zmm10`. My primitive (and possibly out of date) understanding is that the register allocator will not use the wider registers for these kinda auto-vectorization scenarios - the advice is to use the Vector API!

That isn't vectorization. That is using xmm registers for floating point (which is normal) together with the scalar 32-bit version of VFMADD. This is always how scalar math looks here. Vectorization isn't happening.
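
To make the distinction concrete, here is a minimal sketch of how the two quoted mnemonics can be read. The class and method names (`FmaMnemonic`, `isPacked`, `registerBits`, `floatLanes`) are hypothetical helpers invented for illustration and are not part of Lucene or the JDK. It only assumes the usual x86 naming convention: an `ss` suffix means scalar single-precision, a `ps` suffix means packed single-precision, and xmm/ymm/zmm registers are 128/256/512 bits wide.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
class FmaMnemonic {

    /** True if the suffix marks a packed single-precision op ("ps"), false for scalar ("ss"). */
    static boolean isPacked(String mnemonic) {
        String m = mnemonic.toLowerCase(Locale.ROOT);
        if (m.endsWith("ps")) return true;
        if (m.endsWith("ss")) return false;
        throw new IllegalArgumentException("not a single-precision mnemonic: " + mnemonic);
    }

    /** Register width in bits for an operand such as %xmm9 or %zmm6. */
    static int registerBits(String operand) {
        String r = operand.toLowerCase(Locale.ROOT);
        if (r.contains("zmm")) return 512;
        if (r.contains("ymm")) return 256;
        if (r.contains("xmm")) return 128;
        throw new IllegalArgumentException("not a vector register: " + operand);
    }

    /** Number of 32-bit float lanes the instruction actually computes. */
    static int floatLanes(String mnemonic, String operand) {
        // A scalar op touches one lane, no matter how wide the register is.
        return isPacked(mnemonic) ? registerBits(operand) / 32 : 1;
    }

    public static void main(String[] args) {
        System.out.println(floatLanes("vfmadd231ss", "%xmm9")); // prints 1
        System.out.println(floatLanes("vfmadd231ps", "%zmm6")); // prints 16
    }
}
```

The xmm operand alone says nothing about vectorization: the first instruction computes a single float per execution, while the second computes sixteen.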