Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

via GitHub Tue, 31 Oct 2023 08:24:24 -0700


ChrisHegarty commented on PR #12737:
URL: https://github.com/apache/lucene/pull/12737#issuecomment-1787445189


   > > I see the scalar code vectorize, but not optimally for the target CPU - 
e.g. `vfmadd231ss %xmm9,%xmm10,%xmm4` on my Rocket Lake. Whereas the Vector 
API compilation emits instructions that use wider registers, e.g. `vfmadd231ps 
%zmm6,%zmm2,%zmm10`. My primitive (and possibly out of date) understanding is 
that the register allocator will not use the wider registers for these kinda 
auto-vectorization scenarios - the advice is to use the Vector API!
   > 
   > That isn't vectorization. That is using xmm registers for floating point 
(which is normal) and using the scalar 32-bit version of VFMADD. This is always 
how scalar math looks here.
   > 
   > Vectorization isn't happening.
   
   D'oh! I goofed on this, by just skimming the disassembly and not looking 
more closely at the loads, etc. You're right. I don't see this vectorize either.
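
For illustration, here is a minimal sketch of the kind of scalar loop being discussed: a float dot product unrolled by hand with `Math.fma`. It is not the code from the PR; the class name `ScalarDot`, the method name, and the unroll factor of four are assumptions made for this example. As noted in the exchange above, each `Math.fma` call in a loop like this showed up as a scalar `vfmadd231ss` on an xmm register rather than a packed `vfmadd231ps`, so the independent accumulators help instruction-level parallelism but the loop is not vectorized.

```java
public final class ScalarDot {
  private ScalarDot() {}

  /**
   * Scalar dot product, manually unrolled four ways (hypothetical helper,
   * not Lucene's actual implementation). Each accumulator forms its own
   * dependency chain, so the CPU can overlap the FMA operations.
   */
  public static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int upper = a.length & ~3; // largest multiple of 4 not exceeding length
    for (; i < upper; i += 4) {
      // Math.fma computes a*b + c with a single rounding step.
      acc0 = Math.fma(a[i], b[i], acc0);
      acc1 = Math.fma(a[i + 1], b[i + 1], acc1);
      acc2 = Math.fma(a[i + 2], b[i + 2], acc2);
      acc3 = Math.fma(a[i + 3], b[i + 3], acc3);
    }
    float res = (acc0 + acc1) + (acc2 + acc3);
    // Tail: handle the remaining 0 to 3 elements.
    for (; i < a.length; i++) {
      res = Math.fma(a[i], b[i], res);
    }
    return res;
  }
}
```

To check what the JIT actually emits for a method like this, inspect the disassembly and look at the loads as well as the arithmetic: packed loads and `ps` instructions indicate vectorization, while `ss` instructions on xmm registers are ordinary scalar floating point.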


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


