rmuir commented on PR #12737:
URL: https://github.com/apache/lucene/pull/12737#issuecomment-1788347778

   Uwe, I realize I may be short with my responses, so I ask:
   0. get your espresso before continuing.
   1. please take the time to look at benchmark results in depth. **make** sure 
you see the regression in dot product when using FMA on your cpu. 
   2. make sure you see what differences are noise and what aren't. look at 
stddev and run the benchmark multiple times if needed. it is extremely 
important.
   3. look at the assembly (the instructions under help/ show you how) so you 
can see what is happening.
   4. understand that the dot product is important for measurements/comparisons. 
on ARM the vector impl is 3.95x, that's basically optimal. on x86 it is not even 
close. i can't really judge where we are at when the scalar baseline isn't 
maxing out the cpu. it is important. the other functions are not: they are toys, 
and maybe we can optimize them separately. they aren't bottlenecked by the 
hardware.
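
To make point 2 concrete, here is a minimal sketch of one crude way to decide whether a difference between two sets of benchmark runs is within the noise. It is not part of the PR or of any benchmark harness: the class name `NoiseCheckSketch`, its methods, and the rule itself (a difference in means no larger than the sum of the two sample standard deviations counts as noise) are assumptions chosen for illustration only.

```java
// Hypothetical helper, for illustration only; not Lucene or JMH code.
final class NoiseCheckSketch {

  // Arithmetic mean of the per-run scores.
  static double mean(double[] xs) {
    if (xs.length == 0) {
      throw new IllegalArgumentException("need at least one run");
    }
    double sum = 0.0;
    for (double x : xs) {
      sum += x;
    }
    return sum / xs.length;
  }

  // Sample standard deviation (divides by n - 1), so it needs two or more runs.
  static double stddev(double[] xs) {
    if (xs.length < 2) {
      throw new IllegalArgumentException("need at least two runs");
    }
    double m = mean(xs);
    double sumSq = 0.0;
    for (double x : xs) {
      double d = x - m;
      sumSq += d * d;
    }
    return Math.sqrt(sumSq / (xs.length - 1));
  }

  // Assumed rule of thumb: treat the change as noise when the means differ
  // by no more than the combined spread of the two sets of runs.
  static boolean looksLikeNoise(double[] baseline, double[] candidate) {
    double diff = Math.abs(mean(candidate) - mean(baseline));
    return diff <= stddev(baseline) + stddev(candidate);
  }
}
```

If `looksLikeNoise` returns true for scores collected from repeated runs, that suggests running the benchmark more times before drawing a conclusion. A proper statistical test would be stricter than this rule of thumb.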
   
   It is important to me to try to take a systematic approach to cleaning this 
crap up. Fixing the scalar baseline helps a lot. Next steps for me are to take 
this PR and work on 128-bit x86, then work my way up. ARM is good to go already 
with this PR, no work needed there, just don't regress it.
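
For readers following along, this is a minimal sketch of the two scalar dot product shapes being compared, with and without FMA. It is not the code in this PR: the class `DotProductSketch` and its method names are hypothetical, and only `Math.fma` (available since Java 9) is a real JDK call. Which variant is faster depends on the CPU and on what the JIT emits, which is why the benchmark numbers and the assembly matter.

```java
// Hypothetical illustration; not the implementation from the PR.
final class DotProductSketch {

  // Plain scalar loop: the multiply and the add are rounded separately.
  static float dotPlain(float[] a, float[] b) {
    checkLengths(a, b);
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc += a[i] * b[i];
    }
    return acc;
  }

  // Same loop with Math.fma: computes a[i] * b[i] + acc with one rounding.
  static float dotFma(float[] a, float[] b) {
    checkLengths(a, b);
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc = Math.fma(a[i], b[i], acc);
    }
    return acc;
  }

  private static void checkLengths(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
  }
}
```

Both loops carry a single accumulator, so each iteration depends on the previous one. Because FMA rounds once per step, the two methods can also return slightly different values on inputs that are not exactly representable.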


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

