rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1788347778
Uwe, I realize I may be short with my responses. I ask:

0. Get your espresso before continuing.
1. Please take the time to look at the benchmark results in depth. **Make** sure you see the regression in dot product when using FMA on your CPU.
2. Make sure you see which differences are noise and which aren't. Look at the stddev and run the benchmark multiple times if needed. It is extremely important.
3. Look at the assembly (the help/ instructions show you how) so you can see what is happening.
4. Understand that the dot product is important for measurements/comparisons. On ARM the vector impl is 3.95x, which is basically optimal. On x86 it is not even close. I can't really judge where we are at when the scalar baseline isn't maxing out the CPU. It is important. The other functions are not: they are toys, and maybe we can optimize them separately. They aren't bottlenecked by the hardware.

It is important to me to try to take a systematic approach to cleaning this crap up. Fixing the scalar baseline helps a lot. Next steps for me are to take this PR and work on 128-bit x86, then work my way up. ARM is good to go already with this PR, no work needed there, just don't regress it.
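To make the FMA-versus-plain comparison in the scalar baseline concrete, here is a minimal sketch of the two loop shapes being contrasted. It is an illustration only, not the code in this PR: the class name `DotProductSketch`, both method names, and the choice of four accumulators are assumptions made for the example. The only library call is `Math.fma(float, float, float)`, available since Java 9. Which variant is faster on a given CPU is exactly what the benchmark and the generated assembly have to show.

```java
public final class DotProductSketch {
  private DotProductSketch() {}

  /** Scalar dot product with separate multiply and add, using four independent accumulators. */
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int limit = a.length & ~3; // largest multiple of 4 that fits
    for (; i < limit; i += 4) {
      // Independent accumulators avoid one long serial dependency chain.
      acc0 += a[i] * b[i];
      acc1 += a[i + 1] * b[i + 1];
      acc2 += a[i + 2] * b[i + 2];
      acc3 += a[i + 3] * b[i + 3];
    }
    float res = (acc0 + acc1) + (acc2 + acc3);
    for (; i < a.length; i++) { // scalar tail
      res += a[i] * b[i];
    }
    return res;
  }

  /** Same loop shape, but each step is a fused multiply-add via Math.fma. */
  static float dotProductFma(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int limit = a.length & ~3;
    for (; i < limit; i += 4) {
      acc0 = Math.fma(a[i], b[i], acc0);
      acc1 = Math.fma(a[i + 1], b[i + 1], acc1);
      acc2 = Math.fma(a[i + 2], b[i + 2], acc2);
      acc3 = Math.fma(a[i + 3], b[i + 3], acc3);
    }
    float res = (acc0 + acc1) + (acc2 + acc3);
    for (; i < a.length; i++) { // scalar tail
      res = Math.fma(a[i], b[i], res);
    }
    return res;
  }
}
```

Because FMA rounds once per step instead of twice, the two variants can differ in the last bits on general inputs, so compare them with a tolerance rather than for exact equality.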