rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751938382
OK, I reverted the 256-bit changes from here and from vectorbench, but
kept the 128-bit ones for people to test on Macs. Now this issue does the opposite
of what it says; I will edit it.
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934622
Thanks for running. I will just revert it then and get folks to test the ARM
changes. I don't want to hurt AVX-512...
gf2121 commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934272
FYI, I ran the benchmark on the [latest benchmark
commit](https://github.com/rmuir/vectorbench/commit/ef7e089a75a883d809145d2686e6a4dc1915c106)
on a linux-x86-64 server that supports AVX-512.
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751926396
I did manage to get a little bit more out of the arm chip. I will look at
the other 2 functions there too...
```
Benchmark (size) Mode Cnt Score
```
rmuir opened a new pull request, #12632:
URL: https://github.com/apache/lucene/pull/12632
We can get these functions closer to optimal by converting directly to
32 bits and multiplying with `vpmulld`.
See https://stackoverflow.com/a/69848057 for the motivation.
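
For readers unfamiliar with the instruction, here is a scalar sketch of the per-lane arithmetic: each byte is widened straight to a 32-bit int and the product is taken as a 32-bit multiply, which is what `vpmulld` does for every lane of a vector. The class and method names (`DotProductSketch`, `dotProduct`) are hypothetical and chosen for illustration; this is not the code in the PR, and it is not vectorized.

```java
public final class DotProductSketch {

    // Hypothetical helper for illustration only; not Lucene's implementation.
    // Models one vector lane at a time: widen byte -> int, then 32-bit multiply.
    static int dotProduct(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            int x = a[i]; // sign-extending conversion from 8 to 32 bits
            int y = b[i];
            sum += x * y; // 32-bit multiply, the scalar analogue of vpmulld
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] a = {127, -128, 3};
        byte[] b = {127, -128, 2};
        System.out.println(dotProduct(a, b));
    }
}
```

The product of two bytes always fits in 32 bits, so the multiply itself cannot overflow; only the running sum could, for very long inputs.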
You can reproduce my results