Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751938382 ok i reverted the 256-bit changes from here, and from the vectorbench, but kept the 128 bit ones for ppl to test on macs. Now this issue does the opposite of what it says, i will edit it..

Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934622 thanks for running. I will just revert it then and get folks to test arm changes. i don't want to hurt avx 512...

Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
gf2121 commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934272 FYI I ran the benchmark on [latest benchmark commit](https://github.com/rmuir/vectorbench/commit/ef7e089a75a883d809145d2686e6a4dc1915c106) with a linux-x86-64 server that supports AVX-512

Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751926396 I did manage to get a little bit more out of the arm chip. I will look at the other 2 functions there too... ``` Benchmark (size) Mode Cnt Score

[PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
rmuir opened a new pull request, #12632: URL: https://github.com/apache/lucene/pull/12632 We can get these functions closer to optimal by just directly converting to 32-bits + `vpmulld`. See https://stackoverflow.com/a/69848057 for the motivation. You can reproduce my results
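
For readers following the thread, here is a rough scalar sketch of the idea in the PR description: widen each byte straight to a 32-bit int, then do a 32-bit multiply per lane (the operation `vpmulld` performs on x86). This is not the PR's code. The class name `WidenedDotProduct`, the method `dotProduct`, and the lane count of 8 (modeling one 256-bit vector of 32-bit lanes) are all assumptions made for illustration.

```java
public final class WidenedDotProduct {
    // Hypothetical lane count: eight 32-bit lanes model one 256-bit vector.
    private static final int LANES = 8;

    /** Scalar model of a byte dot product that widens to 32 bits before multiplying. */
    public static int dotProduct(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int[] acc = new int[LANES];                    // one 32-bit accumulator per lane
        int bound = a.length - (a.length % LANES);     // largest multiple of LANES
        int i = 0;
        for (; i < bound; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                int x = a[i + lane];                   // sign-extending byte -> int widening
                int y = b[i + lane];
                acc[lane] += x * y;                    // 32-bit multiply, then accumulate
            }
        }
        int sum = 0;
        for (int v : acc) {
            sum += v;                                  // horizontal reduction of the lanes
        }
        for (; i < a.length; i++) {
            sum += a[i] * b[i];                        // scalar tail for leftover elements
        }
        return sum;
    }
}
```

A real implementation would replace the inner lane loop with vector instructions; the sketch only shows the order of operations (widen first, multiply in 32 bits, reduce at the end).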