jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2450044413
If you check out data at
https://github.com/apache/lucene/pull/13692#issuecomment-2324658146,
`AndHighHigh` and `AndHighMed` tend to advance a bit further than
`CountAndHighHigh` and `C
jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2449813210
Nightly benchmarks just picked up the change with a mix of speedups and
slowdowns: https://benchmarks.mikemccandless.com/2024.10.30.18.12.23.html. Here
are the main ones I'm seeing:
jpountz merged PR #13958:
URL: https://github.com/apache/lucene/pull/13958
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apa
jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2443591275
I plan on merging this change soon, and on looking into moving postings back to
`int[]` arrays next, hopefully to benefit from having 2x more lanes that can be
compared at once.
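The "2x more lanes" follows from lane count being register width divided by element width: going from 64-bit `long` to 32-bit `int` elements doubles the number of doc IDs compared per vector instruction. A trivial sketch of that arithmetic; the register widths listed (128-bit NEON, 256-bit AVX2, 512-bit AVX-512) are the usual sizes, assumed here for illustration, and `LaneCount` is a hypothetical name:

```java
public class LaneCount {
    // Lanes per vector = register width in bits / element width in bits.
    static int lanes(int registerBits, int elementBits) {
        return registerBits / elementBits;
    }

    public static void main(String[] args) {
        int[] registerWidths = {128, 256, 512}; // e.g. NEON, AVX2, AVX-512
        for (int bits : registerWidths) {
            System.out.println(bits + "-bit: " + lanes(bits, Long.SIZE) + " long lanes, "
                + lanes(bits, Integer.SIZE) + " int lanes");
        }
        // prints:
        // 128-bit: 2 long lanes, 4 int lanes
        // 256-bit: 4 long lanes, 8 int lanes
        // 512-bit: 8 long lanes, 16 int lanes
    }
}
```

The 128-bit row is consistent with the "only 2 lanes" observed on the M3 Mac when comparing `long` values.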
jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2442236520
Here's wikimediumall on a c7i.2xlarge instance that supports AVX-512:
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev
```
jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2441837295
Here's a `luceneutil`/`wikibigall` run on the latest version of the code on
my Linux desktop:
```
Task    QPS baseline    StdDev    QPS my_modified_versio
```
jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2440178064
I did more digging: vectorization actually worked on my Mac! So my best
guess is that I got a ~20% slowdown because I only have 2 lanes on it, so the
`trueCount != LONG_SPECIES.length()`
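A scalar sketch of why the lane count matters, assuming the block is scanned in chunks of `lanes` sorted doc IDs and the loop exits once a chunk is not entirely below the target (the role the `trueCount != LONG_SPECIES.length()` check appears to play). In real SIMD code each chunk costs roughly one vector compare plus one mask reduction and a branch, so the number of chunks visited approximates the work. `ChunkedAdvance` and `advance` are hypothetical names, not Lucene code:

```java
public class ChunkedAdvance {
    // Scalar emulation of a vectorized scan over a sorted block of doc IDs.
    // Returns {index of first doc >= target (or docs.length), chunks visited}.
    static int[] advance(long[] docs, long target, int lanes) {
        int chunks = 0;
        int i = 0;
        for (; i + lanes <= docs.length; i += lanes) {
            chunks++;
            int trueCount = 0; // number of lanes whose doc is < target
            for (int j = 0; j < lanes; j++) {
                if (docs[i + j] < target) {
                    trueCount++;
                }
            }
            // Not every lane is below the target: the answer is in this chunk.
            if (trueCount != lanes) {
                return new int[] {i + trueCount, chunks};
            }
        }
        // Scalar tail for blocks whose length is not a multiple of lanes.
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        return new int[] {i, chunks};
    }

    public static void main(String[] args) {
        long[] docs = new long[128];
        for (int k = 0; k < docs.length; k++) {
            docs[k] = 3L * k; // sorted doc IDs 0, 3, 6, ...
        }
        for (int lanes : new int[] {2, 8}) {
            int[] r = advance(docs, 200, lanes);
            System.out.println(lanes + " lanes: index " + r[0] + ", chunks " + r[1]);
        }
        // prints:
        // 2 lanes: index 67, chunks 34
        // 8 lanes: index 67, chunks 9
    }
}
```

With 2 lanes, the per-chunk overhead (reduction plus exit check) is paid almost four times as often as with 8 lanes for the same advance.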
rmuir commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438973598
maybe it's a bug that it doesn't work on your Mac either, because elsewhere
they have code that looks like it is supposed to be doing this stuff:
https://github.com/openjdk/jdk/blob/f1a9a8d2
rmuir commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438947715
For these uses of VectorMask you are OK with AVX2 (so just use the existing
FAST_INTEGER_VECTORS check):
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1597-L160
rmuir commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438944785
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L280-L283
rmuir commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438925486
You are using VectorMask; only use it where it is implemented in hardware
(AVX-512 and ARM SVE).
jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438919587
I ran this PR on my Mac laptop (M3), where this gives a massive slowdown, I
imagine because some of the vector operations I'm using are emulated. I need to
find what to check against in
jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438911637
And I seem to be getting a better speedup by using `trueCount()` instead of
`firstTrue()`:
```
Task    QPS baseline    StdDev    QPS my_modified_version
```
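Assuming the vector mask compares each doc ID in a chunk against the target, `trueCount()` and `firstTrue()` can stand in for each other here: doc IDs within a block are sorted, so the lanes where `doc < target` form a prefix, and counting them gives the same index as finding the first lane where `doc >= target`. A scalar sketch of that equivalence; the helpers are hypothetical stand-ins for the real `VectorMask#trueCount()` and `VectorMask#firstTrue()` (the latter returns the lane count when no lane is set):

```java
public class MaskCountDemo {
    // Stand-in for VectorMask#firstTrue(): first set lane, or length if none.
    static int firstTrue(boolean[] mask) {
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                return i;
            }
        }
        return mask.length;
    }

    // Stand-in for VectorMask#trueCount(): number of set lanes.
    static int trueCount(boolean[] mask) {
        int count = 0;
        for (boolean b : mask) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    static boolean[] lessThan(long[] chunk, long target) {
        boolean[] mask = new boolean[chunk.length];
        for (int i = 0; i < chunk.length; i++) {
            mask[i] = chunk[i] < target;
        }
        return mask;
    }

    static boolean[] greaterOrEqual(long[] chunk, long target) {
        boolean[] mask = new boolean[chunk.length];
        for (int i = 0; i < chunk.length; i++) {
            mask[i] = chunk[i] >= target;
        }
        return mask;
    }

    public static void main(String[] args) {
        long[] chunk = {10, 12, 15, 20, 31, 40, 41, 50}; // sorted, 8 lanes
        long target = 20;
        int viaFirstTrue = firstTrue(greaterOrEqual(chunk, target));
        int viaTrueCount = trueCount(lessThan(chunk, target));
        System.out.println(viaFirstTrue + " " + viaTrueCount); // prints "3 3"
    }
}
```

The results agree only because the input is sorted; which variant is faster depends on how the JIT lowers each mask operation on a given CPU, which this sketch does not model.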
jpountz commented on PR #13958:
URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438737799
Specializing `ImpactsDISI#nextDoc()` helped get rid of the slowdown:
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev
```
jpountz opened a new pull request, #13958:
URL: https://github.com/apache/lucene/pull/13958
PR #13692 tried to speed up advancing by using branchless binary search, but
while this yielded a speedup on my machine, it yielded a slowdown on nightly
benchmarks.
This PR tries a differen
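For context on the approach tried in PR #13692, here is the generic textbook form of a branchless binary search (a lower bound over a sorted `long[]`). It is a sketch of the technique, not that PR's code; `BranchlessSearch` and `lowerBound` are hypothetical names:

```java
public class BranchlessSearch {
    // Index of the first element >= target in the sorted range a[from, to),
    // or `to` if every element is smaller. The number of loop iterations
    // depends only on the range length, and each step selects the next base
    // with a ternary that the JIT may (but is not guaranteed to) lower to a
    // conditional move instead of a branch.
    static int lowerBound(long[] a, int from, int to, long target) {
        int n = to - from;
        if (n == 0) {
            return from;
        }
        int base = from;
        // Invariant: the answer lies in [base, base + n].
        while (n > 1) {
            int half = n >>> 1;
            base = a[base + half] < target ? base + half : base;
            n -= half;
        }
        return base + (a[base] < target ? 1 : 0);
    }

    public static void main(String[] args) {
        long[] docs = {1, 3, 5, 7, 9};
        for (long target : new long[] {0, 6, 9, 10}) {
            System.out.println(target + " -> " + lowerBound(docs, 0, docs.length, target));
        }
        // prints one per line: 0 -> 0, 6 -> 3, 9 -> 4, 10 -> 5
    }
}
```

Removing the data-dependent branch avoids mispredictions, but every lookup pays the full `log2(n)` steps, which is one plausible reason results can differ between machines.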