Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-31 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2450044413 If you check out data at https://github.com/apache/lucene/pull/13692#issuecomment-2324658146, `AndHighHigh` and `AndHighMed` tend to advance a bit further than `CountAndHighHigh` and `C

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-31 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2449813210 Nightly benchmarks just picked up the change with a mix of speedups and slowdowns: https://benchmarks.mikemccandless.com/2024.10.30.18.12.23.html. Here are the main ones I'm seeing:

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-30 Thread via GitHub
jpountz merged PR #13958: URL: https://github.com/apache/lucene/pull/13958 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-29 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2443591275 I plan on merging this change soon, and looking into moving postings back to int[] arrays next to hopefully get benefits from having 2x more lanes that can be compared at once.

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-28 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2442236520 Here's wikimediumall on a c7i.2xlarge instance that supports AVX512: ``` Task QPS baseline StdDev QPS my_modified_version StdDev

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-28 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2441837295 Here's a `luceneutil`/`wikibigall` run on the latest version of the code on my Linux desktop: ``` Task QPS baseline StdDev QPS my_modified_versio

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-27 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2440178064 I did more digging: vectorization actually worked on my Mac! So my best guess is that I got a ~20% slowdown because I only have 2 lanes on it, so the `trueCount != LONG_SPECIES.length()`

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438973598 Maybe it's a bug that it doesn't work on your Mac either, because elsewhere they have code that looks like it is supposed to be doing this stuff: https://github.com/openjdk/jdk/blob/f1a9a8d2

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438947715 For these uses of `VectorMask` you are OK with AVX2 (so just use the existing FAST_INTEGER_VECTORS check): https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1597-L160

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438944785 https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L280-L283

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438925486 You are using `VectorMask`; only use this where it is implemented in hardware (AVX-512 and ARM SVE).

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438919587 I ran this PR on my Mac laptop (M3), where this gives a massive slowdown, I imagine because some of the vector operations I'm using are emulated. I need to find what to check against in

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438911637 And I seem to be getting a better speedup by using `trueCount()` instead of `firstTrue()`: ``` Task QPS baseline StdDev QPS my_modified_version StdDev
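
A note on why counting can stand in for a first-match search: in a sorted block of doc IDs, the index of the first doc that is >= the target equals the number of docs that are < the target. The sketch below illustrates that idea in plain scalar Java. It is not the code from the PR and does not use the Vector API; the class name, method name, and the `LANES` constant are hypothetical, with the inner loop standing in for a single vector comparison followed by a count of matching lanes.

```java
final class CountingAdvanceSketch {
  // Hypothetical chunk width, standing in for the number of vector lanes.
  static final int LANES = 4;

  /** Returns the index of the first doc >= target in the sorted array, or docs.length if none. */
  static int firstGreaterOrEqual(long[] docs, long target) {
    int i = 0;
    for (; i + LANES <= docs.length; i += LANES) {
      // Count how many docs in this chunk are < target, with no early exit.
      int lessCount = 0;
      for (int lane = 0; lane < LANES; lane++) {
        lessCount += docs[i + lane] < target ? 1 : 0;
      }
      // Because the chunk is sorted, the count is also the offset of the first doc >= target.
      if (lessCount != LANES) {
        return i + lessCount;
      }
    }
    // Scalar tail for the docs that do not fill a whole chunk.
    for (; i < docs.length; i++) {
      if (docs[i] >= target) {
        return i;
      }
    }
    return docs.length;
  }
}
```

For example, with docs `{2, 5, 9, 12, 20, 31, 40, 57, 60}` and target `31`, the first chunk contributes a full count of 4, and the second chunk has one doc below the target, so the result is index 5.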

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438737799 Specializing `ImpactsDISI#nextDoc()` helped get rid of the slowdown: ``` Task QPS baseline StdDev QPS my_modified_version StdDev

[PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
jpountz opened a new pull request, #13958: URL: https://github.com/apache/lucene/pull/13958 PR #13692 tried to speed up advancing by using branchless binary search, but while this yielded a speedup on my machine, it yielded a slowdown on nightly benchmarks. This PR tries a differen
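
For readers unfamiliar with the branchless binary search that the description attributes to PR #13692, here is a minimal Java sketch of the general technique. It is not the code from either PR; the class and method names are hypothetical. The loop runs a number of iterations that depends only on the length, and the comparison result selects an offset instead of steering a hard-to-predict branch. Whether the JIT actually compiles the ternary to a conditional move is an assumption that depends on the JVM and the hardware.

```java
final class BranchlessSearchSketch {
  /** Returns the index of the first value >= target in sorted[0..len), or len if none. */
  static int firstGreaterOrEqual(int[] sorted, int len, int target) {
    if (len == 0) {
      return 0;
    }
    int base = 0;
    int n = len;
    while (n > 1) {
      int half = n >>> 1;
      // Move the window start forward only if the probed value is still < target.
      // The ternary only selects a value; it does not change the control flow.
      base += sorted[base + half - 1] < target ? half : 0;
      n -= half;
    }
    // One candidate remains: step past it if it is still < target.
    return base + (sorted[base] < target ? 1 : 0);
  }
}
```

For example, searching `{1, 3, 5, 7}` for `5` returns index 2, and searching for `8` returns 4, the length, because no value qualifies.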