Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-03-16 Thread via GitHub
jpountz commented on code in PR #14203: URL: https://github.com/apache/lucene/pull/14203#discussion_r1997535617 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PointsWriter.java: ## @@ -105,15 +107,22 @@ public Lucene90PointsWriter( } } + public Luce

2025-03-15 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2727214320 > Sorry for making it hard for you to move this PR forward, I was a bit annoyed that we needed something complicated to speed things up, I like the simplicity of specializedDecodeMaskInRe

2025-03-15 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2726996663 Again, thanks a lot for running benchmarks. > I can refactor the code to the specialized decoding if it makes sense to you That would be great, thank you. Sorry for making i
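
The "specialized decoding" agreed on here can be pictured as a dispatch on the leaf size: when a leaf holds exactly 512 doc IDs, a loop with a compile-time constant trip count runs; otherwise a generic loop with a run-time bound does. This is a minimal, hypothetical sketch. The class and method names, the 16-bit delta layout (two deltas per `int`, first half of the leaf in the high bits) and the handling of an odd trailing value are illustrative assumptions, not Lucene's actual code, and whether the JIT unrolls or vectorizes either loop depends on the JVM and CPU.

```java
/**
 * Hypothetical sketch: specialize the common full-leaf case so the hot loop
 * has a constant trip count, and fall back to a generic loop otherwise.
 */
final class SpecializedDelta16Decoder {
  // Assumption: 512 stands in for the maximum number of points per BKD leaf.
  static final int FULL_LEAF = 512;

  private SpecializedDelta16Decoder() {}

  /**
   * Decodes count doc IDs. Assumed layout: for i < count/2, packed[i] holds
   * delta i in its high 16 bits and delta i + count/2 in its low 16 bits; an
   * odd trailing delta sits alone in the low 16 bits of packed[count/2].
   */
  static void decode(int[] packed, int min, int count, int[] docIds) {
    if (count == FULL_LEAF) {
      decodeFullLeaf(packed, min, docIds);
    } else {
      decodeGeneric(packed, min, count, docIds);
    }
  }

  private static void decodeFullLeaf(int[] packed, int min, int[] docIds) {
    final int half = FULL_LEAF / 2; // constant bound, known to the JIT
    for (int i = 0; i < half; ++i) {
      docIds[i] = (packed[i] >>> 16) + min;
      docIds[i + half] = (packed[i] & 0xFFFF) + min;
    }
  }

  private static void decodeGeneric(int[] packed, int min, int count, int[] docIds) {
    final int half = count >>> 1; // variable bound, only known at run time
    for (int i = 0; i < half; ++i) {
      docIds[i] = (packed[i] >>> 16) + min;
      docIds[i + half] = (packed[i] & 0xFFFF) + min;
    }
    if ((count & 1) != 0) {
      docIds[count - 1] = (packed[half] & 0xFFFF) + min;
    }
  }
}
```

Both paths decode the same layout; the only difference is whether the loop bound is a constant, which is the property the benchmarks in this thread suggest the JIT is sensitive to.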

2025-03-15 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2723963174 @jpountz Hi, do you have any idea how we should move forward on this optimization? Several thoughts: * We can add another step32 for the hybrid-step decoding, which makes the code
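
The hybrid-step idea can be illustrated with a minimal sketch: process as many 512-value blocks as fit, then 32-value blocks (the extra step32 proposed here), then a scalar tail, so that every inner loop except the tail has a constant trip count. The operation (`scratch[i] >>> 8`, i.e. extracting the upper 24 bits of each int), the class name and the exact step sizes are assumptions chosen for illustration, not the code in the PR.

```java
/**
 * Hypothetical sketch of hybrid-step decoding: fixed-size inner loops of 512
 * values, then 32 values, followed by a scalar tail for whatever is left.
 */
final class HybridStepShift {
  static final int STEP_LARGE = 512; // assumption: one full BKD leaf
  static final int STEP_SMALL = 32;  // assumption: the extra "step32"

  private HybridStepShift() {}

  /** Writes the upper 24 bits of scratch[i] into docIds[i] for i in [0, count). */
  static void decode(int[] scratch, int count, int[] docIds) {
    int i = 0;
    // Each inner loop below runs exactly STEP_LARGE or STEP_SMALL iterations.
    for (; i + STEP_LARGE <= count; i += STEP_LARGE) {
      for (int j = i; j < i + STEP_LARGE; ++j) {
        docIds[j] = scratch[j] >>> 8;
      }
    }
    for (; i + STEP_SMALL <= count; i += STEP_SMALL) {
      for (int j = i; j < i + STEP_SMALL; ++j) {
        docIds[j] = scratch[j] >>> 8;
      }
    }
    // Scalar tail: fewer than STEP_SMALL values remain.
    for (; i < count; ++i) {
      docIds[i] = scratch[i] >>> 8;
    }
  }
}
```

Whether the JIT actually unrolls or vectorizes each of these loops still has to be confirmed from assembly output (for example `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` with the hsdis plugin) or benchmarks, as done elsewhere in this thread.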

2025-03-15 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2726739699 On the AVX-512 machine: * Specialized read does not vectorize the remainder loop; it seems the compiler failed to inline it. * Specialized decode vectorizes the remainder loop.

2025-03-14 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2726015514 Thanks for running benchmarks. So it looks like the JVM doesn't think these shorter loops (with step 128) are worth unrolling? This makes me wonder how something like that performs on y

2025-03-14 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2725390772 > There must be something that happens with this 512 step that doesn't happen otherwise such as using different instructions, loop unrolling, better CPU pipelining or something else.

2025-03-14 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2724038977 I have some small concerns: - The fact that the 512 step is tied to the number of points per leaf, though it's not a big deal at all, postings are similar: their encoding logic is sp

2025-03-04 Thread via GitHub
github-actions[bot] commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2699339952 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

2025-02-18 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2664968205 Confused +1 ... but the comparison of step512 (baseline) and step32 (candidate): ``` Task QPS baseline StdDev QPS my_modified_version StdDev

2025-02-17 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2663774846 Thanks for running benchmarks. I'm confused as to why running inner loops of size 512 would be so much better than inner loops of size 32. This doesn't feel right? Does luceneutil also r

2025-02-14 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659556195 [perf_asm.log](https://github.com/user-attachments/files/18801016/perf_asm.log) Profile suggests that loops get vectorized.

2025-02-14 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659531719 Results on my machines are a bit disappointing ``` java -version openjdk version "23.0.2" 2025-01-21 OpenJDK Runtime Environment (build 23.0.2+7-58) OpenJDK 64-Bit Server VM

2025-02-14 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659510128 Yes, exactly.

2025-02-14 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659505128 Thanks for the feedback! And sorry for my poor English. Do you mean something like this by `single batch size of 16 or 32`? ``` private static void readDelta16(IndexInput in, i

2025-02-14 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659455382 Thanks for iterating and running benchmarks. I played with the micro-benchmark and I get almost the same result if I use a single batch size of 16 or 32 (AMD Ryzen with AVX2 but no AVX-5

2025-02-13 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2658082076 Comparison of VectorAPI (baseline) and InnerLoop (candidate) ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff

2025-02-13 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2657334178 > is my understanding correct that it performs even better? Yeah!

2025-02-13 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2657293043 These results look even better than the results that you had previously reported for the vector API, is my understanding correct that it performs even better?

2025-02-12 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2653948712 I refactored the code to use an inner loop. Results on wikimediumall (AVX-512): ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct d

2025-02-12 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2653001052 Inner loop performance gets better with the newest commit. ``` Mac M2 Benchmark (bpv) (countVariable) Mode Cnt Score Error Units BKDCodec

2025-02-11 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2652803442 > applied the 0xFF mask to scratch in the shift loop This helps generate `vpand` in assembly, but does not help performance much. > Sorry for pushing Not at all, it's in

2025-02-11 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2652267868 > #current bpv=24 gets vectorized on the shift loop, but not for the remainder loop. This is an interesting observation. I wonder if a small refactoring could help it get auto-vec
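
To make the "shift loop" versus "remainder loop" distinction for bpv=24 concrete, here is a hypothetical packed layout and its decoder. It assumes `count` is a multiple of 4 and every doc ID fits in 24 bits: the first three quarters of the values are stored in the upper 24 bits of `3 * count / 4` ints, and each value of the last quarter is split into three bytes kept in the low bytes of those same ints. The layout, class name and multiple-of-4 restriction are assumptions for illustration; Lucene's actual on-disk format may differ. The shift loop is a single element-wise shift, while the remainder loop combines three loads, three masks and two shifts per value.

```java
/**
 * Hypothetical 24-bit doc ID layout, used only to illustrate a "shift loop"
 * followed by a "remainder loop". Requires count % 4 == 0 and values < 2^24.
 */
final class Packed24 {
  private Packed24() {}

  static int[] encode(int[] docIds) {
    int count = docIds.length;
    if ((count & 3) != 0) {
      throw new IllegalArgumentException("count must be a multiple of 4 in this sketch");
    }
    int q = count >>> 2;
    int[] packed = new int[3 * q];
    // First three quarters: each value goes into the upper 24 bits.
    for (int i = 0; i < 3 * q; ++i) {
      packed[i] = docIds[i] << 8;
    }
    // Last quarter: one byte of each value into the low byte of three ints.
    for (int i = 0; i < q; ++i) {
      int v = docIds[3 * q + i];
      packed[i] |= (v >>> 16) & 0xFF;
      packed[i + q] |= (v >>> 8) & 0xFF;
      packed[i + 2 * q] |= v & 0xFF;
    }
    return packed;
  }

  static void decode(int[] packed, int count, int[] docIds) {
    int q = count >>> 2;
    int q3 = 3 * q;
    // Shift loop: one independent unsigned shift per element.
    for (int i = 0; i < q3; ++i) {
      docIds[i] = packed[i] >>> 8;
    }
    // Remainder loop: reassemble each last-quarter value from three low bytes.
    for (int i = 0; i < q; ++i) {
      docIds[q3 + i] =
          ((packed[i] & 0xFF) << 16) | ((packed[i + q] & 0xFF) << 8) | (packed[i + 2 * q] & 0xFF);
    }
  }
}
```

A variant of this sketch, matching the "0xFF mask in the shift loop" experiment mentioned in this thread, would clear the high bits of `packed` inside the shift loop so that the remainder loop no longer needs its three masks.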

2025-02-11 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2651208564 Thanks for the feedback! I implemented the fixed-size inner loop and printed out the assembly for all of them. [perf_asm.log](https://github.com/user-attachments/files/18752147/perf_asm.log) * When pr

2025-02-10 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2649210893 Thanks for looking into it. Were you able to confirm that the difference with the variable count is indeed due to auto-vectorization not getting enabled, as opposed to something else such a