HUSTERGS commented on PR #14735: URL: https://github.com/apache/lucene/pull/14735#issuecomment-2942655550
Apologies if reopening this PR caused any inconvenience. For what it's worth, I came up with a branchless way to avoid the double IntVector check.

What I'm curious about is that when the calculation of `start` is placed inside the inner `if` clause, overall performance actually degrades. My guess is that keeping it outside the `if` helps the CPU pipeline. Although the total number of memory accesses is now the same as in the current implementation, overall performance seems to be slightly better?

I ran another round of luceneutil with 20 iterations; the results are below. The p-values are still relatively high after 20 iterations :(

```
Task                        QPS baseline  StdDev QPS my_modified_version  StdDev            Pct diff p-value
CountAndHighMed                    90.83  (2.7%)                   90.24  (2.5%)  -0.6% ( -5% -  4%)   0.431
FilteredPrefix3                    90.30  (2.9%)                   89.79  (2.9%)  -0.6% ( -6% -  5%)   0.539
Prefix3                            96.59  (3.1%)                   96.08  (2.9%)  -0.5% ( -6% -  5%)   0.580
CountOrHighHigh                    62.87  (1.9%)                   62.62  (1.6%)  -0.4% ( -3% -  3%)   0.471
CountOrMany                         6.77  (2.1%)                    6.74  (1.7%)  -0.3% ( -4% -  3%)   0.587
TermTitleSort                      71.58  (3.0%)                   71.35  (2.7%)  -0.3% ( -5% -  5%)   0.722
CountFilteredOrHighMed             29.60  (2.4%)                   29.52  (2.1%)  -0.3% ( -4% -  4%)   0.719
CountPhrase                         3.28  (2.7%)                    3.27  (4.0%)  -0.2% ( -6% -  6%)   0.847
CountFilteredOrHighHigh            25.16  (2.3%)                   25.12  (1.9%)  -0.2% ( -4% -  4%)   0.784
CountFilteredOrMany                 6.02  (1.8%)                    6.01  (1.4%)  -0.1% ( -3% -  3%)   0.787
CountOrHighMed                     94.23  (1.9%)                   94.12  (1.7%)  -0.1% ( -3% -  3%)   0.846
CombinedAndHighHigh                 6.22  (3.0%)                    6.23  (3.0%)   0.1% ( -5% -  6%)   0.954
CombinedOrHighHigh                  6.24  (3.0%)                    6.24  (2.9%)   0.1% ( -5% -  6%)   0.881
FilteredOrMany                      5.00  (1.6%)                    5.01  (2.2%)   0.2% ( -3% -  4%)   0.784
AndHighOrMedMed                    17.77  (3.0%)                   17.81  (3.1%)   0.2% ( -5% -  6%)   0.816
FilteredAnd3Terms                 127.44  (1.9%)                  127.77  (1.9%)   0.3% ( -3% -  4%)   0.656
CombinedTerm                       13.38  (3.6%)                   13.42  (3.6%)   0.3% ( -6% -  7%)   0.796
CountFilteredIntNRQ                22.22  (2.8%)                   22.28  (2.1%)   0.3% ( -4% -  5%)   0.698
Phrase                              9.77  (2.7%)                    9.80  (2.9%)   0.3% ( -5% -  6%)   0.726
Respell                            44.26  (1.7%)                   44.42  (2.2%)   0.4% ( -3% -  4%)   0.543
AndMedOrHighHigh                   20.40  (2.2%)                   20.49  (2.2%)   0.4% ( -3% -  4%)   0.529
AndHighMed                         67.03  (1.6%)                   67.34  (1.7%)   0.5% ( -2% -  3%)   0.376
OrHighHigh                         25.03  (3.1%)                   25.15  (2.5%)   0.5% ( -5% -  6%)   0.605
IntervalsOrdered                    3.00  (1.6%)                    3.01  (2.9%)   0.5% ( -3% -  5%)   0.519
AndHighHigh                        26.52  (3.6%)                   26.65  (3.3%)   0.5% ( -6% -  7%)   0.666
FilteredOrHighHigh                 17.62  (1.7%)                   17.70  (2.1%)   0.5% ( -3% -  4%)   0.421
IntSet                            391.61  (5.2%)                  393.68  (4.1%)   0.5% ( -8% - 10%)   0.723
TermDayOfYearSort                 373.91  (1.8%)                  376.33  (1.1%)   0.6% ( -2% -  3%)   0.175
CountAndHighHigh                   60.43  (2.0%)                   60.82  (1.6%)   0.6% ( -2% -  4%)   0.263
TermDTSort                        179.99  (3.0%)                  181.17  (3.1%)   0.7% ( -5% -  6%)   0.500
Wildcard                           56.61  (3.4%)                   57.01  (2.9%)   0.7% ( -5% -  7%)   0.480
DismaxOrHighMed                    64.55  (2.0%)                   65.05  (2.3%)   0.8% ( -3% -  5%)   0.257
FilteredAndStopWords               10.20  (4.1%)                   10.28  (3.1%)   0.8% ( -6% -  8%)   0.500
CombinedOrHighMed                  24.74  (4.3%)                   24.93  (3.8%)   0.8% ( -7% -  9%)   0.542
CombinedAndHighMed                 24.54  (4.1%)                   24.75  (4.0%)   0.9% ( -7% -  9%)   0.506
FilteredAndHighHigh                13.04  (4.6%)                   13.15  (3.5%)   0.9% ( -6% -  9%)   0.498
FilteredOrStopWords                10.71  (2.5%)                   10.81  (2.5%)   0.9% ( -3% -  6%)   0.239
Or3Terms                           75.92  (6.1%)                   76.67  (5.0%)   1.0% ( -9% - 12%)   0.575
DismaxOrHighHigh                   45.47  (2.7%)                   45.93  (2.6%)   1.0% ( -4% -  6%)   0.224
And3Terms                          84.47  (5.2%)                   85.33  (4.2%)   1.0% ( -7% - 10%)   0.498
SpanNear                            3.09  (4.0%)                    3.12  (3.8%)   1.0% ( -6% -  9%)   0.411
FilteredAndHighMed                 40.91  (4.1%)                   41.33  (3.2%)   1.0% ( -6% -  8%)   0.376
FilteredPhrase                     12.64  (2.1%)                   12.77  (2.2%)   1.0% ( -3% -  5%)   0.132
FilteredIntNRQ                     47.83  (2.3%)                   48.34  (2.1%)   1.1% ( -3% -  5%)   0.132
OrHighMed                          87.14  (2.4%)                   88.06  (2.1%)   1.1% ( -3% -  5%)   0.140
IntNRQ                             48.10  (2.3%)                   48.64  (2.1%)   1.1% ( -3% -  5%)   0.105
AndStopWords                        9.02  (9.0%)                    9.12  (7.0%)   1.1% (-13% - 18%)   0.655
FilteredOr3Terms                   57.64  (2.4%)                   58.30  (2.3%)   1.1% ( -3% -  5%)   0.124
FilteredOrHighMed                  52.41  (2.5%)                   53.05  (2.7%)   1.2% ( -3% -  6%)   0.140
CountFilteredPhrase                11.63  (3.1%)                   11.79  (3.6%)   1.3% ( -5% -  8%)   0.207
CountTerm                        6450.51  (4.4%)                 6539.66  (4.9%)   1.4% ( -7% - 11%)   0.348
Term100                           558.66  (3.8%)                  566.48  (3.8%)   1.4% ( -5% -  9%)   0.243
Term10K                           557.70  (3.7%)                  565.55  (3.8%)   1.4% ( -5% -  9%)   0.236
OrStopWords                         9.69 (10.4%)                    9.83  (8.3%)   1.5% (-15% - 22%)   0.625
OrMany                              5.41  (6.7%)                    5.49  (6.7%)   1.5% (-11% - 15%)   0.486
TermB1M1P                         557.08  (3.9%)                  565.84  (3.8%)   1.6% ( -5% -  9%)   0.195
TermB1M                           557.51  (4.0%)                  566.48  (4.1%)   1.6% ( -6% - 10%)   0.209
OrHighRare                        118.38  (5.7%)                  120.39  (5.9%)   1.7% ( -9% - 14%)   0.353
Term                              557.50  (3.6%)                  567.31  (4.0%)   1.8% ( -5% -  9%)   0.145
Fuzzy1                             50.97  (3.5%)                   51.88  (3.9%)   1.8% ( -5% -  9%)   0.132
Fuzzy2                             45.27  (3.4%)                   46.09  (4.1%)   1.8% ( -5% -  9%)   0.127
DismaxTerm                        604.34  (3.4%)                  615.37  (3.8%)   1.8% ( -5% -  9%)   0.112
Term1M                            556.69  (3.9%)                  566.88  (3.9%)   1.8% ( -5% - 10%)   0.141
FilteredTerm                       85.24  (3.1%)                   86.88  (3.3%)   1.9% ( -4% -  8%)   0.055
TermMonthSort                    2361.35  (3.8%)                 2409.08  (4.8%)   2.0% ( -6% - 11%)   0.139
SloppyPhrase                        1.47  (5.9%)                    1.50  (3.8%)   2.3% ( -6% - 12%)   0.138
FilteredOr2Terms2StopWords         67.14  (5.1%)                   69.09  (5.5%)   2.9% ( -7% - 14%)   0.082
FilteredAnd2Terms2StopWords        72.35  (6.9%)                   74.70  (6.8%)   3.2% ( -9% - 18%)   0.134
Or2Terms2StopWords                 73.04  (8.4%)                   75.51  (7.7%)   3.4% (-11% - 21%)   0.184
And2Terms2StopWords                74.34  (8.5%)                   77.05  (8.2%)   3.6% (-12% - 22%)   0.168

CPU merged search profile for my_modified_version:
JFR aggregation command: /home/gesong.samuel/.sdkman/candidates/java/current/bin/java -server -Xms2g -Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC -cp /data00/home/gesong.samuel/lucene_candidate/build-tools/build-infra/build/classes/java/main -Dtests.profile.mode=cpu -Dtests.profile.stacksize=1 -Dtests.profile.count=30 org.apache.lucene.gradle.ProfileResults /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-12.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-6.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-3.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-13.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-11.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-7.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-17.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-9.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-18.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-15.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-16.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-8.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-14.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-1.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-19.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-10.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-2.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-5.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-4.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-0.jfr
Took 21.37 seconds
WARNING: Using incubator modules: jdk.incubator.vector
PROFILE SUMMARY from 6331976 events (total: 6M)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
10.35%        655545        jdk.incubator.vector.Int512Vector$Int512Mask#trueCount() [Inlined code]
5.03%         318374        org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#nextPosition() [Inlined code]
4.13%         261683        org.apache.lucene.store.MemorySegmentIndexInput#readByte() [Inlined code]
3.20%         202422        org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() [Inlined code]
2.40%         152033        jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw() [Inlined code]
2.39%         151128        org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact() [Inlined code]
2.22%         140677        jdk.incubator.vector.IntVector#intoArray() [Inlined code]
2.21%         140030        perf.RandomQuery$1$1#nextDoc() [Inlined code]
2.20%         139152        org.apache.lucene.codecs.lucene103.Lucene103PostingsReader#sumOverRange() [Inlined code]
2.06%         130241        org.apache.lucene.search.ScorerUtil#filterCompetitiveHits() [Inlined code]
1.72%         108865        org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() [Inlined code]
1.38%         87623         org.apache.lucene.search.PhraseScorer$1#matches() [JIT compiled code]
1.28%         81057         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#advance() [Inlined code]
1.23%         78078         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#freq() [Inlined code]
1.23%         77585         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#refillFullBlock() [JIT compiled code]
1.19%         75624         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval() [JIT compiled code]
1.08%         68212         org.apache.lucene.search.DisiPriorityQueueN#downHeap() [Inlined code]
1.07%         67493         org.apache.lucene.internal.vectorization.PostingDecodingUtil#splitInts() [JIT compiled code]
0.99%         62769         org.apache.lucene.codecs.lucene103.ForUtil#expand8() [JIT compiled code]
0.98%         62174         org.apache.lucene.util.PriorityQueue#add() [Inlined code]
0.95%         60469         org.apache.lucene.util.PriorityQueue#upHeap() [Inlined code]
0.90%         56682         org.apache.lucene.search.SloppyPhraseMatcher#maxFreq() [Inlined code]
0.89%         56061         org.apache.lucene.search.ConjunctionDISI#doNext() [Inlined code]
0.88%         55846         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#advance() [JIT compiled code]
0.85%         53565         org.apache.lucene.search.MaxScoreBulkScorer#scoreInnerWindowWithFilter() [JIT compiled code]
0.82%         51888         org.apache.lucene.search.TermScorer#nextDocsAndScores() [Inlined code]
0.82%         51639         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#accumulatePendingPositions() [JIT compiled code]
0.81%         51117         org.apache.lucene.search.SloppyPhraseMatcher#nextMatch() [JIT compiled code]
0.78%         49592         org.apache.lucene.util.FixedBitSet#nextSetBitInRange() [Inlined code]
0.73%         46031         org.apache.lucene.search.ExactPhraseMatcher#advancePosition() [Inlined code]

CPU merged search profile for baseline:
JFR aggregation command: /home/gesong.samuel/.sdkman/candidates/java/current/bin/java -server -Xms2g -Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC -cp /data00/home/gesong.samuel/lucene_baseline/build-tools/build-infra/build/classes/java/main -Dtests.profile.mode=cpu -Dtests.profile.stacksize=1 -Dtests.profile.count=30 org.apache.lucene.gradle.ProfileResults /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-16.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-17.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-9.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-14.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-13.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-6.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-0.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-2.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-7.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-3.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-11.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-1.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-4.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-12.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-5.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-15.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-18.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-8.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-19.jfr /data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-10.jfr
Took 20.73 seconds
WARNING: Using incubator modules: jdk.incubator.vector
PROFILE SUMMARY from 6372866 events (total: 6M)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
10.86%        692132        jdk.incubator.vector.Int512Vector$Int512Mask#trueCount() [Inlined code]
5.07%         323070        org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#nextPosition() [Inlined code]
4.39%         279580        org.apache.lucene.store.MemorySegmentIndexInput#readByte() [Inlined code]
3.14%         199838        org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() [Inlined code]
2.40%         153045        org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact() [Inlined code]
2.35%         149991        jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw() [Inlined code]
2.29%         145718        perf.RandomQuery$1$1#nextDoc() [Inlined code]
2.26%         143935        jdk.incubator.vector.IntVector#intoArray() [Inlined code]
2.19%         139634        org.apache.lucene.codecs.lucene103.Lucene103PostingsReader#sumOverRange() [Inlined code]
1.97%         125374        org.apache.lucene.search.ScorerUtil#filterCompetitiveHits() [Inlined code]
1.65%         105165        org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() [Inlined code]
1.34%         85677         org.apache.lucene.search.PhraseScorer$1#matches() [JIT compiled code]
1.29%         81991         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#freq() [Inlined code]
1.20%         76168         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#refillFullBlock() [JIT compiled code]
1.19%         75955         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval() [JIT compiled code]
1.11%         70842         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#advance() [Inlined code]
1.00%         63821         org.apache.lucene.internal.vectorization.PostingDecodingUtil#splitInts() [JIT compiled code]
0.96%         61418         org.apache.lucene.search.DisiPriorityQueueN#downHeap() [Inlined code]
0.95%         60790         org.apache.lucene.codecs.lucene103.ForUtil#expand8() [JIT compiled code]
0.91%         57910         org.apache.lucene.util.PriorityQueue#upHeap() [Inlined code]
0.90%         57656         org.apache.lucene.util.PriorityQueue#add() [Inlined code]
0.89%         56542         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#advance() [JIT compiled code]
0.88%         56218         org.apache.lucene.search.SloppyPhraseMatcher#nextMatch() [JIT compiled code]
0.88%         56032         org.apache.lucene.search.ConjunctionDISI#doNext() [Inlined code]
0.84%         53435         org.apache.lucene.search.SloppyPhraseMatcher#maxFreq() [Inlined code]
0.80%         51269         org.apache.lucene.search.TermScorer#nextDocsAndScores() [Inlined code]
0.78%         49990         org.apache.lucene.search.ExactPhraseMatcher#advancePosition() [Inlined code]
0.78%         49632         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#accumulatePendingPositions() [JIT compiled code]
0.75%         47506         org.apache.lucene.util.FixedBitSet#nextSetBitInRange() [Inlined code]
0.74%         47444         org.apache.lucene.search.MaxScoreBulkScorer#scoreInnerWindowWithFilter() [JIT compiled code]
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org