Re: [PR] Speed up findNextGEQ by aggresive stepping [lucene]

via GitHub Wed, 04 Jun 2025 20:55:00 -0700


HUSTERGS commented on PR #14735:
URL: https://github.com/apache/lucene/pull/14735#issuecomment-2942655550


   Apologies if reopening this PR caused any inconvenience.
   
   For what it's worth, I came up with a branchless way to avoid the double 
`IntVector` check. One thing I'm curious about: when the calculation of 
`start` sits inside the inner `if` clause, overall performance actually 
degrades. My guess is that keeping it outside the `if` helps the CPU pipeline. 
Although the total number of memory accesses is now the same as in the current 
implementation, overall performance seems to be slightly better.
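
A minimal scalar sketch of the idea described above, for illustration only. It is not the code in this PR: the class name, the helper `countLessThan` (standing in for a vector compare followed by `trueCount()`), the two-window step, and the 16-lane width are all assumptions. It also assumes the buffer is sorted and that all values and the target are non-negative, so the sign-bit trick cannot overflow. The point it shows is that `start` is computed unconditionally, outside the `if`, so only one window is ever inspected.

```java
/** Scalar sketch of a two-window step; hypothetical, not the PR's actual code. */
final class FindNextGEQSketch {
  // Assumption: 16 int lanes, matching the Int512Vector that appears in the profile.
  static final int LANES = 16;

  /** Counts values below target in buffer[start, start + LANES). */
  static int countLessThan(int[] buffer, int start, int target) {
    int count = 0;
    for (int i = 0; i < LANES; i++) {
      // For non-negative ints, (a - b) >>> 31 is 1 when a < b and 0 otherwise.
      count += (buffer[start + i] - target) >>> 31;
    }
    return count;
  }

  /** Returns the first index in [from, to) whose value is >= target, or to if none. */
  static int findNextGEQ(int[] buffer, int target, int from, int to) {
    for (; from + 2 * LANES < to; from += 2 * LANES + 1) {
      // Computed outside the `if`: 1 when the whole first window is below target.
      int skipFirstWindow = (buffer[from + LANES - 1] - target) >>> 31;
      int start = from + LANES * skipFirstWindow;
      if (buffer[from + 2 * LANES] >= target) {
        // Exactly one window is inspected, selected without a second branch.
        return start + countLessThan(buffer, start, target);
      }
    }
    // Scalar tail for the remaining elements.
    for (int i = from; i < to; i++) {
      if (buffer[i] >= target) {
        return i;
      }
    }
    return to;
  }

  public static void main(String[] args) {
    int[] docs = new int[128];
    for (int i = 0; i < docs.length; i++) {
      docs[i] = 3 * i;
    }
    System.out.println(findNextGEQ(docs, 100, 0, docs.length));
  }
}
```

Whether hoisting the `start` computation actually helps depends on what the JIT emits, so the effect would still need to be confirmed with a benchmark such as the one below.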
   
   I ran another round of luceneutil with 20 iterations; the results are 
below. The p-values are still relatively high after 20 iterations   :(
   ```
                               Task  QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff  p-value
                    CountAndHighMed       90.83      (2.7%)       90.24      (2.5%)   -0.6% (  -5% -    4%) 0.431
                    FilteredPrefix3       90.30      (2.9%)       89.79      (2.9%)   -0.6% (  -6% -    5%) 0.539
                            Prefix3       96.59      (3.1%)       96.08      (2.9%)   -0.5% (  -6% -    5%) 0.580
                    CountOrHighHigh       62.87      (1.9%)       62.62      (1.6%)   -0.4% (  -3% -    3%) 0.471
                        CountOrMany        6.77      (2.1%)        6.74      (1.7%)   -0.3% (  -4% -    3%) 0.587
                      TermTitleSort       71.58      (3.0%)       71.35      (2.7%)   -0.3% (  -5% -    5%) 0.722
             CountFilteredOrHighMed       29.60      (2.4%)       29.52      (2.1%)   -0.3% (  -4% -    4%) 0.719
                        CountPhrase        3.28      (2.7%)        3.27      (4.0%)   -0.2% (  -6% -    6%) 0.847
            CountFilteredOrHighHigh       25.16      (2.3%)       25.12      (1.9%)   -0.2% (  -4% -    4%) 0.784
                CountFilteredOrMany        6.02      (1.8%)        6.01      (1.4%)   -0.1% (  -3% -    3%) 0.787
                     CountOrHighMed       94.23      (1.9%)       94.12      (1.7%)   -0.1% (  -3% -    3%) 0.846
                CombinedAndHighHigh        6.22      (3.0%)        6.23      (3.0%)    0.1% (  -5% -    6%) 0.954
                 CombinedOrHighHigh        6.24      (3.0%)        6.24      (2.9%)    0.1% (  -5% -    6%) 0.881
                     FilteredOrMany        5.00      (1.6%)        5.01      (2.2%)    0.2% (  -3% -    4%) 0.784
                    AndHighOrMedMed       17.77      (3.0%)       17.81      (3.1%)    0.2% (  -5% -    6%) 0.816
                  FilteredAnd3Terms      127.44      (1.9%)      127.77      (1.9%)    0.3% (  -3% -    4%) 0.656
                       CombinedTerm       13.38      (3.6%)       13.42      (3.6%)    0.3% (  -6% -    7%) 0.796
                CountFilteredIntNRQ       22.22      (2.8%)       22.28      (2.1%)    0.3% (  -4% -    5%) 0.698
                             Phrase        9.77      (2.7%)        9.80      (2.9%)    0.3% (  -5% -    6%) 0.726
                            Respell       44.26      (1.7%)       44.42      (2.2%)    0.4% (  -3% -    4%) 0.543
                   AndMedOrHighHigh       20.40      (2.2%)       20.49      (2.2%)    0.4% (  -3% -    4%) 0.529
                         AndHighMed       67.03      (1.6%)       67.34      (1.7%)    0.5% (  -2% -    3%) 0.376
                         OrHighHigh       25.03      (3.1%)       25.15      (2.5%)    0.5% (  -5% -    6%) 0.605
                   IntervalsOrdered        3.00      (1.6%)        3.01      (2.9%)    0.5% (  -3% -    5%) 0.519
                        AndHighHigh       26.52      (3.6%)       26.65      (3.3%)    0.5% (  -6% -    7%) 0.666
                 FilteredOrHighHigh       17.62      (1.7%)       17.70      (2.1%)    0.5% (  -3% -    4%) 0.421
                             IntSet      391.61      (5.2%)      393.68      (4.1%)    0.5% (  -8% -   10%) 0.723
                  TermDayOfYearSort      373.91      (1.8%)      376.33      (1.1%)    0.6% (  -2% -    3%) 0.175
                   CountAndHighHigh       60.43      (2.0%)       60.82      (1.6%)    0.6% (  -2% -    4%) 0.263
                         TermDTSort      179.99      (3.0%)      181.17      (3.1%)    0.7% (  -5% -    6%) 0.500
                           Wildcard       56.61      (3.4%)       57.01      (2.9%)    0.7% (  -5% -    7%) 0.480
                    DismaxOrHighMed       64.55      (2.0%)       65.05      (2.3%)    0.8% (  -3% -    5%) 0.257
               FilteredAndStopWords       10.20      (4.1%)       10.28      (3.1%)    0.8% (  -6% -    8%) 0.500
                  CombinedOrHighMed       24.74      (4.3%)       24.93      (3.8%)    0.8% (  -7% -    9%) 0.542
                 CombinedAndHighMed       24.54      (4.1%)       24.75      (4.0%)    0.9% (  -7% -    9%) 0.506
                FilteredAndHighHigh       13.04      (4.6%)       13.15      (3.5%)    0.9% (  -6% -    9%) 0.498
                FilteredOrStopWords       10.71      (2.5%)       10.81      (2.5%)    0.9% (  -3% -    6%) 0.239
                           Or3Terms       75.92      (6.1%)       76.67      (5.0%)    1.0% (  -9% -   12%) 0.575
                   DismaxOrHighHigh       45.47      (2.7%)       45.93      (2.6%)    1.0% (  -4% -    6%) 0.224
                          And3Terms       84.47      (5.2%)       85.33      (4.2%)    1.0% (  -7% -   10%) 0.498
                           SpanNear        3.09      (4.0%)        3.12      (3.8%)    1.0% (  -6% -    9%) 0.411
                 FilteredAndHighMed       40.91      (4.1%)       41.33      (3.2%)    1.0% (  -6% -    8%) 0.376
                     FilteredPhrase       12.64      (2.1%)       12.77      (2.2%)    1.0% (  -3% -    5%) 0.132
                     FilteredIntNRQ       47.83      (2.3%)       48.34      (2.1%)    1.1% (  -3% -    5%) 0.132
                          OrHighMed       87.14      (2.4%)       88.06      (2.1%)    1.1% (  -3% -    5%) 0.140
                             IntNRQ       48.10      (2.3%)       48.64      (2.1%)    1.1% (  -3% -    5%) 0.105
                       AndStopWords        9.02      (9.0%)        9.12      (7.0%)    1.1% ( -13% -   18%) 0.655
                   FilteredOr3Terms       57.64      (2.4%)       58.30      (2.3%)    1.1% (  -3% -    5%) 0.124
                  FilteredOrHighMed       52.41      (2.5%)       53.05      (2.7%)    1.2% (  -3% -    6%) 0.140
                CountFilteredPhrase       11.63      (3.1%)       11.79      (3.6%)    1.3% (  -5% -    8%) 0.207
                          CountTerm     6450.51      (4.4%)     6539.66      (4.9%)    1.4% (  -7% -   11%) 0.348
                            Term100      558.66      (3.8%)      566.48      (3.8%)    1.4% (  -5% -    9%) 0.243
                            Term10K      557.70      (3.7%)      565.55      (3.8%)    1.4% (  -5% -    9%) 0.236
                        OrStopWords        9.69     (10.4%)        9.83      (8.3%)    1.5% ( -15% -   22%) 0.625
                             OrMany        5.41      (6.7%)        5.49      (6.7%)    1.5% ( -11% -   15%) 0.486
                          TermB1M1P      557.08      (3.9%)      565.84      (3.8%)    1.6% (  -5% -    9%) 0.195
                            TermB1M      557.51      (4.0%)      566.48      (4.1%)    1.6% (  -6% -   10%) 0.209
                         OrHighRare      118.38      (5.7%)      120.39      (5.9%)    1.7% (  -9% -   14%) 0.353
                               Term      557.50      (3.6%)      567.31      (4.0%)    1.8% (  -5% -    9%) 0.145
                             Fuzzy1       50.97      (3.5%)       51.88      (3.9%)    1.8% (  -5% -    9%) 0.132
                             Fuzzy2       45.27      (3.4%)       46.09      (4.1%)    1.8% (  -5% -    9%) 0.127
                         DismaxTerm      604.34      (3.4%)      615.37      (3.8%)    1.8% (  -5% -    9%) 0.112
                             Term1M      556.69      (3.9%)      566.88      (3.9%)    1.8% (  -5% -   10%) 0.141
                       FilteredTerm       85.24      (3.1%)       86.88      (3.3%)    1.9% (  -4% -    8%) 0.055
                      TermMonthSort     2361.35      (3.8%)     2409.08      (4.8%)    2.0% (  -6% -   11%) 0.139
                       SloppyPhrase        1.47      (5.9%)        1.50      (3.8%)    2.3% (  -6% -   12%) 0.138
         FilteredOr2Terms2StopWords       67.14      (5.1%)       69.09      (5.5%)    2.9% (  -7% -   14%) 0.082
        FilteredAnd2Terms2StopWords       72.35      (6.9%)       74.70      (6.8%)    3.2% (  -9% -   18%) 0.134
                 Or2Terms2StopWords       73.04      (8.4%)       75.51      (7.7%)    3.4% ( -11% -   21%) 0.184
                           PKLookup      181.84      (6.7%)      188.05      (8.1%)    3.4% ( -10% -   19%) 0.146
                And2Terms2StopWords       74.34      (8.5%)       77.05      (8.2%)    3.6% ( -12% -   22%) 0.168
   
   CPU merged search profile for my_modified_version:
   JFR aggregation command: 
/home/gesong.samuel/.sdkman/candidates/java/current/bin/java -server -Xms2g 
-Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError 
-XX:+UseParallelGC -cp 
/data00/home/gesong.samuel/lucene_candidate/build-tools/build-infra/build/classes/java/main
 -Dtests.profile.mode=cpu -Dtests.profile.stacksize=1 -Dtests.profile.count=30 
org.apache.lucene.gradle.ProfileResults 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-12.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-6.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-3.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-13.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-11.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-7.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-17.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-9.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-18.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-15.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-16.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-8.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-14.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-1.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-19.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-10.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-2.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-5.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-4.jfr
 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-my_modified_version-0.jfr
   Took 21.37 seconds
   WARNING: Using incubator modules: jdk.incubator.vector
   PROFILE SUMMARY from 6331976 events (total: 6M)
     tests.profile.mode=cpu
     tests.profile.count=30
     tests.profile.stacksize=1
     tests.profile.linenumbers=false
   PERCENT       CPU SAMPLES   STACK
   10.35%        655545        jdk.incubator.vector.Int512Vector$Int512Mask#trueCount() [Inlined code]
   5.03%         318374        org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#nextPosition() [Inlined code]
   4.13%         261683        org.apache.lucene.store.MemorySegmentIndexInput#readByte() [Inlined code]
   3.20%         202422        org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() [Inlined code]
   2.40%         152033        jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw() [Inlined code]
   2.39%         151128        org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact() [Inlined code]
   2.22%         140677        jdk.incubator.vector.IntVector#intoArray() [Inlined code]
   2.21%         140030        perf.RandomQuery$1$1#nextDoc() [Inlined code]
   2.20%         139152        org.apache.lucene.codecs.lucene103.Lucene103PostingsReader#sumOverRange() [Inlined code]
   2.06%         130241        org.apache.lucene.search.ScorerUtil#filterCompetitiveHits() [Inlined code]
   1.72%         108865        org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() [Inlined code]
   1.38%         87623         org.apache.lucene.search.PhraseScorer$1#matches() [JIT compiled code]
   1.28%         81057         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#advance() [Inlined code]
   1.23%         78078         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#freq() [Inlined code]
   1.23%         77585         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#refillFullBlock() [JIT compiled code]
   1.19%         75624         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval() [JIT compiled code]
   1.08%         68212         org.apache.lucene.search.DisiPriorityQueueN#downHeap() [Inlined code]
   1.07%         67493         org.apache.lucene.internal.vectorization.PostingDecodingUtil#splitInts() [JIT compiled code]
   0.99%         62769         org.apache.lucene.codecs.lucene103.ForUtil#expand8() [JIT compiled code]
   0.98%         62174         org.apache.lucene.util.PriorityQueue#add() [Inlined code]
   0.95%         60469         org.apache.lucene.util.PriorityQueue#upHeap() [Inlined code]
   0.90%         56682         org.apache.lucene.search.SloppyPhraseMatcher#maxFreq() [Inlined code]
   0.89%         56061         org.apache.lucene.search.ConjunctionDISI#doNext() [Inlined code]
   0.88%         55846         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#advance() [JIT compiled code]
   0.85%         53565         org.apache.lucene.search.MaxScoreBulkScorer#scoreInnerWindowWithFilter() [JIT compiled code]
   0.82%         51888         org.apache.lucene.search.TermScorer#nextDocsAndScores() [Inlined code]
   0.82%         51639         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#accumulatePendingPositions() [JIT compiled code]
   0.81%         51117         org.apache.lucene.search.SloppyPhraseMatcher#nextMatch() [JIT compiled code]
   0.78%         49592         org.apache.lucene.util.FixedBitSet#nextSetBitInRange() [Inlined code]
   0.73%         46031         org.apache.lucene.search.ExactPhraseMatcher#advancePosition() [Inlined code]
   
   
   CPU merged search profile for baseline:
   JFR aggregation command: 
/home/gesong.samuel/.sdkman/candidates/java/current/bin/java -server -Xms2g 
-Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError 
-XX:+UseParallelGC -cp 
/data00/home/gesong.samuel/lucene_baseline/build-tools/build-infra/build/classes/java/main
 -Dtests.profile.mode=cpu -Dtests.profile.stacksize=1 -Dtests.profile.count=30 
org.apache.lucene.gradle.ProfileResults 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-16.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-17.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-9.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-14.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-13.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-6.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-0.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-2.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-7.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-3.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-11.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-1.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-4.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-12.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-5.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-15.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-18.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-8.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-19.jfr 
/data00/home/gesong.samuel/logs/bench-search-baseline_vs_patch-baseline-10.jfr
   Took 20.73 seconds
   WARNING: Using incubator modules: jdk.incubator.vector
   PROFILE SUMMARY from 6372866 events (total: 6M)
     tests.profile.mode=cpu
     tests.profile.count=30
     tests.profile.stacksize=1
     tests.profile.linenumbers=false
   PERCENT       CPU SAMPLES   STACK
   10.86%        692132        jdk.incubator.vector.Int512Vector$Int512Mask#trueCount() [Inlined code]
   5.07%         323070        org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#nextPosition() [Inlined code]
   4.39%         279580        org.apache.lucene.store.MemorySegmentIndexInput#readByte() [Inlined code]
   3.14%         199838        org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() [Inlined code]
   2.40%         153045        org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact() [Inlined code]
   2.35%         149991        jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw() [Inlined code]
   2.29%         145718        perf.RandomQuery$1$1#nextDoc() [Inlined code]
   2.26%         143935        jdk.incubator.vector.IntVector#intoArray() [Inlined code]
   2.19%         139634        org.apache.lucene.codecs.lucene103.Lucene103PostingsReader#sumOverRange() [Inlined code]
   1.97%         125374        org.apache.lucene.search.ScorerUtil#filterCompetitiveHits() [Inlined code]
   1.65%         105165        org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() [Inlined code]
   1.34%         85677         org.apache.lucene.search.PhraseScorer$1#matches() [JIT compiled code]
   1.29%         81991         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#freq() [Inlined code]
   1.20%         76168         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#refillFullBlock() [JIT compiled code]
   1.19%         75955         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval() [JIT compiled code]
   1.11%         70842         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#advance() [Inlined code]
   1.00%         63821         org.apache.lucene.internal.vectorization.PostingDecodingUtil#splitInts() [JIT compiled code]
   0.96%         61418         org.apache.lucene.search.DisiPriorityQueueN#downHeap() [Inlined code]
   0.95%         60790         org.apache.lucene.codecs.lucene103.ForUtil#expand8() [JIT compiled code]
   0.91%         57910         org.apache.lucene.util.PriorityQueue#upHeap() [Inlined code]
   0.90%         57656         org.apache.lucene.util.PriorityQueue#add() [Inlined code]
   0.89%         56542         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#advance() [JIT compiled code]
   0.88%         56218         org.apache.lucene.search.SloppyPhraseMatcher#nextMatch() [JIT compiled code]
   0.88%         56032         org.apache.lucene.search.ConjunctionDISI#doNext() [Inlined code]
   0.84%         53435         org.apache.lucene.search.SloppyPhraseMatcher#maxFreq() [Inlined code]
   0.80%         51269         org.apache.lucene.search.TermScorer#nextDocsAndScores() [Inlined code]
   0.78%         49990         org.apache.lucene.search.ExactPhraseMatcher#advancePosition() [Inlined code]
   0.78%         49632         org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum#accumulatePendingPositions() [JIT compiled code]
   0.75%         47506         org.apache.lucene.util.FixedBitSet#nextSetBitInRange() [Inlined code]
   0.74%         47444         org.apache.lucene.search.MaxScoreBulkScorer#scoreInnerWindowWithFilter() [JIT compiled code]
   ```
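
To put the p-values in the benchmark table in context, here is a rough sketch of a Welch two-sample t statistic computed from one row. This is an assumption about how such a comparison could be made, not a description of how luceneutil computes its p-value column. It also assumes the StdDev column is a relative standard deviation in percent and that each side contributes one QPS sample per iteration (n = 20). The class and method names are hypothetical.

```java
/** Hypothetical helper; luceneutil's own significance test may differ. */
final class WelchTSketch {
  /** Welch's t statistic from two means, relative standard deviations in percent, and sample counts. */
  static double welchT(double mean1, double sdPct1, int n1, double mean2, double sdPct2, int n2) {
    // Convert relative standard deviations (percent of the mean) to absolute values.
    double sd1 = mean1 * sdPct1 / 100.0;
    double sd2 = mean2 * sdPct2 / 100.0;
    double standardError = Math.sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
    return (mean2 - mean1) / standardError;
  }

  public static void main(String[] args) {
    // FilteredTerm row: 85.24 (3.1%) for baseline vs 86.88 (3.3%) for the patch.
    // n = 20 is an assumption (one sample per iteration).
    System.out.printf("t = %.2f%n", welchT(85.24, 3.1, 20, 86.88, 3.3, 20));
  }
}
```

Under these assumptions, rows where the difference in means is small relative to the standard deviations give a t statistic close to zero, which is consistent with the high p-values reported for most tasks.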


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

