[PR] Enable the vectorized findNextGEQ path on aarch64 [lucene]

via GitHub Sat, 13 Jun 2026 11:25:49 -0700


shbhar opened a new pull request, #16254:
URL: https://github.com/apache/lucene/pull/16254


   I work in Amazon Advertising, where we run a lot of low-latency Lucene 
search on various Graviton host types. While looking for optimization 
opportunities I came across this gate, which keeps the vectorized `findNextGEQ` 
off on Graviton2 and Graviton4, and enabling it benchmarks as a sizable win. 
This is a core, low-level path and the gate was set deliberately, so I may be 
missing context here. This is also my first PR in Lucene so I apologize in 
advance if I made a mistake.
   
   ### What's gated
   
   `PanamaVectorUtilSupport.findNextGEQ` finds the index of the first value 
`>= target` in a sorted block of doc IDs; it is used by postings `advance()` / 
block skipping. It's behind a lane-count gate:
   
   ```java
   private static final boolean ENABLE_FIND_NEXT_GEQ_VECTOR_OPTO = 
INT_SPECIES.length() >= 8;
   ```
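
   For readers unfamiliar with this method, here is a minimal sketch of the semantics described above. It is not the Lucene source: the class name, the method names, and the `(buffer, target, from, to)` parameter order are assumptions for illustration. `chunkedFindNextGEQ` only emulates the lane-wise idea in plain Java (count the values below `target` in each chunk of `lanes` sorted values) and does not use the Panama Vector API.

   ```java
   // Hypothetical illustration; names and signatures are assumptions, not Lucene's code.
   final class FindNextGeqSketch {

     /** Scalar reference: index of the first value >= target in buffer[from, to), or `to` if none. */
     static int scalarFindNextGEQ(int[] buffer, int target, int from, int to) {
       for (int i = from; i < to; i++) {
         if (buffer[i] >= target) {
           return i;
         }
       }
       return to;
     }

     /** Emulates a lane-wise search: `lanes` values are examined per step, as one vector would be. */
     static int chunkedFindNextGEQ(int[] buffer, int target, int from, int to, int lanes) {
       int i = from;
       for (; i + lanes <= to; i += lanes) {
         int below = 0;
         for (int lane = 0; lane < lanes; lane++) {
           if (buffer[i + lane] < target) {
             below++; // stands in for a vector compare followed by counting the set lanes
           }
         }
         if (below != lanes) {
           // The block is sorted, so the first value >= target sits right after the smaller ones.
           return i + below;
         }
       }
       // Scalar tail for the remaining (fewer than `lanes`) values.
       for (; i < to; i++) {
         if (buffer[i] >= target) {
           return i;
         }
       }
       return to;
     }

     public static void main(String[] args) {
       int[] docs = {2, 5, 9, 14, 20, 21, 30, 42, 57, 63};
       System.out.println(scalarFindNextGEQ(docs, 21, 0, docs.length)); // prints 5
       System.out.println(chunkedFindNextGEQ(docs, 21, 0, docs.length, 4)); // prints 5
     }
   }
   ```

   Both methods return the same index for any sorted input; the chunked form just does `lanes` comparisons per step, which is the work a single vector compare would cover.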
   
   On aarch64 the int lane count depends on the vector width (NEON or SVE), 
which varies by Graviton generation:
   
   | Host | Core | Vector ISA | int lanes | gate `>= 8` |
   |------|------|-----------|-----------|-------------|
   | Graviton2 | Neoverse-N1 | NEON 128-bit (no SVE) | 4 | false (scalar) |
   | Graviton3 | Neoverse-V1 | SVE 256-bit | 8 | true (already vectorized) |
   | Graviton4 | Neoverse-V2 | SVE2 128-bit | 4 | false (scalar) |
   
   So Graviton2 and Graviton4 run the scalar fallback today. Only Graviton3 
(SVE-256) gets the vector path.
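
   The lane counts in the table follow from dividing the vector width by the 32 bits of an `int`. A small sketch of that arithmetic, with the widths taken from the table above; the class and method names are hypothetical and this does not query the real `INT_SPECIES`:

   ```java
   // Hypothetical illustration of the lane arithmetic; widths come from the table above.
   final class LaneCountSketch {

     /** Number of 32-bit int lanes that fit in a vector of the given width. */
     static int intLanes(int vectorBits) {
       return vectorBits / Integer.SIZE; // Integer.SIZE is 32
     }

     /** The existing gate, expressed as a function of the lane count. */
     static boolean originalGate(int intLanes) {
       return intLanes >= 8;
     }

     public static void main(String[] args) {
       String[] hosts = {"Graviton2 (NEON)", "Graviton3 (SVE)", "Graviton4 (SVE2)"};
       int[] vectorBits = {128, 256, 128};
       for (int i = 0; i < hosts.length; i++) {
         int lanes = intLanes(vectorBits[i]);
         System.out.printf("%s: %d lanes, vectorized=%b%n", hosts[i], lanes, originalGate(lanes));
       }
     }
   }
   ```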
   
   ### The change
   
   ```java
   private static final boolean ENABLE_FIND_NEXT_GEQ_VECTOR_OPTO =
       INT_SPECIES.length() >= 8 || Constants.OS_ARCH.equals("aarch64");
   ```
   
   I used an explicit `aarch64` check rather than lowering the threshold to `>= 
4` so x86 is untouched. It's similar to the existing 
`VectorUtil.XOR_BIT_COUNT_STRIDE_AS_INT = Constants.OS_ARCH.equals("aarch64")`, 
so I hope it's OK.
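
   To make the effect of the new condition concrete, here is a sketch that evaluates the old and new gate as pure functions of the lane count and the `os.arch` string. The class and method names are hypothetical, and `"amd64"` is only an example of a non-aarch64 value.

   ```java
   // Hypothetical illustration; the real gate is a static final field, not a method.
   final class GateComparisonSketch {

     /** Gate on `main`: vectorize only with at least 8 int lanes. */
     static boolean oldGate(int intLanes) {
       return intLanes >= 8;
     }

     /** Gate in this change: additionally vectorize on aarch64 regardless of lane count. */
     static boolean newGate(int intLanes, String osArch) {
       return intLanes >= 8 || "aarch64".equals(osArch);
     }

     public static void main(String[] args) {
       int[] lanes = {4, 8, 4, 8};
       String[] arch = {"aarch64", "aarch64", "amd64", "amd64"};
       for (int i = 0; i < lanes.length; i++) {
         System.out.printf(
             "arch=%s lanes=%d old=%b new=%b%n",
             arch[i], lanes[i], oldGate(lanes[i]), newGate(lanes[i], arch[i]));
       }
     }
   }
   ```

   Only the 4-lane aarch64 case changes outcome; every other combination evaluates the same under both gates.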
   
   ### Prior discussion
   
   The gate came from [#13958](https://github.com/apache/lucene/pull/13958), 
with benchmarking on other CPUs left for later. 
[#13968](https://github.com/apache/lucene/pull/13968) later moved postings to 
`int[]`, which is how the threshold became `>= 8`.
   
   ### Microbenchmark
   
   `AdvanceBenchmark.vectorUtilSearch` (already in `lucene/benchmark-jmh`) 
isolates `findNextGEQ`, and `linearSearch` from the same benchmark serves as a 
control. Measured on real EC2 instances with default JVM flags (no forced 
vector size), JDK 25 (Corretto 25.0.3), 10 forks, comparing `main` against this 
change.
   
   | Host | int lanes | benchmark | baseline (ops/ms) | candidate (ops/ms) | Δ |
   |------|-----------|-----------|-------------------|--------------------|---|
   | **c8g (Graviton4, SVE2-128)** | 4 | `vectorUtilSearch` | 254.8 ± 3.7 | **446.1 ± 3.6** | **+75%** |
   | c8g | 4 | `linearSearch` (control) | 251.2 ± 3.3 | 253.2 ± 1.9 | +0.8% |
   | **c6g (Graviton2, NEON-128)** | 4 | `vectorUtilSearch` | 129.2 ± 0.1 | **174.5 ± 2.5** | **+35%** |
   | c6g | 4 | `linearSearch` (control) | 117.8 | 117.8 | 0% |
   
   x86 and Graviton3 are unaffected: the added clause only fires on aarch64, 
and on Graviton3 `INT_SPECIES.length() >= 8` already holds, so the executed 
path is identical with and without this change.
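
   For reference, the Δ column in the microbenchmark table is the relative change in throughput, `(candidate / baseline - 1) * 100`. A short sketch of that arithmetic using the mean values from the table; the class and method names are hypothetical:

   ```java
   // Hypothetical helper; inputs are the mean ops/ms values from the table above.
   final class DeltaSketch {

     /** Relative throughput change of candidate vs. baseline, in percent. */
     static double pctChange(double baseline, double candidate) {
       return (candidate / baseline - 1.0) * 100.0;
     }

     public static void main(String[] args) {
       System.out.printf("c8g vectorUtilSearch: %+.0f%%%n", pctChange(254.8, 446.1));
       System.out.printf("c6g vectorUtilSearch: %+.0f%%%n", pctChange(129.2, 174.5));
       System.out.printf("c8g linearSearch:     %+.1f%%%n", pctChange(251.2, 253.2));
     }
   }
   ```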
   
   ### End-to-end (luceneutil)
   
   At `wikimedium1m` I saw no significant search-QPS change. At `wikimedium10m` 
(50 iterations, `main` vs this change) there is a clear signal. Every category 
that reaches p < 0.05 is listed below (the full per-category tables, 
significant or not, are in the collapsed sections after).
   
   **Graviton4 (c8g): 12 significant, all positive.**
   
   | Task | base QPS | base σ | cand QPS | cand σ | Δ | p |
   |------|---------:|-------:|---------:|-------:|---:|--:|
   | LowPhrase | 283.50 | 2.6% | 314.01 | 2.3% | +10.8% | 0.000 |
   | LowSloppyPhrase | 506.90 | 2.6% | 536.52 | 2.4% | +5.8% | 0.000 |
   | AndHighMed | 777.67 | 6.1% | 816.30 | 4.8% | +5.0% | 0.000 |
   | HighPhrase | 327.12 | 3.8% | 342.93 | 3.5% | +4.8% | 0.000 |
   | AndHighHigh | 406.62 | 4.1% | 423.43 | 6.3% | +4.1% | 0.000 |
   | LowIntervalsOrdered | 301.22 | 3.3% | 311.00 | 3.1% | +3.2% | 0.000 |
   | HighSloppyPhrase | 191.27 | 1.9% | 197.04 | 2.9% | +3.0% | 0.000 |
   | MedIntervalsOrdered | 175.92 | 3.3% | 180.90 | 2.8% | +2.8% | 0.000 |
   | MedSloppyPhrase | 835.98 | 4.3% | 854.93 | 4.6% | +2.3% | 0.011 |
   | HighSpanNear | 116.93 | 1.9% | 119.45 | 1.6% | +2.2% | 0.000 |
   | LowSpanNear | 632.71 | 3.2% | 641.57 | 2.4% | +1.4% | 0.013 |
   | MedSpanNear | 300.17 | 2.0% | 303.27 | 1.9% | +1.0% | 0.009 |
   
   **Graviton2 (c6g): 13 significant, 11 positive and 2 negative.**
   
   | Task | base QPS | base σ | cand QPS | cand σ | Δ | p |
   |------|---------:|-------:|---------:|-------:|---:|--:|
   | MedSpanNear | 175.16 | 1.8% | 185.08 | 1.9% | +5.7% | 0.000 |
   | AndHighMed | 585.15 | 4.0% | 615.11 | 6.1% | +5.1% | 0.000 |
   | OrHighHigh | 443.73 | 4.3% | 462.42 | 3.7% | +4.2% | 0.000 |
   | OrHighNotHigh | 452.12 | 6.5% | 468.35 | 5.4% | +3.6% | 0.003 |
   | OrNotHighHigh | 357.75 | 5.2% | 370.02 | 5.4% | +3.4% | 0.001 |
   | OrNotHighMed | 336.44 | 6.0% | 346.48 | 4.9% | +3.0% | 0.006 |
   | OrHighNotMed | 657.14 | 4.0% | 673.95 | 3.6% | +2.6% | 0.001 |
   | OrHighMed | 753.10 | 4.2% | 772.76 | 4.3% | +2.6% | 0.002 |
   | MedPhrase | 339.90 | 3.8% | 348.00 | 2.6% | +2.4% | 0.000 |
   | LowSloppyPhrase | 309.87 | 2.0% | 313.18 | 2.6% | +1.1% | 0.022 |
   | AndHighHighDayTaxoFacets | 26.00 | 2.3% | 26.27 | 1.5% | +1.0% | 0.008 |
   | HighSpanNear | 113.76 | 1.9% | 112.75 | 1.9% | -0.9% | 0.019 |
   | IntSet | 780.91 | 8.5% | 747.84 | 7.3% | -4.2% | 0.007 |
   
   The gains seem to be mostly on advance-heavy query types (conjunctions, 
phrases, span-near, intervals, disjunctions), which fits where `findNextGEQ` 
runs. The two small negatives on c6g are positive on c8g (HighSpanNear +2.2%, 
p = 0.000; IntSet +1.7%, though not significant there at p = 0.286), so I am 
not sure how to interpret them.
   
   <details>
   <summary>Full wikimedium10m table, Graviton4 (c8g), all categories</summary>
   
   ```
                               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
               BrowseDateTaxoFacets       22.95     (38.4%)       22.21     (35.5%)   -3.2% ( -55% -  114%) 0.662
          BrowseDayOfYearTaxoFacets       23.21     (38.4%)       22.46     (35.7%)   -3.2% ( -55% -  115%) 0.666
        BrowseRandomLabelTaxoFacets       19.05     (34.9%)       18.51     (31.8%)   -2.8% ( -51% -   97%) 0.672
                           HighTerm     1566.74      (7.5%)     1542.65     (10.8%)   -1.5% ( -18% -   18%) 0.409
                      OrNotHighHigh      493.76      (6.9%)      486.80      (6.6%)   -1.4% ( -13% -   12%) 0.296
              BrowseMonthSSDVFacets       20.14     (11.3%)       19.89     (10.7%)   -1.2% ( -20% -   23%) 0.580
                         OrHighHigh      601.84      (5.5%)      595.24      (6.2%)   -1.1% ( -12% -   11%) 0.351
                            MedTerm     1935.86      (5.6%)     1918.57      (6.4%)   -0.9% ( -12% -   11%) 0.456
                       OrHighNotMed      887.22      (5.0%)      879.77      (4.9%)   -0.8% ( -10% -    9%) 0.399
                              range     2923.47      (7.4%)     2908.86      (8.1%)   -0.5% ( -14% -   16%) 0.747
                           PKLookup      355.29      (2.4%)      353.84      (2.2%)   -0.4% (  -4% -    4%) 0.373
               BrowseDateSSDVFacets        3.45      (7.6%)        3.44      (7.7%)   -0.3% ( -14% -   16%) 0.843
        BrowseRandomLabelSSDVFacets       14.60      (7.8%)       14.56      (8.4%)   -0.3% ( -15% -   17%) 0.872
                     AndMissingHigh     3210.14      (5.6%)     3205.03      (6.1%)   -0.2% ( -11% -   12%) 0.891
             OrHighMedDayTaxoFacets       31.11      (3.1%)       31.07      (3.0%)   -0.1% (  -6% -    6%) 0.826
           AndHighHighDayTaxoFacets       30.38      (2.5%)       30.37      (2.3%)   -0.0% (  -4% -    4%) 0.940
                       OrNotHighLow     1980.87      (4.1%)     1980.60      (5.7%)   -0.0% (  -9% -   10%) 0.989
                         AndHighLow     1858.53      (5.9%)     1860.08      (5.8%)    0.1% ( -10% -   12%) 0.943
               HighTermTitleBDVSort      202.94      (2.2%)      203.15      (1.7%)    0.1% (  -3% -    4%) 0.791
              BrowseMonthTaxoFacets       39.37     (16.2%)       39.44     (16.0%)    0.2% ( -27% -   38%) 0.958
                            Respell       71.17      (1.6%)       71.34      (1.8%)    0.2% (  -3% -    3%) 0.488
                       OrHighNotLow     1394.09      (5.6%)     1398.87      (4.9%)    0.3% (  -9% -   11%) 0.744
            AndHighMedDayTaxoFacets      181.85      (2.6%)      182.63      (2.7%)    0.4% (  -4% -    5%) 0.417
               MedTermDayTaxoFacets       65.28      (2.5%)       65.62      (2.3%)    0.5% (  -4% -    5%) 0.285
                  HighTermTitleSort      196.68      (4.5%)      197.69      (3.6%)    0.5% (  -7% -    9%) 0.527
                          OrHighMed     1144.77      (4.8%)     1150.87      (4.7%)    0.5% (  -8% -   10%) 0.575
                           Wildcard       64.56      (2.5%)       64.95      (2.5%)    0.6% (  -4% -    5%) 0.240
                          MedPhrase      470.54      (3.1%)      473.34      (2.6%)    0.6% (  -4% -    6%) 0.298
                             Fuzzy1      117.26      (2.6%)      118.10      (2.7%)    0.7% (  -4% -    6%) 0.176
                  HighTermMonthSort     1617.30      (4.6%)     1629.50      (5.1%)    0.8% (  -8% -   10%) 0.439
                            LowTerm     2235.28      (5.4%)     2252.56      (5.9%)    0.8% (  -9% -   12%) 0.493
          BrowseDayOfYearSSDVFacets       19.48      (7.9%)       19.64      (9.6%)    0.8% ( -15% -   19%) 0.650
                             Fuzzy2      101.01      (2.2%)      101.93      (2.8%)    0.9% (  -4% -    6%) 0.073
                      OrHighNotHigh      501.32      (6.6%)      506.07      (6.7%)    0.9% ( -11% -   15%) 0.476
                        MedSpanNear      300.17      (2.0%)      303.27      (1.9%)    1.0% (  -2% -    5%) 0.009
                          OrHighLow     1454.92      (4.7%)     1470.39      (4.0%)    1.1% (  -7% -   10%) 0.224
                         TermDTSort      605.93      (4.8%)      612.47      (4.9%)    1.1% (  -8% -   11%) 0.267
                        LowSpanNear      632.71      (3.2%)      641.57      (2.4%)    1.4% (  -4% -    7%) 0.013
                             IntSet     1197.87      (8.2%)     1217.94      (7.5%)    1.7% ( -12% -   18%) 0.286
              HighTermDayOfYearSort      617.24      (5.3%)      627.89      (3.9%)    1.7% (  -7% -   11%) 0.062
                             IntNRQ      874.53      (7.4%)      893.18      (5.3%)    2.1% (  -9% -   16%) 0.097
                       HighSpanNear      116.93      (1.9%)      119.45      (1.6%)    2.2% (  -1% -    5%) 0.000
               HighIntervalsOrdered       20.09      (5.8%)       20.54      (5.9%)    2.2% (  -8% -   14%) 0.057
                    MedSloppyPhrase      835.98      (4.3%)      854.93      (4.6%)    2.3% (  -6% -   11%) 0.011
                       OrNotHighMed      986.38      (7.7%)     1010.46      (5.4%)    2.4% (  -9% -   16%) 0.067
                MedIntervalsOrdered      175.92      (3.3%)      180.90      (2.8%)    2.8% (  -3% -    9%) 0.000
                            Prefix3      861.12      (9.0%)      886.21     (10.0%)    2.9% ( -14% -   24%) 0.125
                   HighSloppyPhrase      191.27      (1.9%)      197.04      (2.9%)    3.0% (  -1% -    7%) 0.000
                LowIntervalsOrdered      301.22      (3.3%)      311.00      (3.1%)    3.2% (  -3% -   10%) 0.000
                        AndHighHigh      406.62      (4.1%)      423.43      (6.3%)    4.1% (  -6% -   15%) 0.000
                         HighPhrase      327.12      (3.8%)      342.93      (3.5%)    4.8% (  -2% -   12%) 0.000
                         AndHighMed      777.67      (6.1%)      816.30      (4.8%)    5.0% (  -5% -   16%) 0.000
                    LowSloppyPhrase      506.90      (2.6%)      536.52      (2.4%)    5.8% (   0% -   11%) 0.000
                          LowPhrase      283.50      (2.6%)      314.01      (2.3%)   10.8% (   5% -   16%) 0.000
   ```
   </details>
   
   <details>
   <summary>Full wikimedium10m table, Graviton2 (c6g), all categories</summary>
   
   ```
                               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                             IntSet      780.91      (8.5%)      747.84      (7.3%)   -4.2% ( -18% -   12%) 0.007
                       HighSpanNear      113.76      (1.9%)      112.75      (1.9%)   -0.9% (  -4% -    2%) 0.019
        BrowseRandomLabelSSDVFacets        6.87      (6.5%)        6.82      (5.4%)   -0.7% ( -11% -   11%) 0.540
              BrowseMonthTaxoFacets        9.64      (1.9%)        9.58      (3.3%)   -0.6% (  -5% -    4%) 0.273
               HighIntervalsOrdered      119.24      (6.5%)      118.56      (6.5%)   -0.6% ( -12% -   13%) 0.664
                            LowTerm     1615.51      (3.6%)     1607.60      (4.5%)   -0.5% (  -8% -    7%) 0.546
                           Wildcard       17.12      (1.7%)       17.04      (2.0%)   -0.5% (  -4% -    3%) 0.198
            AndHighMedDayTaxoFacets      147.16      (1.8%)      146.48      (2.3%)   -0.5% (  -4% -    3%) 0.260
                             Fuzzy2       71.06      (1.9%)       70.76      (2.6%)   -0.4% (  -4% -    4%) 0.352
                            Prefix3      930.05      (5.7%)      927.30      (5.9%)   -0.3% ( -11% -   11%) 0.798
                            Respell       43.32      (2.2%)       43.20      (2.2%)   -0.3% (  -4% -    4%) 0.522
                           HighTerm      501.14      (4.8%)      499.88      (4.8%)   -0.3% (  -9% -    9%) 0.795
                   HighSloppyPhrase       54.78      (2.1%)       54.69      (2.2%)   -0.2% (  -4% -    4%) 0.682
                          OrHighLow      762.73      (4.3%)      761.96      (3.3%)   -0.1% (  -7% -    7%) 0.896
                LowIntervalsOrdered      159.47      (6.7%)      159.33      (6.7%)   -0.1% ( -12% -   14%) 0.948
               HighTermTitleBDVSort       56.15      (2.5%)       56.13      (2.4%)   -0.0% (  -4% -    4%) 0.942
          BrowseDayOfYearSSDVFacets        9.88      (7.2%)        9.87      (5.9%)   -0.0% ( -12% -   14%) 0.992
                           PKLookup      203.43      (2.0%)      203.41      (2.1%)   -0.0% (  -3% -    4%) 0.983
                     AndMissingHigh     2160.07      (4.6%)     2160.13      (5.1%)    0.0% (  -9% -   10%) 0.997
                       OrHighNotLow      884.80      (5.9%)      885.20      (4.5%)    0.0% (  -9% -   11%) 0.966
                    MedSloppyPhrase      120.70      (2.2%)      120.89      (2.7%)    0.2% (  -4% -    5%) 0.745
                         HighPhrase       13.47      (2.7%)       13.49      (2.1%)    0.2% (  -4% -    5%) 0.710
                            MedTerm      945.88      (5.0%)      948.44      (4.8%)    0.3% (  -9% -   10%) 0.783
                             Fuzzy1       80.40      (2.3%)       80.66      (2.3%)    0.3% (  -4% -    4%) 0.489
        BrowseRandomLabelTaxoFacets        7.71     (14.7%)        7.73     (14.1%)    0.3% ( -24% -   34%) 0.911
               BrowseDateTaxoFacets       10.00      (9.4%)       10.03      (8.9%)    0.3% ( -16% -   20%) 0.855
          BrowseDayOfYearTaxoFacets       10.14      (9.2%)       10.18      (8.6%)    0.3% ( -16% -   19%) 0.852
                        LowSpanNear      246.28      (2.1%)      247.21      (2.2%)    0.4% (  -3% -    4%) 0.391
                              range     3490.14      (5.5%)     3505.64      (4.9%)    0.4% (  -9% -   11%) 0.673
                         AndHighLow     1163.36      (3.4%)     1169.34      (3.7%)    0.5% (  -6% -    7%) 0.473
               MedTermDayTaxoFacets       66.29      (2.5%)       66.69      (2.7%)    0.6% (  -4% -    6%) 0.257
              BrowseMonthSSDVFacets       10.15      (9.1%)       10.21      (7.9%)    0.6% ( -15% -   19%) 0.716
             OrHighMedDayTaxoFacets       14.80      (3.8%)       14.90      (2.7%)    0.6% (  -5% -    7%) 0.344
                          LowPhrase      529.01      (2.5%)      532.39      (2.6%)    0.6% (  -4% -    5%) 0.208
              HighTermDayOfYearSort      378.09      (2.7%)      380.56      (4.9%)    0.7% (  -6% -    8%) 0.410
               BrowseDateSSDVFacets        1.79      (7.7%)        1.80      (7.4%)    0.7% ( -13% -   17%) 0.665
                MedIntervalsOrdered      117.72      (3.4%)      118.56      (3.0%)    0.7% (  -5% -    7%) 0.266
                       OrNotHighLow     1181.78      (4.2%)     1193.48      (4.3%)    1.0% (  -7% -    9%) 0.241
           AndHighHighDayTaxoFacets       26.00      (2.3%)       26.27      (1.5%)    1.0% (  -2% -    4%) 0.008
                             IntNRQ      553.08      (5.0%)      558.89      (4.1%)    1.0% (  -7% -   10%) 0.251
                  HighTermMonthSort      987.65      (4.7%)      998.02      (3.9%)    1.1% (  -7% -   10%) 0.222
                    LowSloppyPhrase      309.87      (2.0%)      313.18      (2.6%)    1.1% (  -3% -    5%) 0.022
                        AndHighHigh      376.20      (4.1%)      380.50      (4.3%)    1.1% (  -6% -    9%) 0.173
                  HighTermTitleSort      110.55      (3.7%)      111.89      (3.2%)    1.2% (  -5% -    8%) 0.082
                         TermDTSort      369.83      (5.3%)      375.94      (3.4%)    1.7% (  -6% -   10%) 0.062
                          MedPhrase      339.90      (3.8%)      348.00      (2.6%)    2.4% (  -3% -    9%) 0.000
                       OrHighNotMed      657.14      (4.0%)      673.95      (3.6%)    2.6% (  -4% -   10%) 0.001
                          OrHighMed      753.10      (4.2%)      772.76      (4.3%)    2.6% (  -5% -   11%) 0.002
                       OrNotHighMed      336.44      (6.0%)      346.48      (4.9%)    3.0% (  -7% -   14%) 0.006
                      OrNotHighHigh      357.75      (5.2%)      370.02      (5.4%)    3.4% (  -6% -   14%) 0.001
                      OrHighNotHigh      452.12      (6.5%)      468.35      (5.4%)    3.6% (  -7% -   16%) 0.003
                         OrHighHigh      443.73      (4.3%)      462.42      (3.7%)    4.2% (  -3% -   12%) 0.000
                         AndHighMed      585.15      (4.0%)      615.11      (6.1%)    5.1% (  -4% -   15%) 0.000
                        MedSpanNear      175.16      (1.8%)      185.08      (1.9%)    5.7% (   1% -    9%) 0.000
   ```
   </details>
   
   
   ### Note
   
   I used Claude Code to help with the benchmarking setup and refining the 
above write-up. Happy to run any additional validation that would help.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

