shbhar opened a new pull request, #16254:
URL: https://github.com/apache/lucene/pull/16254
I work in Amazon Advertising, where we run a lot of low-latency Lucene
search on various Graviton host types. While looking for optimization
opportunities I came across this gate, which keeps the vectorized `findNextGEQ`
off on Graviton2 and Graviton4, and enabling it benchmarks as a sizable win.
This is a core, low-level path and the gate was set deliberately, so I may be
missing context here. This is also my first PR in Lucene so I apologize in
advance if I made a mistake.
### What's gated
`PanamaVectorUtilSupport.findNextGEQ` finds the index of the first value
`>= target` in a sorted block of doc IDs. It is used by postings `advance()` /
block skipping, and it sits behind a lane-count gate:
```java
private static final boolean ENABLE_FIND_NEXT_GEQ_VECTOR_OPTO =
INT_SPECIES.length() >= 8;
```
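As a rough illustration of the contract described above, here is a minimal scalar reference. The class name and the simplified `(buffer, target, to)` signature are assumptions made for this sketch; it is not Lucene's actual scalar fallback, only a statement of the expected result.

```java
// Illustrative only: a scalar reference for "index of the first value >= target".
// The class name and the simplified signature are assumptions, not Lucene's code.
final class FindNextGeqSketch {

  /** Returns the first index i in [0, to) with buffer[i] >= target, or `to` if there is none. */
  static int findNextGEQ(int[] buffer, int target, int to) {
    for (int i = 0; i < to; i++) {
      if (buffer[i] >= target) {
        return i;
      }
    }
    return to;
  }

  public static void main(String[] args) {
    int[] docs = {3, 7, 12, 40, 41, 90}; // sorted doc IDs
    System.out.println(findNextGEQ(docs, 12, docs.length)); // prints 2 (exact hit)
    System.out.println(findNextGEQ(docs, 13, docs.length)); // prints 3 (next larger doc ID)
    System.out.println(findNextGEQ(docs, 91, docs.length)); // prints 6 (no match, returns `to`)
  }
}
```

A vectorized variant has to return the same index for every input; it only compares several lanes per step instead of one value at a time.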
On aarch64 the int lane count depends on the vector width (NEON or SVE), which
varies by Graviton generation:
| Host | Core | Vector ISA | int lanes | gate `>= 8` |
|------|------|-----------|-----------|-------------|
| Graviton2 | Neoverse-N1 | NEON 128-bit (no SVE) | 4 | false (scalar) |
| Graviton3 | Neoverse-V1 | SVE 256-bit | 8 | true (already vectorized) |
| Graviton4 | Neoverse-V2 | SVE2 128-bit | 4 | false (scalar) |
So Graviton2 and Graviton4 run the scalar fallback today. Only Graviton3
(SVE-256) gets the vector path.
### The change
```java
private static final boolean ENABLE_FIND_NEXT_GEQ_VECTOR_OPTO =
INT_SPECIES.length() >= 8 || Constants.OS_ARCH.equals("aarch64");
```
I used an explicit `aarch64` check rather than lowering the threshold to
`>= 4` so that x86 is untouched. It's similar to the existing
`VectorUtil.XOR_BIT_COUNT_STRIDE_AS_INT = Constants.OS_ARCH.equals("aarch64")`,
so I hope it's OK.
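To make the effect of the two gate expressions concrete, the sketch below models them as pure functions and evaluates them for the hosts in the table above. The class and method names are hypothetical, the lane count is simply vector bits divided by 32, and the two x86 rows (with `os.arch` taken to be `amd64`) are assumed examples rather than measured hosts.

```java
// Illustrative sketch: the old and new gate as pure functions of lane count and arch.
// Names are hypothetical; the real constant reads INT_SPECIES.length() and
// Constants.OS_ARCH once, at class initialization.
final class GateSketch {

  /** Number of 32-bit int lanes that fit in a vector of the given bit width. */
  static int intLanes(int vectorBits) {
    return vectorBits / Integer.SIZE;
  }

  /** Gate on main: vector path only with 8 or more int lanes. */
  static boolean oldGate(int lanes) {
    return lanes >= 8;
  }

  /** Gate in this PR: additionally enabled on any aarch64 host. */
  static boolean newGate(int lanes, String osArch) {
    return lanes >= 8 || "aarch64".equals(osArch);
  }

  public static void main(String[] args) {
    String[] hosts = {"Graviton2", "Graviton3", "Graviton4", "x86 256-bit", "x86 128-bit"};
    int[] bits = {128, 256, 128, 256, 128};
    String[] arch = {"aarch64", "aarch64", "aarch64", "amd64", "amd64"}; // x86 rows are assumed
    for (int i = 0; i < hosts.length; i++) {
      int lanes = intLanes(bits[i]);
      System.out.println(hosts[i] + ": lanes=" + lanes
          + " old=" + oldGate(lanes) + " new=" + newGate(lanes, arch[i]));
    }
  }
}
```

Under these assumptions only the two 4-lane aarch64 rows flip from `false` to `true`; every other row evaluates the same with both gates.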
### Prior discussion
The gate came from [#13958](https://github.com/apache/lucene/pull/13958),
with benchmarking on other CPUs left for later.
[#13968](https://github.com/apache/lucene/pull/13968) later moved postings to
`int[]`, which is how the threshold became `>= 8`.
### Microbenchmark
`AdvanceBenchmark.vectorUtilSearch` (already in `lucene/benchmark-jmh`)
isolates `findNextGEQ`, and `linearSearch` from the same benchmark serves as a
control. Runs were on real EC2 instances with default JVM flags (no forced
vector size), JDK 25 (Corretto 25.0.3), 10 forks, comparing `main` against this
change.
| Host | int lanes | benchmark | baseline (ops/ms) | candidate (ops/ms) | Δ |
|------|-----------|-----------|-------------------|--------------------|---|
| **c8g (Graviton4, SVE2-128)** | 4 | `vectorUtilSearch` | 254.8 ± 3.7 | **446.1 ± 3.6** | **+75%** |
| c8g | 4 | `linearSearch` (control) | 251.2 ± 3.3 | 253.2 ± 1.9 | +0.8% |
| **c6g (Graviton2, NEON-128)** | 4 | `vectorUtilSearch` | 129.2 ± 0.1 | **174.5 ± 2.5** | **+35%** |
| c6g | 4 | `linearSearch` (control) | 117.8 | 117.8 | 0% |
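x86 and Graviton3 are unaffected: the added clause only matches `aarch64`, so
the gate evaluates exactly as before on x86, and Graviton3 already satisfies
`INT_SPECIES.length() >= 8`. The executed path on both is identical with and
without this change.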
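The Δ column in the microbenchmark table is the relative change in throughput between baseline and candidate. The small sketch below recomputes it from the reported means; the class and method names are hypothetical, and the ± error bars are ignored.

```java
// Illustrative sketch: recompute the relative throughput change from the reported means.
// Names are hypothetical; the inputs are the ops/ms means from the microbenchmark table.
final class SpeedupSketch {

  /** Percentage change of candidate relative to baseline. */
  static double pctChange(double baseline, double candidate) {
    return (candidate / baseline - 1.0) * 100.0;
  }

  /** Rounds to one decimal place for display. */
  static double round1(double value) {
    return Math.round(value * 10.0) / 10.0;
  }

  public static void main(String[] args) {
    System.out.println("c8g vectorUtilSearch: " + round1(pctChange(254.8, 446.1)) + "%");
    System.out.println("c8g linearSearch:     " + round1(pctChange(251.2, 253.2)) + "%");
    System.out.println("c6g vectorUtilSearch: " + round1(pctChange(129.2, 174.5)) + "%");
    System.out.println("c6g linearSearch:     " + round1(pctChange(117.8, 117.8)) + "%");
  }
}
```

The results agree with the rounded figures in the table (about +75% and +35% for `vectorUtilSearch`, under 1% for the control).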
### End-to-end (luceneutil)
At `wikimedium1m` I saw no significant search-QPS change. At `wikimedium10m`
(50 iterations, `main` vs this change) there is a clear signal. Every category
that reaches p < 0.05 is below (the full per-category tables, significant or
not, are in the collapsed sections after).
**Graviton4 (c8g): 12 significant, all positive.**
| Task | base QPS | base σ | cand QPS | cand σ | Δ | p |
|------|---------:|-------:|---------:|-------:|---:|--:|
| LowPhrase | 283.50 | 2.6% | 314.01 | 2.3% | +10.8% | 0.000 |
| LowSloppyPhrase | 506.90 | 2.6% | 536.52 | 2.4% | +5.8% | 0.000 |
| AndHighMed | 777.67 | 6.1% | 816.30 | 4.8% | +5.0% | 0.000 |
| HighPhrase | 327.12 | 3.8% | 342.93 | 3.5% | +4.8% | 0.000 |
| AndHighHigh | 406.62 | 4.1% | 423.43 | 6.3% | +4.1% | 0.000 |
| LowIntervalsOrdered | 301.22 | 3.3% | 311.00 | 3.1% | +3.2% | 0.000 |
| HighSloppyPhrase | 191.27 | 1.9% | 197.04 | 2.9% | +3.0% | 0.000 |
| MedIntervalsOrdered | 175.92 | 3.3% | 180.90 | 2.8% | +2.8% | 0.000 |
| MedSloppyPhrase | 835.98 | 4.3% | 854.93 | 4.6% | +2.3% | 0.011 |
| HighSpanNear | 116.93 | 1.9% | 119.45 | 1.6% | +2.2% | 0.000 |
| LowSpanNear | 632.71 | 3.2% | 641.57 | 2.4% | +1.4% | 0.013 |
| MedSpanNear | 300.17 | 2.0% | 303.27 | 1.9% | +1.0% | 0.009 |
**Graviton2 (c6g): 13 significant, 11 positive and 2 negative.**
| Task | base QPS | base σ | cand QPS | cand σ | Δ | p |
|------|---------:|-------:|---------:|-------:|---:|--:|
| MedSpanNear | 175.16 | 1.8% | 185.08 | 1.9% | +5.7% | 0.000 |
| AndHighMed | 585.15 | 4.0% | 615.11 | 6.1% | +5.1% | 0.000 |
| OrHighHigh | 443.73 | 4.3% | 462.42 | 3.7% | +4.2% | 0.000 |
| OrHighNotHigh | 452.12 | 6.5% | 468.35 | 5.4% | +3.6% | 0.003 |
| OrNotHighHigh | 357.75 | 5.2% | 370.02 | 5.4% | +3.4% | 0.001 |
| OrNotHighMed | 336.44 | 6.0% | 346.48 | 4.9% | +3.0% | 0.006 |
| OrHighNotMed | 657.14 | 4.0% | 673.95 | 3.6% | +2.6% | 0.001 |
| OrHighMed | 753.10 | 4.2% | 772.76 | 4.3% | +2.6% | 0.002 |
| MedPhrase | 339.90 | 3.8% | 348.00 | 2.6% | +2.4% | 0.000 |
| LowSloppyPhrase | 309.87 | 2.0% | 313.18 | 2.6% | +1.1% | 0.022 |
| AndHighHighDayTaxoFacets | 26.00 | 2.3% | 26.27 | 1.5% | +1.0% | 0.008 |
| HighSpanNear | 113.76 | 1.9% | 112.75 | 1.9% | -0.9% | 0.019 |
| IntSet | 780.91 | 8.5% | 747.84 | 7.3% | -4.2% | 0.007 |
The gains are mostly on advance-heavy query types (conjunctions, phrases,
span-near, intervals, and, on c6g, disjunctions), which fits where
`findNextGEQ` runs. The two negatives on c6g (IntSet -4.2%, HighSpanNear -0.9%)
are positive on c8g (IntSet +1.7%, though not significant at p = 0.286;
HighSpanNear +2.2%), so I am not sure how to interpret them.
<details>
<summary>Full wikimedium10m table, Graviton4 (c8g), all categories</summary>
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
BrowseDateTaxoFacets 22.95 (38.4%) 22.21 (35.5%) -3.2% ( -55% - 114%) 0.662
BrowseDayOfYearTaxoFacets 23.21 (38.4%) 22.46 (35.7%) -3.2% ( -55% - 115%) 0.666
BrowseRandomLabelTaxoFacets 19.05 (34.9%) 18.51 (31.8%) -2.8% ( -51% - 97%) 0.672
HighTerm 1566.74 (7.5%) 1542.65 (10.8%) -1.5% ( -18% - 18%) 0.409
OrNotHighHigh 493.76 (6.9%) 486.80 (6.6%) -1.4% ( -13% - 12%) 0.296
BrowseMonthSSDVFacets 20.14 (11.3%) 19.89 (10.7%) -1.2% ( -20% - 23%) 0.580
OrHighHigh 601.84 (5.5%) 595.24 (6.2%) -1.1% ( -12% - 11%) 0.351
MedTerm 1935.86 (5.6%) 1918.57 (6.4%) -0.9% ( -12% - 11%) 0.456
OrHighNotMed 887.22 (5.0%) 879.77 (4.9%) -0.8% ( -10% - 9%) 0.399
range 2923.47 (7.4%) 2908.86 (8.1%) -0.5% ( -14% - 16%) 0.747
PKLookup 355.29 (2.4%) 353.84 (2.2%) -0.4% ( -4% - 4%) 0.373
BrowseDateSSDVFacets 3.45 (7.6%) 3.44 (7.7%) -0.3% ( -14% - 16%) 0.843
BrowseRandomLabelSSDVFacets 14.60 (7.8%) 14.56 (8.4%) -0.3% ( -15% - 17%) 0.872
AndMissingHigh 3210.14 (5.6%) 3205.03 (6.1%) -0.2% ( -11% - 12%) 0.891
OrHighMedDayTaxoFacets 31.11 (3.1%) 31.07 (3.0%) -0.1% ( -6% - 6%) 0.826
AndHighHighDayTaxoFacets 30.38 (2.5%) 30.37 (2.3%) -0.0% ( -4% - 4%) 0.940
OrNotHighLow 1980.87 (4.1%) 1980.60 (5.7%) -0.0% ( -9% - 10%) 0.989
AndHighLow 1858.53 (5.9%) 1860.08 (5.8%) 0.1% ( -10% - 12%) 0.943
HighTermTitleBDVSort 202.94 (2.2%) 203.15 (1.7%) 0.1% ( -3% - 4%) 0.791
BrowseMonthTaxoFacets 39.37 (16.2%) 39.44 (16.0%) 0.2% ( -27% - 38%) 0.958
Respell 71.17 (1.6%) 71.34 (1.8%) 0.2% ( -3% - 3%) 0.488
OrHighNotLow 1394.09 (5.6%) 1398.87 (4.9%) 0.3% ( -9% - 11%) 0.744
AndHighMedDayTaxoFacets 181.85 (2.6%) 182.63 (2.7%) 0.4% ( -4% - 5%) 0.417
MedTermDayTaxoFacets 65.28 (2.5%) 65.62 (2.3%) 0.5% ( -4% - 5%) 0.285
HighTermTitleSort 196.68 (4.5%) 197.69 (3.6%) 0.5% ( -7% - 9%) 0.527
OrHighMed 1144.77 (4.8%) 1150.87 (4.7%) 0.5% ( -8% - 10%) 0.575
Wildcard 64.56 (2.5%) 64.95 (2.5%) 0.6% ( -4% - 5%) 0.240
MedPhrase 470.54 (3.1%) 473.34 (2.6%) 0.6% ( -4% - 6%) 0.298
Fuzzy1 117.26 (2.6%) 118.10 (2.7%) 0.7% ( -4% - 6%) 0.176
HighTermMonthSort 1617.30 (4.6%) 1629.50 (5.1%) 0.8% ( -8% - 10%) 0.439
LowTerm 2235.28 (5.4%) 2252.56 (5.9%) 0.8% ( -9% - 12%) 0.493
BrowseDayOfYearSSDVFacets 19.48 (7.9%) 19.64 (9.6%) 0.8% ( -15% - 19%) 0.650
Fuzzy2 101.01 (2.2%) 101.93 (2.8%) 0.9% ( -4% - 6%) 0.073
OrHighNotHigh 501.32 (6.6%) 506.07 (6.7%) 0.9% ( -11% - 15%) 0.476
MedSpanNear 300.17 (2.0%) 303.27 (1.9%) 1.0% ( -2% - 5%) 0.009
OrHighLow 1454.92 (4.7%) 1470.39 (4.0%) 1.1% ( -7% - 10%) 0.224
TermDTSort 605.93 (4.8%) 612.47 (4.9%) 1.1% ( -8% - 11%) 0.267
LowSpanNear 632.71 (3.2%) 641.57 (2.4%) 1.4% ( -4% - 7%) 0.013
IntSet 1197.87 (8.2%) 1217.94 (7.5%) 1.7% ( -12% - 18%) 0.286
HighTermDayOfYearSort 617.24 (5.3%) 627.89 (3.9%) 1.7% ( -7% - 11%) 0.062
IntNRQ 874.53 (7.4%) 893.18 (5.3%) 2.1% ( -9% - 16%) 0.097
HighSpanNear 116.93 (1.9%) 119.45 (1.6%) 2.2% ( -1% - 5%) 0.000
HighIntervalsOrdered 20.09 (5.8%) 20.54 (5.9%) 2.2% ( -8% - 14%) 0.057
MedSloppyPhrase 835.98 (4.3%) 854.93 (4.6%) 2.3% ( -6% - 11%) 0.011
OrNotHighMed 986.38 (7.7%) 1010.46 (5.4%) 2.4% ( -9% - 16%) 0.067
MedIntervalsOrdered 175.92 (3.3%) 180.90 (2.8%) 2.8% ( -3% - 9%) 0.000
Prefix3 861.12 (9.0%) 886.21 (10.0%) 2.9% ( -14% - 24%) 0.125
HighSloppyPhrase 191.27 (1.9%) 197.04 (2.9%) 3.0% ( -1% - 7%) 0.000
LowIntervalsOrdered 301.22 (3.3%) 311.00 (3.1%) 3.2% ( -3% - 10%) 0.000
AndHighHigh 406.62 (4.1%) 423.43 (6.3%) 4.1% ( -6% - 15%) 0.000
HighPhrase 327.12 (3.8%) 342.93 (3.5%) 4.8% ( -2% - 12%) 0.000
AndHighMed 777.67 (6.1%) 816.30 (4.8%) 5.0% ( -5% - 16%) 0.000
LowSloppyPhrase 506.90 (2.6%) 536.52 (2.4%) 5.8% ( 0% - 11%) 0.000
LowPhrase 283.50 (2.6%) 314.01 (2.3%) 10.8% ( 5% - 16%) 0.000
```
</details>
<details>
<summary>Full wikimedium10m table, Graviton2 (c6g), all categories</summary>
```
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value
IntSet 780.91 (8.5%) 747.84 (7.3%) -4.2% ( -18% - 12%) 0.007
HighSpanNear 113.76 (1.9%) 112.75 (1.9%) -0.9% ( -4% - 2%) 0.019
BrowseRandomLabelSSDVFacets 6.87 (6.5%) 6.82 (5.4%) -0.7% ( -11% - 11%) 0.540
BrowseMonthTaxoFacets 9.64 (1.9%) 9.58 (3.3%) -0.6% ( -5% - 4%) 0.273
HighIntervalsOrdered 119.24 (6.5%) 118.56 (6.5%) -0.6% ( -12% - 13%) 0.664
LowTerm 1615.51 (3.6%) 1607.60 (4.5%) -0.5% ( -8% - 7%) 0.546
Wildcard 17.12 (1.7%) 17.04 (2.0%) -0.5% ( -4% - 3%) 0.198
AndHighMedDayTaxoFacets 147.16 (1.8%) 146.48 (2.3%) -0.5% ( -4% - 3%) 0.260
Fuzzy2 71.06 (1.9%) 70.76 (2.6%) -0.4% ( -4% - 4%) 0.352
Prefix3 930.05 (5.7%) 927.30 (5.9%) -0.3% ( -11% - 11%) 0.798
Respell 43.32 (2.2%) 43.20 (2.2%) -0.3% ( -4% - 4%) 0.522
HighTerm 501.14 (4.8%) 499.88 (4.8%) -0.3% ( -9% - 9%) 0.795
HighSloppyPhrase 54.78 (2.1%) 54.69 (2.2%) -0.2% ( -4% - 4%) 0.682
OrHighLow 762.73 (4.3%) 761.96 (3.3%) -0.1% ( -7% - 7%) 0.896
LowIntervalsOrdered 159.47 (6.7%) 159.33 (6.7%) -0.1% ( -12% - 14%) 0.948
HighTermTitleBDVSort 56.15 (2.5%) 56.13 (2.4%) -0.0% ( -4% - 4%) 0.942
BrowseDayOfYearSSDVFacets 9.88 (7.2%) 9.87 (5.9%) -0.0% ( -12% - 14%) 0.992
PKLookup 203.43 (2.0%) 203.41 (2.1%) -0.0% ( -3% - 4%) 0.983
AndMissingHigh 2160.07 (4.6%) 2160.13 (5.1%) 0.0% ( -9% - 10%) 0.997
OrHighNotLow 884.80 (5.9%) 885.20 (4.5%) 0.0% ( -9% - 11%) 0.966
MedSloppyPhrase 120.70 (2.2%) 120.89 (2.7%) 0.2% ( -4% - 5%) 0.745
HighPhrase 13.47 (2.7%) 13.49 (2.1%) 0.2% ( -4% - 5%) 0.710
MedTerm 945.88 (5.0%) 948.44 (4.8%) 0.3% ( -9% - 10%) 0.783
Fuzzy1 80.40 (2.3%) 80.66 (2.3%) 0.3% ( -4% - 4%) 0.489
BrowseRandomLabelTaxoFacets 7.71 (14.7%) 7.73 (14.1%) 0.3% ( -24% - 34%) 0.911
BrowseDateTaxoFacets 10.00 (9.4%) 10.03 (8.9%) 0.3% ( -16% - 20%) 0.855
BrowseDayOfYearTaxoFacets 10.14 (9.2%) 10.18 (8.6%) 0.3% ( -16% - 19%) 0.852
LowSpanNear 246.28 (2.1%) 247.21 (2.2%) 0.4% ( -3% - 4%) 0.391
range 3490.14 (5.5%) 3505.64 (4.9%) 0.4% ( -9% - 11%) 0.673
AndHighLow 1163.36 (3.4%) 1169.34 (3.7%) 0.5% ( -6% - 7%) 0.473
MedTermDayTaxoFacets 66.29 (2.5%) 66.69 (2.7%) 0.6% ( -4% - 6%) 0.257
BrowseMonthSSDVFacets 10.15 (9.1%) 10.21 (7.9%) 0.6% ( -15% - 19%) 0.716
OrHighMedDayTaxoFacets 14.80 (3.8%) 14.90 (2.7%) 0.6% ( -5% - 7%) 0.344
LowPhrase 529.01 (2.5%) 532.39 (2.6%) 0.6% ( -4% - 5%) 0.208
HighTermDayOfYearSort 378.09 (2.7%) 380.56 (4.9%) 0.7% ( -6% - 8%) 0.410
BrowseDateSSDVFacets 1.79 (7.7%) 1.80 (7.4%) 0.7% ( -13% - 17%) 0.665
MedIntervalsOrdered 117.72 (3.4%) 118.56 (3.0%) 0.7% ( -5% - 7%) 0.266
OrNotHighLow 1181.78 (4.2%) 1193.48 (4.3%) 1.0% ( -7% - 9%) 0.241
AndHighHighDayTaxoFacets 26.00 (2.3%) 26.27 (1.5%) 1.0% ( -2% - 4%) 0.008
IntNRQ 553.08 (5.0%) 558.89 (4.1%) 1.0% ( -7% - 10%) 0.251
HighTermMonthSort 987.65 (4.7%) 998.02 (3.9%) 1.1% ( -7% - 10%) 0.222
LowSloppyPhrase 309.87 (2.0%) 313.18 (2.6%) 1.1% ( -3% - 5%) 0.022
AndHighHigh 376.20 (4.1%) 380.50 (4.3%) 1.1% ( -6% - 9%) 0.173
HighTermTitleSort 110.55 (3.7%) 111.89 (3.2%) 1.2% ( -5% - 8%) 0.082
TermDTSort 369.83 (5.3%) 375.94 (3.4%) 1.7% ( -6% - 10%) 0.062
MedPhrase 339.90 (3.8%) 348.00 (2.6%) 2.4% ( -3% - 9%) 0.000
OrHighNotMed 657.14 (4.0%) 673.95 (3.6%) 2.6% ( -4% - 10%) 0.001
OrHighMed 753.10 (4.2%) 772.76 (4.3%) 2.6% ( -5% - 11%) 0.002
OrNotHighMed 336.44 (6.0%) 346.48 (4.9%) 3.0% ( -7% - 14%) 0.006
OrNotHighHigh 357.75 (5.2%) 370.02 (5.4%) 3.4% ( -6% - 14%) 0.001
OrHighNotHigh 452.12 (6.5%) 468.35 (5.4%) 3.6% ( -7% - 16%) 0.003
OrHighHigh 443.73 (4.3%) 462.42 (3.7%) 4.2% ( -3% - 12%) 0.000
AndHighMed 585.15 (4.0%) 615.11 (6.1%) 5.1% ( -4% - 15%) 0.000
MedSpanNear 175.16 (1.8%) 185.08 (1.9%) 5.7% ( 1% - 9%) 0.000
```
</details>
### Note
I used Claude Code to help with the benchmarking setup and refining the
above write-up. Happy to run any additional validation that would help.