jpountz commented on PR #13692:
URL: https://github.com/apache/lucene/pull/13692#issuecomment-2311845595
Here is what the `AdvanceBenchmark` reports. The branchless binary search is
`binarySearch5`. It is much faster than a regular binary search but still
slower than a linear search in this benchmark, which tries to reproduce the
distribution I'm seeing in practice, where the first doc ID that is greater
than or equal to the target is usually only a few docs away. luceneutil also
reports a slowdown if I wire this branchless binary search in. `hybridSearch`
is the only approach I could come up with that beats `linearSearch` in this
benchmark. It happens to use intervals of 8 here, while intervals of 4 seemed
to work slightly better with luceneutil in practice.
```
Benchmark                           Mode  Cnt    Score    Error   Units
AdvanceBenchmark.binarySearch      thrpt    5  124.899 ±  8.926  ops/ms
AdvanceBenchmark.binarySearch2     thrpt    5  120.575 ±  6.757  ops/ms
AdvanceBenchmark.binarySearch3     thrpt    5   76.455 ± 10.387  ops/ms
AdvanceBenchmark.binarySearch4     thrpt    5  176.332 ± 10.384  ops/ms
AdvanceBenchmark.binarySearch5     thrpt    5  232.011 ±  8.920  ops/ms
AdvanceBenchmark.binarySearch6     thrpt    5  135.986 ±  2.720  ops/ms
AdvanceBenchmark.bruteForceSearch  thrpt    5   81.400 ±  1.526  ops/ms
AdvanceBenchmark.hybridSearch      thrpt    5  340.718 ±  4.698  ops/ms
AdvanceBenchmark.linearSearch      thrpt    5  305.663 ±  3.478  ops/ms
AdvanceBenchmark.linearSearch2     thrpt    5  248.212 ±  1.494  ops/ms
AdvanceBenchmark.linearSearch3     thrpt    5  210.096 ± 10.088  ops/ms
```
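For readers who have not looked at the benchmark source, here is a minimal sketch of the three families of approaches compared above: a linear scan, a branchless-style binary search, and a hybrid that skips fixed-size intervals before scanning linearly. The class name `AdvanceSketch`, the method signatures, and the use of a plain sorted `int[]` are assumptions made for illustration; this is not the code from the PR or from `AdvanceBenchmark`. Each method returns the index of the first element greater than or equal to `target`, or `docs.length` if there is none.

```java
// Illustrative sketch only: names and details are hypothetical, not Lucene's code.
final class AdvanceSketch {
  private AdvanceSketch() {}

  /** Linear scan from the start; cheap when the match is only a few slots away. */
  static int linearSearch(int[] docs, int target) {
    int i = 0;
    while (i < docs.length && docs[i] < target) {
      i++;
    }
    return i;
  }

  /**
   * Binary search where each step is a comparison plus a conditional add
   * instead of a data-dependent jump. Assumes docs is non-empty and its
   * length is a power of two. Whether the JIT really emits branch-free code
   * for the ternaries is not guaranteed.
   */
  static int branchlessBinarySearch(int[] docs, int target) {
    int idx = 0;
    for (int step = docs.length >>> 1; step > 0; step >>>= 1) {
      // Skip `step` elements if the last of them is still below the target.
      idx += (docs[idx + step - 1] < target) ? step : 0;
    }
    // idx now points at the last candidate; move past it if it is too small.
    return idx + ((docs[idx] < target) ? 1 : 0);
  }

  /**
   * Hybrid: check only the last element of each interval to skip whole
   * intervals, then finish with a linear scan inside the matching interval.
   */
  static int hybridSearch(int[] docs, int target, int interval) {
    int i = 0;
    while (i + interval <= docs.length && docs[i + interval - 1] < target) {
      i += interval;
    }
    while (i < docs.length && docs[i] < target) {
      i++;
    }
    return i;
  }
}
```

With this sketch, `AdvanceSketch.hybridSearch(docs, target, 8)` corresponds to the interval size mentioned for the benchmark and `AdvanceSketch.hybridSearch(docs, target, 4)` to the one that seemed to do better with luceneutil. When the match is close to the start, the hybrid does at most one or two interval checks before a short scan, which is one plausible reason it can edge out a pure linear scan on this distribution.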
And luceneutil on wikibigall:
- `OrHighHigh` and `PKLookup` get a small slowdown, maybe `OrStopWords`
too,
- `OrHighMed`, `AndHighHigh`, `OrHighRare`, `HighTermDayOfYearSort`,
`OrHighLow`, `AndHighMed`, `CountAndHighMed` get a small speedup,
- `AndHighLow`, `OrNotHighLow`, `CountAndHighHigh` get a good speedup.
```
Task                   QPS baseline   StdDev  QPS my_modified_version   StdDev              Pct diff  p-value
OrHighHigh                    70.00   (2.0%)                    67.82   (4.2%)   -3.1% ( -9% -   3%)  0.003
OrStopWords                   34.04   (3.3%)                    33.05   (5.8%)   -2.9% (-11% -   6%)  0.053
PKLookup                     281.09   (2.6%)                   273.38   (2.6%)   -2.7% ( -7% -   2%)  0.001
Or3Terms                     172.60   (2.8%)                   170.15   (3.6%)   -1.4% ( -7% -   5%)  0.169
CountOrHighHigh               58.96  (13.5%)                    58.24  (14.6%)   -1.2% (-25% -  31%)  0.782
Or2Terms2StopWords           162.94   (2.5%)                   161.51   (3.9%)   -0.9% ( -7% -   5%)  0.396
OrNotHighMed                 355.53   (2.3%)                   353.34   (2.8%)   -0.6% ( -5% -   4%)  0.451
MedTerm                      741.64   (4.6%)                   738.88   (5.9%)   -0.4% (-10% -  10%)  0.824
HighTermTitleSort            124.03   (6.7%)                   123.57   (2.4%)   -0.4% ( -8% -   9%)  0.817
HighTerm                     538.39   (3.8%)                   536.57   (5.4%)   -0.3% ( -9% -   9%)  0.818
CountOrHighMed               117.06  (11.1%)                   116.81  (10.5%)   -0.2% (-19% -  23%)  0.952
AndStopWords                  30.85   (3.3%)                    30.86   (4.4%)    0.0% ( -7% -   8%)  0.969
LowTerm                     1093.16   (4.2%)                  1095.72   (6.2%)    0.2% ( -9% -  11%)  0.889
TermDTSort                   364.51   (6.0%)                   366.10   (5.8%)    0.4% (-10% -  13%)  0.815
HighTermMonthSort           3150.57   (3.1%)                  3166.16   (2.4%)    0.5% ( -4% -   6%)  0.574
OrNotHighHigh                235.15   (1.8%)                   236.31   (3.3%)    0.5% ( -4% -   5%)  0.554
And3Terms                    172.79   (2.9%)                   174.13   (3.1%)    0.8% ( -5% -   6%)  0.409
OrHighNotHigh                253.75   (1.9%)                   256.02   (3.8%)    0.9% ( -4% -   6%)  0.346
OrHighNotMed                 413.79   (2.5%)                   417.52   (4.1%)    0.9% ( -5% -   7%)  0.402
And2Terms2StopWords          160.11   (2.4%)                   161.76   (3.2%)    1.0% ( -4% -   6%)  0.245
OrHighNotLow                 489.29   (3.3%)                   495.58   (4.6%)    1.3% ( -6% -   9%)  0.308
OrHighMed                    242.22   (3.0%)                   245.75   (2.5%)    1.5% ( -3% -   7%)  0.098
AndHighHigh                   60.42   (1.5%)                    61.32   (1.9%)    1.5% ( -1% -   4%)  0.004
CountTerm                   9589.30   (3.0%)                  9736.14   (4.3%)    1.5% ( -5% -   9%)  0.196
OrHighRare                   270.80   (2.5%)                   275.67   (3.3%)    1.8% ( -3% -   7%)  0.052
HighTermDayOfYearSort        826.36   (3.2%)                   842.20   (3.5%)    1.9% ( -4% -   8%)  0.073
OrHighLow                    869.82   (2.9%)                   886.74   (2.4%)    1.9% ( -3% -   7%)  0.021
AndHighMed                   187.97   (1.7%)                   192.20   (1.5%)    2.2% (  0% -   5%)  0.000
CountAndHighMed              146.76   (1.9%)                   150.53   (2.2%)    2.6% ( -1% -   6%)  0.000
AndHighLow                   965.91   (3.3%)                  1008.42   (2.5%)    4.4% ( -1% -  10%)  0.000
OrNotHighLow                 939.51   (4.1%)                   987.88   (2.3%)    5.1% ( -1% -  11%)  0.000
CountAndHighHigh              48.08   (1.5%)                    53.26   (4.3%)   10.8% (  4% -  16%)  0.000
```