jpountz opened a new pull request, #13958:
URL: https://github.com/apache/lucene/pull/13958
PR #13692 tried to speed up advancing by using a branchless binary search, but
while this yielded a speedup on my machine, it caused a slowdown on nightly
benchmarks.
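For readers unfamiliar with the term, here is a minimal sketch of what a branchless binary search over a sorted block of doc IDs can look like. It is not the code from PR #13692; the class and method names (`BranchlessSearchSketch`, `lowerBound`) are hypothetical and only illustrate the general technique.

```java
// Illustrative sketch only: names are hypothetical, not Lucene APIs.
final class BranchlessSearchSketch {

  /**
   * Returns the index of the first value >= target among the first len
   * entries of the sorted array a, or len if every entry is smaller.
   */
  static int lowerBound(int[] a, int len, int target) {
    int base = 0;
    int n = len;
    while (n > 1) {
      int half = n >>> 1;
      // The loop always runs about log2(len) iterations; the comparison only
      // selects an offset, which a JIT can typically compile to a conditional
      // move instead of an unpredictable branch.
      base += (a[base + half - 1] < target) ? half : 0;
      n -= half;
    }
    // One candidate may remain; step past it if it is still too small.
    return base + ((n == 1 && a[base] < target) ? 1 : 0);
  }
}
```

For example, with `a = {1, 3, 5, 7}`, calling `lowerBound(a, 4, 5)` returns the index of the value `5`. The number of iterations depends only on `len`, never on where the target falls.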
This PR tries a different approach using vectorization. Experimentation
suggests that it slightly slows down queries whose advance calls often land on
the very next doc ID, such as term queries and `OrHighNotXXX` tasks, but that it
speeds up queries that advance to one of the next few doc IDs, such as
`AndHighHigh`. I think that this is a good trade-off since it slows down some
queries that are already plenty fast in exchange for a speedup on some more
expensive queries.
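To illustrate the kind of loop that lends itself to vectorization, here is a minimal scalar sketch that finds the advance position by counting how many doc IDs in a sorted window are smaller than the target. This is an assumption about the general shape of the idea, not the code from this PR, whose implementation may differ; the names `AdvanceSketch` and `firstGreaterOrEqual` are hypothetical.

```java
// Illustrative sketch only: names are hypothetical, not Lucene APIs.
final class AdvanceSketch {

  /**
   * Returns the index of the first doc ID >= target in the sorted window
   * docs[from, from + len), or from + len if there is none.
   */
  static int firstGreaterOrEqual(int[] docs, int from, int len, int target) {
    int count = 0;
    // No early exit: every entry of the window is compared, so the loop has
    // a fixed trip count and no data-dependent control flow. Loops of this
    // shape are candidates for SIMD compare-and-count, whether through JIT
    // auto-vectorization or explicit vector code.
    for (int i = 0; i < len; i++) {
      count += (docs[from + i] < target) ? 1 : 0;
    }
    // Because the window is sorted, the number of smaller entries is also
    // the offset of the first entry that is >= target.
    return from + count;
  }
}
```

The cost is the same whether the target is the very next doc ID or several entries away. That matches the trade-off described above: a plain linear scan that exits on the first entry is hard to beat when advancing to the very next doc ID, while a fixed-size compare-and-count can win when the target is a few entries further.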
Here is a `luceneutil` run on `wikibigall` with `-searchConcurrency 0`:
```
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDevPct diff                p-value
OrHighNotHigh               302.78   (2.4%)                   283.75   (2.9%)   -6.3% ( -11% - -1%)  0.000
OrHighNotMed                384.69   (3.0%)                   363.33   (2.8%)   -5.6% ( -10% - 0%)   0.000
MedTerm                     564.86   (2.2%)                   537.04   (3.5%)   -4.9% ( -10% - 0%)   0.000
LowTerm                    1014.02   (2.2%)                   967.37   (3.6%)   -4.6% ( -10% - 1%)   0.000
OrHighNotLow                446.38   (3.4%)                   427.10   (3.3%)   -4.3% ( -10% - 2%)   0.000
HighTerm                    485.41   (1.9%)                   464.49   (3.2%)   -4.3% ( -9% - 0%)    0.000
OrNotHighHigh               229.78   (2.4%)                   221.51   (3.1%)   -3.6% ( -8% - 1%)    0.000
OrNotHighMed                396.63   (2.7%)                   382.41   (3.1%)   -3.6% ( -9% - 2%)    0.000
Prefix3                     145.65   (3.6%)                   142.39   (3.7%)   -2.2% ( -9% - 5%)    0.051
IntNRQ                      158.04   (4.7%)                   154.77   (5.6%)   -2.1% ( -11% - 8%)   0.205
CountTerm                  8320.96   (3.2%)                  8198.56   (4.7%)   -1.5% ( -9% - 6%)    0.246
PKLookup                    273.35   (3.6%)                   269.71   (5.2%)   -1.3% ( -9% - 7%)    0.345
Wildcard                     83.30   (3.4%)                    82.28   (3.1%)   -1.2% ( -7% - 5%)    0.234
HighTermMonthSort          3235.98   (3.1%)                  3198.04   (2.9%)   -1.2% ( -6% - 4%)    0.215
HighTermTitleSort           148.94   (2.5%)                   148.38   (2.6%)   -0.4% ( -5% - 4%)    0.638
CountOrHighMed              104.51   (2.0%)                   104.22   (1.7%)   -0.3% ( -3% - 3%)    0.640
HighTermTitleBDVSort         14.67   (5.3%)                    14.64   (5.9%)   -0.2% ( -10% - 11%)  0.899
AndStopWords                 30.68   (3.0%)                    30.66   (2.7%)   -0.1% ( -5% - 5%)    0.941
CountOrHighHigh              50.17   (2.0%)                    50.19   (1.9%)    0.0% ( -3% - 3%)    0.947
OrHighRare                  273.82   (4.5%)                   273.96   (3.8%)    0.0% ( -7% - 8%)    0.971
TermDTSort                  353.37   (6.4%)                   354.23   (6.7%)    0.2% ( -12% - 14%)  0.907
Fuzzy1                       77.85   (2.6%)                    78.12   (2.0%)    0.3% ( -4% - 4%)    0.633
Fuzzy2                       73.23   (2.5%)                    73.50   (1.9%)    0.4% ( -3% - 4%)    0.594
HighTermDayOfYearSort       836.62   (3.1%)                   841.07   (4.0%)    0.5% ( -6% - 7%)    0.639
And2Terms2StopWords         154.49   (1.8%)                   155.41   (2.1%)    0.6% ( -3% - 4%)    0.340
OrHighLow                   771.90   (2.0%)                   778.20   (2.2%)    0.8% ( -3% - 5%)    0.217
And3Terms                   167.63   (2.3%)                   169.23   (2.2%)    1.0% ( -3% - 5%)    0.176
OrStopWords                  33.99   (4.6%)                    34.39   (4.1%)    1.2% ( -7% - 10%)   0.388
CountAndHighMed             148.01   (2.4%)                   149.91   (1.0%)    1.3% ( -2% - 4%)    0.025
Or2Terms2StopWords          156.93   (2.8%)                   159.21   (3.0%)    1.5% ( -4% - 7%)    0.117
AndHighHigh                  67.06   (1.3%)                    68.07   (1.6%)    1.5% ( -1% - 4%)    0.001
OrMany                       18.67   (2.9%)                    18.96   (2.9%)    1.5% ( -4% - 7%)    0.089
AndHighMed                  185.02   (1.6%)                   189.06   (1.3%)    2.2% ( 0% - 5%)     0.000
AndHighLow                  948.34   (2.6%)                   970.47   (2.6%)    2.3% ( -2% - 7%)    0.004
OrHighHigh                   68.42   (1.4%)                    70.08   (1.3%)    2.4% ( 0% - 5%)     0.000
Or3Terms                    166.47   (2.7%)                   171.10   (3.1%)    2.8% ( -2% - 8%)    0.003
OrNotHighLow                964.69   (3.1%)                   994.46   (3.3%)    3.1% ( -3% - 9%)    0.002
OrHighMed                   222.32   (2.1%)                   230.93   (1.5%)    3.9% ( 0% - 7%)     0.000
CountAndHighHigh             48.88   (2.4%)                    52.87   (1.3%)    8.2% ( 4% - 12%)    0.000
```