Adrien Grand created LUCENE-10634:
-------------------------------------
Summary: Speed up WANDScorer by computing scores before advancing tail scorers
Key: LUCENE-10634
URL: https://issues.apache.org/jira/browse/LUCENE-10634
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
While looking at performance numbers on LUCENE-10480, I noticed that it is
often faster to compute a score before advancing a tail scorer, in order to
get a finer-grained estimate of the best score that the current document can
possibly get.
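To illustrate the idea, here is a minimal sketch. It is a toy model and not the actual WANDScorer code: the class and method names (WandBoundSketch, bound, worthAdvancingTail) and the example numbers are made up for illustration. It assumes "lead" clauses are already positioned on the candidate document, so their real score can be computed, while for "tail" clauses only a score upper bound is known until they are advanced.

```java
/**
 * Toy model only: this is NOT Lucene's WANDScorer, and all names are hypothetical.
 * Lead clauses are already on the candidate document; tail clauses are behind it
 * and would have to be advanced (which is costly) to know whether they match.
 */
class WandBoundSketch {

  /**
   * Best score the candidate can possibly get: the contribution assumed for
   * the lead clauses plus the score upper bounds of the tail clauses.
   */
  static double bound(float[] leadContribution, float[] tailMaxScores) {
    double sum = 0;
    for (float s : leadContribution) sum += s;
    for (float s : tailMaxScores) sum += s;
    return sum;
  }

  /** True if the candidate could still reach the minimum competitive score. */
  static boolean worthAdvancingTail(double bound, float minCompetitiveScore) {
    return bound >= minCompetitiveScore;
  }

  public static void main(String[] args) {
    float[] leadMaxScores = {3.0f};        // upper bound of the single lead clause
    float[] leadScores = {1.2f};           // its real score on this document
    float[] tailMaxScores = {2.0f, 1.5f};  // upper bounds of the two tail clauses
    float minCompetitiveScore = 5.0f;

    // Coarse estimate: upper bounds everywhere (3.0 + 2.0 + 1.5 = 6.5),
    // so the tail scorers would be advanced.
    System.out.println(
        worthAdvancingTail(bound(leadMaxScores, tailMaxScores), minCompetitiveScore)); // true

    // Finer estimate: real lead score (about 1.2 + 2.0 + 1.5 = 4.7),
    // so the document can be skipped without advancing any tail scorer.
    System.out.println(
        worthAdvancingTail(bound(leadScores, tailMaxScores), minCompetitiveScore)); // false
  }
}
```

The trade-off is that the score of the lead clauses has to be computed up front, even for documents that would have been rejected anyway.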
Making this change to WANDScorer yielded a small but reproducible speedup:
{noformat}
Task                    QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff              p-value
IntNRQ                        186.50   (11.8%)                   175.34   (19.1%)   -6.0% (-33% - 28%)   0.234
HighTermTitleBDVSort          167.27   (20.6%)                   161.85   (17.2%)   -3.2% (-34% - 43%)   0.591
MedSloppyPhrase               194.77    (5.5%)                   190.45    (7.8%)   -2.2% (-14% - 11%)   0.299
HighTermDayOfYearSort         229.61    (7.7%)                   225.74    (7.1%)   -1.7% (-15% - 14%)   0.471
LowSloppyPhrase                20.22    (4.3%)                    19.95    (4.8%)   -1.3% (-10% - 8%)    0.366
TermDTSort                    319.62    (7.7%)                   316.78    (7.5%)   -0.9% (-14% - 15%)   0.712
OrHighNotLow                 1856.44    (5.6%)                  1842.88    (5.7%)   -0.7% (-11% - 11%)   0.682
AndMedOrHighHigh               73.87    (3.8%)                    73.51    (3.6%)   -0.5% (-7% - 7%)     0.677
OrHighNotHigh                2000.56    (5.6%)                  1991.65    (6.9%)   -0.4% (-12% - 12%)   0.823
LowPhrase                     106.90    (2.4%)                   106.61    (2.9%)   -0.3% (-5% - 5%)     0.750
AndHighLow                   1661.80    (3.5%)                  1658.56    (3.7%)   -0.2% (-7% - 7%)     0.865
Fuzzy2                        110.64    (1.8%)                   110.43    (1.9%)   -0.2% (-3% - 3%)     0.752
HighTermMonthSort              73.74   (17.5%)                    73.68   (20.8%)   -0.1% (-32% - 46%)   0.989
PKLookup                      242.86    (1.8%)                   242.75    (1.8%)   -0.0% (-3% - 3%)     0.934
OrHighNotMed                 1454.98    (5.3%)                  1456.26    (5.8%)    0.1% (-10% - 11%)   0.960
HighPhrase                    523.22    (2.9%)                   524.01    (2.6%)    0.2% (-5% - 5%)     0.862
MedPhrase                     140.65    (2.7%)                   140.87    (2.9%)    0.2% (-5% - 5%)     0.862
HighSloppyPhrase                8.74    (4.6%)                     8.75    (5.5%)    0.2% (-9% - 10%)    0.914
LowSpanNear                    28.05    (3.6%)                    28.14    (3.0%)    0.3% (-6% - 7%)     0.777
MedSpanNear                     7.59    (3.5%)                     7.61    (3.4%)    0.3% (-6% - 7%)     0.778
Respell                        67.62    (1.9%)                    67.82    (1.8%)    0.3% (-3% - 4%)     0.595
OrAndHigMedAndHighMed         127.87    (3.1%)                   128.27    (4.0%)    0.3% (-6% - 7%)     0.780
OrNotHighLow                 1513.24    (2.1%)                  1520.33    (2.6%)    0.5% (-4% - 5%)     0.528
OrHighPhraseHighPhrase         25.26    (3.0%)                    25.38    (3.0%)    0.5% (-5% - 6%)     0.616
OrNotHighMed                 1544.04    (4.5%)                  1552.26    (4.2%)    0.5% (-7% - 9%)     0.697
AndHighHigh                    92.24    (4.8%)                    92.79    (6.6%)    0.6% (-10% - 12%)   0.744
AndHighMed                    420.42    (3.1%)                   423.19    (5.2%)    0.7% (-7% - 9%)     0.624
Fuzzy1                        117.42    (1.9%)                   118.19    (2.2%)    0.7% (-3% - 4%)     0.307
MedTerm                      2209.36    (4.6%)                  2224.54    (5.3%)    0.7% (-8% - 11%)    0.661
MedIntervalsOrdered           124.18    (8.1%)                   125.12    (8.0%)    0.8% (-14% - 18%)   0.767
OrNotHighHigh                1239.43    (4.6%)                  1249.63    (4.8%)    0.8% (-8% - 10%)    0.580
AndHighOrMedMed                95.02    (4.3%)                    95.82    (3.8%)    0.8% (-6% - 9%)     0.515
Wildcard                      315.22   (23.3%)                   317.98   (22.5%)    0.9% (-36% - 60%)   0.904
LowTerm                      2775.81    (4.0%)                  2808.32    (5.2%)    1.2% (-7% - 10%)    0.425
HighIntervalsOrdered           14.24    (8.0%)                    14.41    (8.4%)    1.2% (-14% - 19%)   0.646
LowIntervalsOrdered           120.62    (5.8%)                   122.09    (6.6%)    1.2% (-10% - 14%)   0.534
HighSpanNear                   39.04    (6.7%)                    39.71    (4.3%)    1.7% (-8% - 13%)    0.332
Prefix3                        80.25    (5.1%)                    81.70    (3.3%)    1.8% (-6% - 10%)    0.187
HighTerm                     3635.73    (6.0%)                  3720.39    (6.5%)    2.3% (-9% - 15%)    0.240
OrHighLow                     860.22    (3.7%)                   882.88    (3.4%)    2.6% (-4% - 10%)    0.019
OrHighMed                      91.61    (3.9%)                    94.40    (4.1%)    3.1% (-4% - 11%)    0.016
OrHighHigh                     55.17    (3.7%)                    57.09    (4.1%)    3.5% (-4% - 11%)    0.005
OrHighMedMed                  172.38    (5.0%)                   178.92    (6.0%)    3.8% (-6% - 15%)    0.029
OrHighHighMed                  68.63    (4.5%)                    72.66    (5.3%)    5.9% (-3% - 16%)    0.000
{noformat}
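For readers unfamiliar with this output, the sketch below shows how the "Pct diff" column appears to relate to the two QPS columns. It assumes "Pct diff" is the relative change of the modified QPS against the baseline QPS, which is consistent with the rows in the table; the class and method names (PctDiffSketch, pctDiff) are hypothetical and not part of the benchmark tooling.

```java
import java.util.Locale;

/** Hypothetical helper; assumes Pct diff = 100 * (modified - baseline) / baseline. */
class PctDiffSketch {

  static double pctDiff(double baselineQps, double modifiedQps) {
    return 100.0 * (modifiedQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS values copied from the table above.
    System.out.printf(Locale.ROOT, "OrHighHighMed %.1f%%%n", pctDiff(68.63, 72.66));   // 5.9%
    System.out.printf(Locale.ROOT, "OrHighLow     %.1f%%%n", pctDiff(860.22, 882.88)); // 2.6%
    System.out.printf(Locale.ROOT, "IntNRQ        %.1f%%%n", pctDiff(186.50, 175.34)); // -6.0%
  }
}
```

Under that reading, the rows at the bottom of the table with a low p-value are all disjunctive queries (OrHighLow, OrHighMed, OrHighHigh, OrHighMedMed, OrHighHighMed).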