jpountz commented on PR #13692:
URL: https://github.com/apache/lucene/pull/13692#issuecomment-2325294386
Other data points: if I bias towards the next 2 doc IDs rather than just the
next doc ID:
```java
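static int findNextGEQ(long[] values, long target, int startIndex) {
  // Fast path: check the next 2 doc IDs first.
  // Reads values[startIndex + 1], so this assumes startIndex + 1 < values.length.
  if (values[startIndex + 1] >= target) {
    int nextGEQIndex = startIndex;
    if (values[startIndex] < target) {
      nextGEQIndex += 1;
    }
    return nextGEQIndex;
  }
  // Slow path: starting right after those 2 doc IDs, look at the last value of each
  // window of BINARY_SEARCH_WINDOW_SIZE values until one is >= target. If no full
  // window qualifies, fall back to the last BINARY_SEARCH_WINDOW_SIZE values.
  int rangeStart = values.length - BINARY_SEARCH_WINDOW_SIZE;
  for (int i = startIndex + 2;
      i + BINARY_SEARCH_WINDOW_SIZE <= values.length;
      i += BINARY_SEARCH_WINDOW_SIZE) {
    if (values[i + BINARY_SEARCH_WINDOW_SIZE - 1] >= target) {
      rangeStart = i;
      break;
    }
  }
  // Search within the selected window.
  return binarySearchHelper4(values, target, rangeStart);
}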
```
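To try this variant outside Lucene, here is a minimal self-contained harness. Two parts of it are assumptions and are not taken from the PR: `BINARY_SEARCH_WINDOW_SIZE = 4`, and the body of `binarySearchHelper4`, which is written here as a two-step search over a 4-value window. The harness also assumes `startIndex + 1 < values.length` and `values[values.length - 1] >= target`. It checks the biased search against a plain linear scan over one sorted 128-entry block.

```java
/** Hypothetical harness for the "next 2 doc IDs first" variant (names are illustrative). */
final class FindNextGEQBiased {
  // Assumption: windows of 4 values, as "every 4-th doc ID" and the "Helper4" name suggest.
  static final int BINARY_SEARCH_WINDOW_SIZE = 4;

  // Hypothetical stand-in for binarySearchHelper4: returns the first index in
  // [start, start + 3] whose value is >= target, provided the answer lies in that window.
  static int binarySearchHelper4(long[] values, long target, int start) {
    if (values[start + 1] < target) {
      start += 2; // answer is in the upper half of the window
    }
    if (values[start] < target) {
      start += 1;
    }
    return start;
  }

  // The biased variant from the comment, reformatted.
  static int findNextGEQ(long[] values, long target, int startIndex) {
    if (values[startIndex + 1] >= target) {
      return values[startIndex] < target ? startIndex + 1 : startIndex;
    }
    int rangeStart = values.length - BINARY_SEARCH_WINDOW_SIZE;
    for (int i = startIndex + 2;
        i + BINARY_SEARCH_WINDOW_SIZE <= values.length;
        i += BINARY_SEARCH_WINDOW_SIZE) {
      if (values[i + BINARY_SEARCH_WINDOW_SIZE - 1] >= target) {
        rangeStart = i;
        break;
      }
    }
    return binarySearchHelper4(values, target, rangeStart);
  }

  // Reference: first index >= startIndex whose value is >= target.
  static int linearFindNextGEQ(long[] values, long target, int startIndex) {
    int i = startIndex;
    while (values[i] < target) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    long[] values = new long[128];
    for (int i = 0; i < values.length; i++) {
      values[i] = 3L * i + (i % 2); // strictly increasing, hypothetical doc IDs
    }
    long last = values[values.length - 1];
    for (int start = 0; start + 1 < values.length; start++) {
      for (long target = 0; target <= last; target++) {
        int expected = linearFindNextGEQ(values, target, start);
        int actual = findNextGEQ(values, target, start);
        if (actual != expected) {
          throw new AssertionError("start=" + start + " target=" + target
              + " expected=" + expected + " actual=" + actual);
        }
      }
    }
    System.out.println("biased variant matches linear scan");
  }
}
```

The harness checks correctness only. Speed is what the benchmarks in this comment measure, and the harness tells you nothing about it.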
then I get:
```
                                baseline my_modified_version
Task                        QPS   StdDev        QPS   StdDev Pct diff                  p-value
IntNRQ                   131.78  (20.7%)     119.07  (16.5%)    -9.6% ( -38% -   34%)  0.102
CountOrHighHigh           61.04  (15.4%)      57.81  (14.9%)    -5.3% ( -30% -   29%)  0.271
CountAndHighMed          145.50   (2.5%)     138.61   (1.6%)    -4.7% (  -8% -    0%)  0.000
CountAndHighHigh          53.03   (2.9%)      50.91   (2.2%)    -4.0% (  -8% -    1%)  0.000
HighTermMonthSort       3375.59   (2.4%)    3242.15   (3.1%)    -4.0% (  -9% -    1%)  0.000
CountTerm               9320.33   (4.7%)    8979.03   (4.0%)    -3.7% ( -11% -    5%)  0.008
OrHighNotLow             451.17   (3.2%)     435.64   (3.7%)    -3.4% ( -10% -    3%)  0.002
CountOrHighMed           118.59  (11.4%)     114.84  (12.1%)    -3.2% ( -23% -   22%)  0.394
HighTermTitleSort        129.22   (1.3%)     125.18   (5.5%)    -3.1% (  -9% -    3%)  0.014
Prefix3                  219.34   (3.0%)     212.77   (3.4%)    -3.0% (  -9% -    3%)  0.003
TermDTSort               370.82   (4.5%)     360.20   (7.2%)    -2.9% ( -13% -    9%)  0.130
Wildcard                  94.29   (2.4%)      91.89   (3.3%)    -2.5% (  -8% -    3%)  0.005
HighTermDayOfYearSort    843.34   (2.5%)     824.40   (3.7%)    -2.2% (  -8% -    3%)  0.023
OrHighNotMed             364.91   (3.1%)     357.25   (3.2%)    -2.1% (  -8% -    4%)  0.037
OrNotHighHigh            202.25   (3.6%)     199.64   (4.0%)    -1.3% (  -8% -    6%)  0.279
OrHighNotHigh            242.63   (3.1%)     239.68   (3.4%)    -1.2% (  -7% -    5%)  0.232
LowTerm                  940.04   (2.4%)     931.00   (4.0%)    -1.0% (  -7% -    5%)  0.356
HighTermTitleBDVSort      20.18   (5.8%)      20.15   (6.2%)    -0.1% ( -11% -   12%)  0.950
OrNotHighMed             307.14   (3.0%)     307.11   (3.6%)    -0.0% (  -6% -    6%)  0.994
MedTerm                  698.19   (2.4%)     698.32   (3.1%)     0.0% (  -5% -    5%)  0.983
OrHighLow                798.92   (1.7%)     799.82   (1.3%)     0.1% (  -2% -    3%)  0.812
HighTerm                 431.87   (2.9%)     433.31   (3.5%)     0.3% (  -5% -    6%)  0.745
OrNotHighLow            1030.05   (3.1%)    1042.88   (2.2%)     1.2% (  -3% -    6%)  0.144
AndHighLow              1039.02   (1.8%)    1053.58   (1.8%)     1.4% (  -2% -    5%)  0.012
And2Terms2StopWords      155.10   (2.8%)     157.55   (2.0%)     1.6% (  -3% -    6%)  0.041
AndHighMed               190.30   (1.4%)     193.43   (1.1%)     1.6% (   0% -    4%)  0.000
AndHighHigh               70.14   (1.7%)      71.40   (1.4%)     1.8% (  -1% -    4%)  0.000
And3Terms                165.30   (3.1%)     168.58   (2.2%)     2.0% (  -3% -    7%)  0.019
PKLookup                 277.99   (3.0%)     284.31   (2.1%)     2.3% (  -2% -    7%)  0.006
Or3Terms                 164.11   (4.1%)     168.07   (2.9%)     2.4% (  -4% -    9%)  0.032
Or2Terms2StopWords       157.63   (3.9%)     161.63   (2.9%)     2.5% (  -4% -    9%)  0.020
OrHighMed                273.13   (1.6%)     280.64   (1.5%)     2.8% (   0% -    5%)  0.000
OrHighHigh                63.38   (2.6%)      65.27   (1.9%)     3.0% (  -1% -    7%)  0.000
AndStopWords              29.98   (5.0%)      30.88   (4.0%)     3.0% (  -5% -   12%)  0.036
OrHighRare               270.67   (3.7%)     283.95   (1.9%)     4.9% (   0% -   10%)  0.000
OrStopWords               32.93   (6.9%)      34.74   (5.2%)     5.5% (  -6% -   18%)  0.005
```
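The headline `Pct diff` figure seems to be the relative change in mean QPS, `(QPS my_modified_version - QPS baseline) / QPS baseline`. That reading reproduces, for example, the IntNRQ (-9.6%) and OrStopWords (+5.5%) rows. Below is a tiny sketch of that arithmetic, assuming this is how the column is computed. It does not cover the bracketed range or the p-value, and the class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper: relative QPS change, formatted like the "Pct diff" column. */
final class PctDiff {
  static String format(double baselineQps, double candidateQps) {
    double pct = 100.0 * (candidateQps - baselineQps) / baselineQps;
    // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default.
    return String.format(Locale.ROOT, "%.1f%%", pct);
  }

  public static void main(String[] args) {
    System.out.println(format(131.78, 119.07)); // IntNRQ row: -9.6%
    System.out.println(format(32.93, 34.74)); // OrStopWords row: 5.5%
  }
}
```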
And if I remove the bias towards the next doc IDs and instead check every 4th
doc ID:
```java
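static int findNextGEQ(long[] values, long target, int startIndex) {
  // No fast path: look at the last value of each window of BINARY_SEARCH_WINDOW_SIZE
  // values, starting at startIndex, until one is >= target. If no full window
  // qualifies, fall back to the last BINARY_SEARCH_WINDOW_SIZE values.
  int rangeStart = values.length - BINARY_SEARCH_WINDOW_SIZE;
  for (int i = startIndex;
      i + BINARY_SEARCH_WINDOW_SIZE <= values.length;
      i += BINARY_SEARCH_WINDOW_SIZE) {
    if (values[i + BINARY_SEARCH_WINDOW_SIZE - 1] >= target) {
      rangeStart = i;
      break;
    }
  }
  // Search within the selected window.
  return binarySearchHelper4(values, target, rangeStart);
}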
```
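Here is the same kind of standalone harness for the unbiased variant. It again uses a hypothetical window size of 4 and a hypothetical `binarySearchHelper4`. Besides `values[values.length - 1] >= target`, it assumes every value before `startIndex` is below `target`, as when advancing past the current doc. The reason: when `startIndex` is near the end of the array, the fallback window starts at `values.length - 4`, which can be before `startIndex`.

```java
/** Hypothetical harness for the variant that checks every 4th doc ID (names are illustrative). */
final class FindNextGEQWindowed {
  // Assumption: windows of 4 values.
  static final int BINARY_SEARCH_WINDOW_SIZE = 4;

  // Hypothetical stand-in for binarySearchHelper4: first index in [start, start + 3]
  // whose value is >= target, provided the answer lies in that window.
  static int binarySearchHelper4(long[] values, long target, int start) {
    if (values[start + 1] < target) {
      start += 2;
    }
    if (values[start] < target) {
      start += 1;
    }
    return start;
  }

  // The unbiased variant from the comment, reformatted.
  static int findNextGEQ(long[] values, long target, int startIndex) {
    int rangeStart = values.length - BINARY_SEARCH_WINDOW_SIZE;
    for (int i = startIndex;
        i + BINARY_SEARCH_WINDOW_SIZE <= values.length;
        i += BINARY_SEARCH_WINDOW_SIZE) {
      if (values[i + BINARY_SEARCH_WINDOW_SIZE - 1] >= target) {
        rangeStart = i;
        break;
      }
    }
    return binarySearchHelper4(values, target, rangeStart);
  }

  // Reference: linear scan from startIndex.
  static int linearFindNextGEQ(long[] values, long target, int startIndex) {
    int i = startIndex;
    while (values[i] < target) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    long[] values = new long[128];
    for (int i = 0; i < values.length; i++) {
      values[i] = 3L * i + (i % 2); // strictly increasing, hypothetical doc IDs
    }
    long last = values[values.length - 1];
    for (int start = 0; start < values.length; start++) {
      // Only targets above the previous doc ID, i.e. a forward advance.
      long minTarget = start == 0 ? 0 : values[start - 1] + 1;
      for (long target = minTarget; target <= last; target++) {
        int expected = linearFindNextGEQ(values, target, start);
        int actual = findNextGEQ(values, target, start);
        if (actual != expected) {
          throw new AssertionError("start=" + start + " target=" + target
              + " expected=" + expected + " actual=" + actual);
        }
      }
    }
    System.out.println("windowed variant matches linear scan");
  }
}
```

The biased version does not need that extra precondition. Its fast path has already established that `values[startIndex + 1] < target`, so any overlap between the fallback window and earlier positions is harmless.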
then I get:
```
                                baseline my_modified_version
Task                        QPS   StdDev        QPS   StdDev Pct diff                  p-value
IntNRQ                   188.99  (15.9%)     166.19   (3.4%)   -12.1% ( -27% -    8%)  0.001
CountAndHighMed          146.07   (2.2%)     131.10   (1.0%)   -10.3% ( -13% -   -7%)  0.000
CountAndHighHigh          53.05   (2.4%)      49.33   (1.1%)    -7.0% ( -10% -   -3%)  0.000
Prefix3                   81.47   (4.4%)      78.33   (3.5%)    -3.9% ( -11% -    4%)  0.002
TermDTSort               371.25   (6.2%)     357.38   (7.0%)    -3.7% ( -15% -    9%)  0.072
LowTerm                 1125.19   (2.5%)    1084.03   (2.5%)    -3.7% (  -8% -    1%)  0.000
CountOrHighMed           114.38  (11.3%)     110.28   (9.4%)    -3.6% ( -21% -   19%)  0.273
CountTerm               9272.27   (3.2%)    8940.85   (3.6%)    -3.6% ( -10% -    3%)  0.001
OrNotHighMed             307.33   (3.2%)     297.63   (3.3%)    -3.2% (  -9% -    3%)  0.002
Wildcard                  98.71   (2.6%)      95.88   (2.1%)    -2.9% (  -7% -    1%)  0.000
HighTermMonthSort       3210.84   (2.5%)    3121.66   (2.4%)    -2.8% (  -7% -    2%)  0.000
HighTermDayOfYearSort    866.32   (3.8%)     843.92   (4.4%)    -2.6% ( -10% -    5%)  0.047
MedTerm                  658.49   (3.1%)     642.37   (3.1%)    -2.4% (  -8% -    3%)  0.012
OrHighNotLow             416.81   (3.5%)     407.42   (3.5%)    -2.3% (  -8% -    4%)  0.043
OrNotHighHigh            225.61   (3.2%)     221.27   (3.7%)    -1.9% (  -8% -    5%)  0.080
CountOrHighHigh           57.88  (16.0%)      56.82  (12.9%)    -1.8% ( -26% -   32%)  0.691
HighTerm                 477.85   (3.2%)     469.58   (3.1%)    -1.7% (  -7% -    4%)  0.080
OrHighNotMed             399.01   (3.0%)     392.90   (3.0%)    -1.5% (  -7% -    4%)  0.106
OrHighNotHigh            225.78   (3.0%)     223.21   (3.2%)    -1.1% (  -7% -    5%)  0.240
HighTermTitleSort        151.06   (2.4%)     149.43   (4.5%)    -1.1% (  -7% -    5%)  0.346
And3Terms                167.25   (1.2%)     165.82   (1.9%)    -0.9% (  -3% -    2%)  0.089
OrHighLow                781.07   (1.5%)     776.45   (1.7%)    -0.6% (  -3% -    2%)  0.246
AndHighHigh               62.97   (1.6%)      62.66   (1.1%)    -0.5% (  -3% -    2%)  0.263
AndStopWords              30.58   (1.4%)      30.63   (2.8%)     0.2% (  -3% -    4%)  0.821
PKLookup                 280.92   (2.4%)     281.73   (2.0%)     0.3% (  -4% -    4%)  0.681
Or3Terms                 165.27   (1.4%)     165.96   (2.4%)     0.4% (  -3% -    4%)  0.507
And2Terms2StopWords      156.30   (1.3%)     156.97   (1.9%)     0.4% (  -2% -    3%)  0.408
HighTermTitleBDVSort      15.99   (5.6%)      16.07   (7.0%)     0.6% ( -11% -   13%)  0.783
OrNotHighLow             986.24   (2.3%)     992.51   (2.1%)     0.6% (  -3% -    5%)  0.362
AndHighMed               217.81   (1.5%)     219.37   (1.2%)     0.7% (  -1% -    3%)  0.099
Or2Terms2StopWords       159.61   (1.1%)     160.94   (2.4%)     0.8% (  -2% -    4%)  0.164
AndHighLow              1030.40   (2.6%)    1044.72   (1.9%)     1.4% (  -3% -    6%)  0.053
OrHighHigh                60.04   (2.8%)      61.33   (1.2%)     2.1% (  -1% -    6%)  0.002
OrStopWords               33.76   (2.1%)      34.53   (4.3%)     2.3% (  -4% -    8%)  0.033
OrHighMed                230.26   (2.3%)     236.51   (1.3%)     2.7% (   0% -    6%)  0.000
OrHighRare               266.52   (3.5%)     283.60   (1.5%)     6.4% (   1% -   11%)  0.000
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]