jpountz commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2325294386
Other data points: if I bias towards the next 2 doc IDs rather than just the next doc ID:

```java
static int findNextGEQ(long[] values, long target, int startIndex) {
  // Fast path: the target is likely to be one of the next 2 doc IDs.
  if (values[startIndex + 1] >= target) {
    int nextGEQIndex = startIndex;
    if (values[startIndex] < target) {
      nextGEQIndex += 1;
    }
    return nextGEQIndex;
  }
  // Otherwise, find the first window of BINARY_SEARCH_WINDOW_SIZE values whose last value
  // is >= target. If no window qualifies, fall back to the last window of the array.
  int rangeStart = values.length - BINARY_SEARCH_WINDOW_SIZE;
  for (int i = startIndex + 2; i + BINARY_SEARCH_WINDOW_SIZE <= values.length; i += BINARY_SEARCH_WINDOW_SIZE) {
    if (values[i + BINARY_SEARCH_WINDOW_SIZE - 1] >= target) {
      rangeStart = i;
      break;
    }
  }
  // Search within the selected window.
  return binarySearchHelper4(values, target, rangeStart);
}
```

then I get:

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff | p-value |
|---|--:|--:|--:|--:|--:|--:|
| IntNRQ | 131.78 | 20.7% | 119.07 | 16.5% | -9.6% (-38% to 34%) | 0.102 |
| CountOrHighHigh | 61.04 | 15.4% | 57.81 | 14.9% | -5.3% (-30% to 29%) | 0.271 |
| CountAndHighMed | 145.50 | 2.5% | 138.61 | 1.6% | -4.7% (-8% to 0%) | 0.000 |
| CountAndHighHigh | 53.03 | 2.9% | 50.91 | 2.2% | -4.0% (-8% to 1%) | 0.000 |
| HighTermMonthSort | 3375.59 | 2.4% | 3242.15 | 3.1% | -4.0% (-9% to 1%) | 0.000 |
| CountTerm | 9320.33 | 4.7% | 8979.03 | 4.0% | -3.7% (-11% to 5%) | 0.008 |
| OrHighNotLow | 451.17 | 3.2% | 435.64 | 3.7% | -3.4% (-10% to 3%) | 0.002 |
| CountOrHighMed | 118.59 | 11.4% | 114.84 | 12.1% | -3.2% (-23% to 22%) | 0.394 |
| HighTermTitleSort | 129.22 | 1.3% | 125.18 | 5.5% | -3.1% (-9% to 3%) | 0.014 |
| Prefix3 | 219.34 | 3.0% | 212.77 | 3.4% | -3.0% (-9% to 3%) | 0.003 |
| TermDTSort | 370.82 | 4.5% | 360.20 | 7.2% | -2.9% (-13% to 9%) | 0.130 |
| Wildcard | 94.29 | 2.4% | 91.89 | 3.3% | -2.5% (-8% to 3%) | 0.005 |
| HighTermDayOfYearSort | 843.34 | 2.5% | 824.40 | 3.7% | -2.2% (-8% to 3%) | 0.023 |
| OrHighNotMed | 364.91 | 3.1% | 357.25 | 3.2% | -2.1% (-8% to 4%) | 0.037 |
| OrNotHighHigh | 202.25 | 3.6% | 199.64 | 4.0% | -1.3% (-8% to 6%) | 0.279 |
| OrHighNotHigh | 242.63 | 3.1% | 239.68 | 3.4% | -1.2% (-7% to 5%) | 0.232 |
| LowTerm | 940.04 | 2.4% | 931.00 | 4.0% | -1.0% (-7% to 5%) | 0.356 |
| HighTermTitleBDVSort | 20.18 | 5.8% | 20.15 | 6.2% | -0.1% (-11% to 12%) | 0.950 |
| OrNotHighMed | 307.14 | 3.0% | 307.11 | 3.6% | -0.0% (-6% to 6%) | 0.994 |
| MedTerm | 698.19 | 2.4% | 698.32 | 3.1% | 0.0% (-5% to 5%) | 0.983 |
| OrHighLow | 798.92 | 1.7% | 799.82 | 1.3% | 0.1% (-2% to 3%) | 0.812 |
| HighTerm | 431.87 | 2.9% | 433.31 | 3.5% | 0.3% (-5% to 6%) | 0.745 |
| OrNotHighLow | 1030.05 | 3.1% | 1042.88 | 2.2% | 1.2% (-3% to 6%) | 0.144 |
| AndHighLow | 1039.02 | 1.8% | 1053.58 | 1.8% | 1.4% (-2% to 5%) | 0.012 |
| And2Terms2StopWords | 155.10 | 2.8% | 157.55 | 2.0% | 1.6% (-3% to 6%) | 0.041 |
| AndHighMed | 190.30 | 1.4% | 193.43 | 1.1% | 1.6% (0% to 4%) | 0.000 |
| AndHighHigh | 70.14 | 1.7% | 71.40 | 1.4% | 1.8% (-1% to 4%) | 0.000 |
| And3Terms | 165.30 | 3.1% | 168.58 | 2.2% | 2.0% (-3% to 7%) | 0.019 |
| PKLookup | 277.99 | 3.0% | 284.31 | 2.1% | 2.3% (-2% to 7%) | 0.006 |
| Or3Terms | 164.11 | 4.1% | 168.07 | 2.9% | 2.4% (-4% to 9%) | 0.032 |
| Or2Terms2StopWords | 157.63 | 3.9% | 161.63 | 2.9% | 2.5% (-4% to 9%) | 0.020 |
| OrHighMed | 273.13 | 1.6% | 280.64 | 1.5% | 2.8% (0% to 5%) | 0.000 |
| OrHighHigh | 63.38 | 2.6% | 65.27 | 1.9% | 3.0% (-1% to 7%) | 0.000 |
| AndStopWords | 29.98 | 5.0% | 30.88 | 4.0% | 3.0% (-5% to 12%) | 0.036 |
| OrHighRare | 270.67 | 3.7% | 283.95 | 1.9% | 4.9% (0% to 10%) | 0.000 |
| OrStopWords | 32.93 | 6.9% | 34.74 | 5.2% | 5.5% (-6% to 18%) | 0.005 |

And if I remove the bias towards the next doc IDs and instead check every 4th doc ID:

```java
static int findNextGEQ(long[] values, long target, int startIndex) {
  // No fast path: check the last value of each window of BINARY_SEARCH_WINDOW_SIZE values,
  // starting at startIndex. If no window qualifies, fall back to the last window of the array.
  int rangeStart = values.length - BINARY_SEARCH_WINDOW_SIZE;
  for (int i = startIndex; i + BINARY_SEARCH_WINDOW_SIZE <= values.length; i += BINARY_SEARCH_WINDOW_SIZE) {
    if (values[i + BINARY_SEARCH_WINDOW_SIZE - 1] >= target) {
      rangeStart = i;
      break;
    }
  }
  // Search within the selected window.
  return binarySearchHelper4(values, target, rangeStart);
}
```

then I get:

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff | p-value |
|---|--:|--:|--:|--:|--:|--:|
| IntNRQ | 188.99 | 15.9% | 166.19 | 3.4% | -12.1% (-27% to 8%) | 0.001 |
| CountAndHighMed | 146.07 | 2.2% | 131.10 | 1.0% | -10.3% (-13% to -7%) | 0.000 |
| CountAndHighHigh | 53.05 | 2.4% | 49.33 | 1.1% | -7.0% (-10% to -3%) | 0.000 |
| Prefix3 | 81.47 | 4.4% | 78.33 | 3.5% | -3.9% (-11% to 4%) | 0.002 |
| TermDTSort | 371.25 | 6.2% | 357.38 | 7.0% | -3.7% (-15% to 9%) | 0.072 |
| LowTerm | 1125.19 | 2.5% | 1084.03 | 2.5% | -3.7% (-8% to 1%) | 0.000 |
| CountOrHighMed | 114.38 | 11.3% | 110.28 | 9.4% | -3.6% (-21% to 19%) | 0.273 |
| CountTerm | 9272.27 | 3.2% | 8940.85 | 3.6% | -3.6% (-10% to 3%) | 0.001 |
| OrNotHighMed | 307.33 | 3.2% | 297.63 | 3.3% | -3.2% (-9% to 3%) | 0.002 |
| Wildcard | 98.71 | 2.6% | 95.88 | 2.1% | -2.9% (-7% to 1%) | 0.000 |
| HighTermMonthSort | 3210.84 | 2.5% | 3121.66 | 2.4% | -2.8% (-7% to 2%) | 0.000 |
| HighTermDayOfYearSort | 866.32 | 3.8% | 843.92 | 4.4% | -2.6% (-10% to 5%) | 0.047 |
| MedTerm | 658.49 | 3.1% | 642.37 | 3.1% | -2.4% (-8% to 3%) | 0.012 |
| OrHighNotLow | 416.81 | 3.5% | 407.42 | 3.5% | -2.3% (-8% to 4%) | 0.043 |
| OrNotHighHigh | 225.61 | 3.2% | 221.27 | 3.7% | -1.9% (-8% to 5%) | 0.080 |
| CountOrHighHigh | 57.88 | 16.0% | 56.82 | 12.9% | -1.8% (-26% to 32%) | 0.691 |
| HighTerm | 477.85 | 3.2% | 469.58 | 3.1% | -1.7% (-7% to 4%) | 0.080 |
| OrHighNotMed | 399.01 | 3.0% | 392.90 | 3.0% | -1.5% (-7% to 4%) | 0.106 |
| OrHighNotHigh | 225.78 | 3.0% | 223.21 | 3.2% | -1.1% (-7% to 5%) | 0.240 |
| HighTermTitleSort | 151.06 | 2.4% | 149.43 | 4.5% | -1.1% (-7% to 5%) | 0.346 |
| And3Terms | 167.25 | 1.2% | 165.82 | 1.9% | -0.9% (-3% to 2%) | 0.089 |
| OrHighLow | 781.07 | 1.5% | 776.45 | 1.7% | -0.6% (-3% to 2%) | 0.246 |
| AndHighHigh | 62.97 | 1.6% | 62.66 | 1.1% | -0.5% (-3% to 2%) | 0.263 |
| AndStopWords | 30.58 | 1.4% | 30.63 | 2.8% | 0.2% (-3% to 4%) | 0.821 |
| PKLookup | 280.92 | 2.4% | 281.73 | 2.0% | 0.3% (-4% to 4%) | 0.681 |
| Or3Terms | 165.27 | 1.4% | 165.96 | 2.4% | 0.4% (-3% to 4%) | 0.507 |
| And2Terms2StopWords | 156.30 | 1.3% | 156.97 | 1.9% | 0.4% (-2% to 3%) | 0.408 |
| HighTermTitleBDVSort | 15.99 | 5.6% | 16.07 | 7.0% | 0.6% (-11% to 13%) | 0.783 |
| OrNotHighLow | 986.24 | 2.3% | 992.51 | 2.1% | 0.6% (-3% to 5%) | 0.362 |
| AndHighMed | 217.81 | 1.5% | 219.37 | 1.2% | 0.7% (-1% to 3%) | 0.099 |
| Or2Terms2StopWords | 159.61 | 1.1% | 160.94 | 2.4% | 0.8% (-2% to 4%) | 0.164 |
| AndHighLow | 1030.40 | 2.6% | 1044.72 | 1.9% | 1.4% (-3% to 6%) | 0.053 |
| OrHighHigh | 60.04 | 2.8% | 61.33 | 1.2% | 2.1% (-1% to 6%) | 0.002 |
| OrStopWords | 33.76 | 2.1% | 34.53 | 4.3% | 2.3% (-4% to 8%) | 0.033 |
| OrHighMed | 230.26 | 2.3% | 236.51 | 1.3% | 2.7% (0% to 6%) | 0.000 |
| OrHighRare | 266.52 | 3.5% | 283.60 | 1.5% | 6.4% (1% to 11%) | 0.000 |
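The sketch below is a minimal, self-contained check that the two `findNextGEQ` variants in this comment return the same index as a plain linear scan. It relies on several assumptions that do not come from the PR. `BINARY_SEARCH_WINDOW_SIZE` is taken to be 4. `binarySearchHelper4`, whose body isn't shown here, is replaced by a hypothetical `searchWindow` that scans the 4-element window linearly. The preconditions are also assumed: doc IDs are sorted, every value before `startIndex` is below `target`, and for the biased variant `startIndex + 1` is in bounds. The method names `findNextGEQBiased` and `findNextGEQStrided` are illustrative only.

```java
import java.util.Random;

/**
 * Sketch of the two findNextGEQ variants, checked against a linear scan.
 * Assumptions (not taken from the PR): BINARY_SEARCH_WINDOW_SIZE is 4, and the PR's
 * binarySearchHelper4 is replaced by the hypothetical searchWindow below.
 */
public class FindNextGEQSketch {
  // Assumed value; the real constant is defined in the PR.
  static final int BINARY_SEARCH_WINDOW_SIZE = 4;

  // Hypothetical stand-in for binarySearchHelper4: linearly scans the window starting at
  // rangeStart and returns the first index whose value is >= target, or
  // rangeStart + BINARY_SEARCH_WINDOW_SIZE if no value in the window qualifies.
  static int searchWindow(long[] values, long target, int rangeStart) {
    int i = rangeStart;
    while (i < rangeStart + BINARY_SEARCH_WINDOW_SIZE && values[i] < target) {
      i++;
    }
    return i;
  }

  // Variant 1: check the next 2 doc IDs, then every window of 4 starting at startIndex + 2.
  // Assumed preconditions: values is sorted, values.length >= BINARY_SEARCH_WINDOW_SIZE,
  // startIndex + 1 < values.length, and every value before startIndex is < target.
  static int findNextGEQBiased(long[] values, long target, int startIndex) {
    if (values[startIndex + 1] >= target) {
      return values[startIndex] >= target ? startIndex : startIndex + 1;
    }
    int rangeStart = values.length - BINARY_SEARCH_WINDOW_SIZE;
    for (int i = startIndex + 2;
        i + BINARY_SEARCH_WINDOW_SIZE <= values.length;
        i += BINARY_SEARCH_WINDOW_SIZE) {
      if (values[i + BINARY_SEARCH_WINDOW_SIZE - 1] >= target) {
        rangeStart = i;
        break;
      }
    }
    return searchWindow(values, target, rangeStart);
  }

  // Variant 2: no fast path, check every window of 4 starting at startIndex.
  // Same assumed preconditions, except startIndex may be as large as values.length.
  static int findNextGEQStrided(long[] values, long target, int startIndex) {
    int rangeStart = values.length - BINARY_SEARCH_WINDOW_SIZE;
    for (int i = startIndex;
        i + BINARY_SEARCH_WINDOW_SIZE <= values.length;
        i += BINARY_SEARCH_WINDOW_SIZE) {
      if (values[i + BINARY_SEARCH_WINDOW_SIZE - 1] >= target) {
        rangeStart = i;
        break;
      }
    }
    return searchWindow(values, target, rangeStart);
  }

  // Reference: first index >= startIndex whose value is >= target, or values.length.
  static int linearReference(long[] values, long target, int startIndex) {
    int i = startIndex;
    while (i < values.length && values[i] < target) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    for (int iter = 0; iter < 10_000; iter++) {
      // A block of 128 strictly increasing doc IDs, all >= 1.
      long[] values = new long[128];
      long doc = random.nextInt(10);
      for (int j = 0; j < values.length; j++) {
        doc += 1 + random.nextInt(5);
        values[j] = doc;
      }
      // Keeps startIndex + 1 in bounds, as the biased variant requires.
      int startIndex = random.nextInt(values.length - 1);
      // The target exceeds every value before startIndex; it may also exceed all values.
      long floor = startIndex == 0 ? 0 : values[startIndex - 1];
      long target = floor + 1 + random.nextInt(200);
      int expected = linearReference(values, target, startIndex);
      if (findNextGEQBiased(values, target, startIndex) != expected
          || findNextGEQStrided(values, target, startIndex) != expected) {
        throw new AssertionError("mismatch at iteration " + iter);
      }
    }
    System.out.println("ok");
  }
}
```

Running `main` performs 10,000 randomized comparisons on 128-value blocks. It prints `ok` if every comparison agrees and throws otherwise. This only checks that the variants return the same indexes. It says nothing about their speed, so the benchmark tables remain the basis for choosing between them.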