jpountz commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2311845595
Here is what the `AdvanceBenchmark` reports. The branchless binary search is `binarySearch5`. It performs much faster than a regular binary search but is still slower than a linear search in this benchmark, which tries to reproduce the distribution I'm seeing in practice, where the first doc ID greater than or equal to the target (gte) is usually only a few docs away. luceneutil also reports a slowdown if I wire this branchless binary search in. `hybridSearch` is the only approach I could come up with that would beat `linearSearch` in this benchmark. It happens to use intervals of 8, while intervals of 4 seemed to work slightly better with luceneutil in practice.

```
Benchmark                           Mode  Cnt    Score    Error   Units
AdvanceBenchmark.binarySearch      thrpt    5  124.899 ±  8.926  ops/ms
AdvanceBenchmark.binarySearch2     thrpt    5  120.575 ±  6.757  ops/ms
AdvanceBenchmark.binarySearch3     thrpt    5   76.455 ± 10.387  ops/ms
AdvanceBenchmark.binarySearch4     thrpt    5  176.332 ± 10.384  ops/ms
AdvanceBenchmark.binarySearch5     thrpt    5  232.011 ±  8.920  ops/ms
AdvanceBenchmark.binarySearch6     thrpt    5  135.986 ±  2.720  ops/ms
AdvanceBenchmark.bruteForceSearch  thrpt    5   81.400 ±  1.526  ops/ms
AdvanceBenchmark.hybridSearch      thrpt    5  340.718 ±  4.698  ops/ms
AdvanceBenchmark.linearSearch      thrpt    5  305.663 ±  3.478  ops/ms
AdvanceBenchmark.linearSearch2     thrpt    5  248.212 ±  1.494  ops/ms
AdvanceBenchmark.linearSearch3     thrpt    5  210.096 ± 10.088  ops/ms
```

And luceneutil on wikibigall:
- `OrHighHigh` and `PKLookup` get a small slowdown, maybe `OrStopWords` too,
- `OrHighMed`, `AndHighHigh`, `OrHighRare`, `HighTermDayOfYearSort`, `OrHighLow`, `AndHighMed`, `CountAndHighMed` get a small speedup,
- `AndHighLow`, `OrNotHighLow`, `CountAndHighHigh` get a good speedup.

```
Task                   QPS baseline   StdDev  QPS my_modified_version   StdDev             Pct diff  p-value
OrHighHigh                    70.00   (2.0%)                    67.82   (4.2%)   -3.1% ( -9% -  3%)  0.003
OrStopWords                   34.04   (3.3%)                    33.05   (5.8%)   -2.9% (-11% -  6%)  0.053
PKLookup                     281.09   (2.6%)                   273.38   (2.6%)   -2.7% ( -7% -  2%)  0.001
Or3Terms                     172.60   (2.8%)                   170.15   (3.6%)   -1.4% ( -7% -  5%)  0.169
CountOrHighHigh               58.96  (13.5%)                    58.24  (14.6%)   -1.2% (-25% - 31%)  0.782
Or2Terms2StopWords           162.94   (2.5%)                   161.51   (3.9%)   -0.9% ( -7% -  5%)  0.396
OrNotHighMed                 355.53   (2.3%)                   353.34   (2.8%)   -0.6% ( -5% -  4%)  0.451
MedTerm                      741.64   (4.6%)                   738.88   (5.9%)   -0.4% (-10% - 10%)  0.824
HighTermTitleSort            124.03   (6.7%)                   123.57   (2.4%)   -0.4% ( -8% -  9%)  0.817
HighTerm                     538.39   (3.8%)                   536.57   (5.4%)   -0.3% ( -9% -  9%)  0.818
CountOrHighMed               117.06  (11.1%)                   116.81  (10.5%)   -0.2% (-19% - 23%)  0.952
AndStopWords                  30.85   (3.3%)                    30.86   (4.4%)    0.0% ( -7% -  8%)  0.969
LowTerm                     1093.16   (4.2%)                  1095.72   (6.2%)    0.2% ( -9% - 11%)  0.889
TermDTSort                   364.51   (6.0%)                   366.10   (5.8%)    0.4% (-10% - 13%)  0.815
HighTermMonthSort           3150.57   (3.1%)                  3166.16   (2.4%)    0.5% ( -4% -  6%)  0.574
OrNotHighHigh                235.15   (1.8%)                   236.31   (3.3%)    0.5% ( -4% -  5%)  0.554
And3Terms                    172.79   (2.9%)                   174.13   (3.1%)    0.8% ( -5% -  6%)  0.409
OrHighNotHigh                253.75   (1.9%)                   256.02   (3.8%)    0.9% ( -4% -  6%)  0.346
OrHighNotMed                 413.79   (2.5%)                   417.52   (4.1%)    0.9% ( -5% -  7%)  0.402
And2Terms2StopWords          160.11   (2.4%)                   161.76   (3.2%)    1.0% ( -4% -  6%)  0.245
OrHighNotLow                 489.29   (3.3%)                   495.58   (4.6%)    1.3% ( -6% -  9%)  0.308
OrHighMed                    242.22   (3.0%)                   245.75   (2.5%)    1.5% ( -3% -  7%)  0.098
AndHighHigh                   60.42   (1.5%)                    61.32   (1.9%)    1.5% ( -1% -  4%)  0.004
CountTerm                   9589.30   (3.0%)                  9736.14   (4.3%)    1.5% ( -5% -  9%)  0.196
OrHighRare                   270.80   (2.5%)                   275.67   (3.3%)    1.8% ( -3% -  7%)  0.052
HighTermDayOfYearSort        826.36   (3.2%)                   842.20   (3.5%)    1.9% ( -4% -  8%)  0.073
OrHighLow                    869.82   (2.9%)                   886.74   (2.4%)    1.9% ( -3% -  7%)  0.021
AndHighMed                   187.97   (1.7%)                   192.20   (1.5%)    2.2% (  0% -  5%)  0.000
CountAndHighMed              146.76   (1.9%)                   150.53   (2.2%)    2.6% ( -1% -  6%)  0.000
AndHighLow                   965.91   (3.3%)                  1008.42   (2.5%)    4.4% ( -1% - 10%)  0.000
OrNotHighLow                 939.51   (4.1%)                   987.88   (2.3%)    5.1% ( -1% - 11%)  0.000
CountAndHighHigh              48.08   (1.5%)                    53.26   (4.3%)   10.8% (  4% - 16%)  0.000
```
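
For readers who have not looked at the benchmark source, here is a minimal sketch of what a linear search and an interval-based hybrid search over a sorted block of doc IDs could look like. This is an illustration only, not the code from the PR or from `AdvanceBenchmark`: the class name `AdvanceSketch`, the method signatures, and the `interval` parameter are all assumptions made for the example. The hybrid variant first skips forward one interval at a time by checking only the last doc ID of each interval, then scans linearly inside the interval that must contain the answer.

```java
// Hypothetical sketch; names and signatures are not taken from Lucene.
class AdvanceSketch {

  /** Returns the index of the first value >= target in docs[from, to), or `to` if there is none. */
  static int linearSearch(int[] docs, int from, int to, int target) {
    for (int i = from; i < to; i++) {
      if (docs[i] >= target) {
        return i;
      }
    }
    return to;
  }

  /**
   * Same contract as linearSearch, but skips ahead `interval` entries at a time
   * before finishing with a short linear scan. `docs` must be sorted ascending.
   */
  static int hybridSearch(int[] docs, int from, int to, int target, int interval) {
    int i = from;
    // Coarse phase: only compare the last element of each full interval.
    while (i + interval <= to && docs[i + interval - 1] < target) {
      i += interval;
    }
    // Fine phase: the answer, if any, is within the next `interval` entries
    // (or within the shorter tail of the block).
    return linearSearch(docs, i, Math.min(i + interval, to), target);
  }

  public static void main(String[] args) {
    int[] docs = {2, 5, 9, 14, 20, 21, 30, 41, 57, 60, 75, 90};
    // Intervals of 4 and 8 are the two sizes mentioned in the comment above.
    System.out.println(hybridSearch(docs, 0, docs.length, 58, 4));
    System.out.println(hybridSearch(docs, 0, docs.length, 58, 8));
  }
}
```

When the target is usually only a few docs away, the coarse phase exits after one or two comparisons, so the total work stays close to a plain linear scan while bounding the worst case to roughly `blockSize / interval + interval` comparisons. That trade-off is one plausible reading of why the interval size (4 versus 8) matters in the numbers above.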