HUSTERGS opened a new pull request, #14735: URL: https://github.com/apache/lucene/pull/14735
### Description This PR propose to increase the step size of `findNextGEQ` to speed up adanving within a block, instead of force using AVX512, this implementation check two IntVector when target is hit. ### Benchmark result I ran the luceneutil `wikimediumall` corpus with two different configs all default config ``` TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value FilteredOrHighMed 129.21 (10.3%) 121.31 (9.7%) -6.1% ( -23% - 15%) 0.053 CountOrHighHigh 110.26 (14.7%) 105.70 (13.2%) -4.1% ( -27% - 27%) 0.350 CountOrHighMed 133.31 (13.5%) 128.59 (11.5%) -3.5% ( -25% - 24%) 0.372 FilteredOrHighHigh 47.14 (7.6%) 45.61 (5.9%) -3.3% ( -15% - 11%) 0.130 AndHighOrMedMed 73.67 (11.0%) 71.74 (11.4%) -2.6% ( -22% - 22%) 0.457 IntervalsOrdered 36.73 (8.9%) 35.78 (8.3%) -2.6% ( -18% - 16%) 0.341 FilteredOr3Terms 78.48 (7.9%) 76.46 (9.9%) -2.6% ( -18% - 16%) 0.361 CountFilteredOrHighHigh 66.75 (6.5%) 65.09 (8.9%) -2.5% ( -16% - 13%) 0.314 CountAndHighMed 168.16 (14.5%) 164.67 (15.0%) -2.1% ( -27% - 31%) 0.656 FilteredOr2Terms2StopWords 62.18 (7.2%) 60.89 (7.9%) -2.1% ( -15% - 14%) 0.386 FilteredAndHighHigh 36.87 (7.2%) 36.11 (6.4%) -2.0% ( -14% - 12%) 0.344 FilteredOrStopWords 26.24 (4.0%) 25.78 (4.4%) -1.8% ( -9% - 6%) 0.181 CountAndHighHigh 127.60 (11.0%) 125.52 (13.9%) -1.6% ( -23% - 26%) 0.681 IntNRQ 94.06 (8.8%) 92.65 (11.7%) -1.5% ( -20% - 20%) 0.647 CountFilteredOrHighMed 83.79 (11.3%) 82.61 (9.4%) -1.4% ( -19% - 21%) 0.666 FilteredIntNRQ 276.48 (14.4%) 272.68 (12.6%) -1.4% ( -24% - 29%) 0.748 Term 334.73 (17.1%) 330.46 (17.7%) -1.3% ( -30% - 40%) 0.817 FilteredPhrase 32.38 (3.8%) 32.06 (5.4%) -1.0% ( -9% - 8%) 0.492 DismaxOrHighMed 90.22 (15.8%) 89.51 (15.5%) -0.8% ( -27% - 36%) 0.874 FilteredOrMany 12.59 (4.2%) 12.50 (3.2%) -0.7% ( -7% - 6%) 0.530 FilteredAndStopWords 24.18 (2.9%) 24.07 (4.4%) -0.5% ( -7% - 7%) 0.692 Phrase 75.02 (8.6%) 74.69 (8.9%) -0.4% ( -16% - 18%) 0.872 DismaxOrHighHigh 97.71 (12.4%) 97.30 (16.1%) -0.4% ( -25% - 32%) 0.925 Respell 25.95 (7.9%) 25.90 (7.3%) -0.2% ( -14% - 16%) 0.939 CountFilteredOrMany 19.39 (3.5%) 19.40 (3.9%) 0.0% ( -7% - 7%) 0.986 CombinedOrHighMed 25.89 (5.0%) 26.00 (3.7%) 0.4% ( -7% - 9%) 0.749 CountFilteredIntNRQ 53.60 (6.7%) 53.89 (8.4%) 0.5% ( -13% - 16%) 0.823 CountOrMany 21.13 (4.1%) 21.26 (3.1%) 0.6% ( -6% - 8%) 0.574 And3Terms 82.31 (17.4%) 82.98 (13.7%) 0.8% ( -25% - 38%) 0.868 CountPhrase 77.38 (10.1%) 78.10 (8.2%) 0.9% ( -15% - 21%) 0.751 TermB1M1P 436.52 (15.9%) 440.69 (13.9%) 1.0% ( -24% - 36%) 0.840 Prefix3 343.67 (12.8%) 347.40 (9.4%) 1.1% ( -18% - 26%) 0.760 FilteredTerm 96.16 (10.5%) 97.39 (5.6%) 1.3% ( -13% - 19%) 0.632 CombinedTerm 37.36 (6.2%) 37.85 (7.5%) 1.3% ( -11% - 16%) 0.548 CombinedOrHighHigh 14.21 (6.4%) 14.41 (7.9%) 1.4% ( -12% - 16%) 0.540 OrMany 11.84 (10.3%) 12.01 (10.3%) 1.4% ( -17% - 24%) 0.667 FilteredPrefix3 188.23 (11.0%) 191.36 (11.1%) 1.7% ( -18% - 26%) 0.634 SloppyPhrase 9.34 (6.7%) 9.50 (6.9%) 1.7% ( -11% - 16%) 0.427 Or2Terms2StopWords 98.50 (13.0%) 100.26 (12.6%) 1.8% ( -21% - 31%) 0.659 AndMedOrHighHigh 59.17 (9.0%) 60.29 (8.0%) 1.9% ( -13% - 20%) 0.484 CountFilteredPhrase 19.99 (3.9%) 20.50 (3.9%) 2.5% ( -5% - 10%) 0.040 Fuzzy2 38.40 (10.0%) 39.49 (13.0%) 2.8% ( -18% - 28%) 0.440 CombinedAndHighMed 65.98 (8.9%) 67.92 (8.2%) 2.9% ( -13% - 22%) 0.278 Fuzzy1 42.53 (7.9%) 43.83 (8.0%) 3.0% ( -11% - 20%) 0.226 TermMonthSort 522.43 (11.8%) 542.15 (11.4%) 3.8% ( -17% - 30%) 0.305 PKLookup 122.68 (7.7%) 127.40 (9.8%) 3.8% ( -12% - 23%) 0.168 Term1M 378.19 (17.8%) 392.87 (15.9%) 3.9% ( -25% - 45%) 0.468 OrHighHigh 42.53 (28.8%) 44.19 (23.9%) 3.9% ( -37% - 79%) 0.641 OrStopWords 39.01 (19.7%) 40.53 (22.7%) 3.9% ( -32% - 57%) 0.560 CountTerm 3260.09 (7.6%) 3389.74 (7.7%) 4.0% ( -10% - 20%) 0.102 Or3Terms 64.82 (16.5%) 67.58 (13.7%) 4.3% ( -22% - 41%) 0.373 TermTitleSort 36.25 (12.7%) 37.86 (13.4%) 4.5% ( -19% - 35%) 0.281 OrHighMed 79.76 (24.0%) 83.39 (23.7%) 4.6% ( -34% - 68%) 0.546 FilteredAnd2Terms2StopWords 47.78 (14.0%) 50.01 (11.3%) 4.7% ( -18% - 34%) 0.244 Term100 377.48 (11.7%) 397.62 (14.0%) 5.3% ( -18% - 35%) 0.191 OrHighRare 216.65 (15.2%) 228.37 (11.2%) 5.4% ( -18% - 37%) 0.201 CombinedAndHighHigh 9.31 (14.1%) 9.82 (12.6%) 5.5% ( -18% - 37%) 0.190 TermDTSort 71.24 (10.0%) 75.20 (11.2%) 5.6% ( -14% - 29%) 0.099 DismaxTerm 406.52 (11.7%) 429.63 (7.5%) 5.7% ( -12% - 28%) 0.068 And2Terms2StopWords 49.26 (14.5%) 52.24 (14.0%) 6.1% ( -19% - 40%) 0.179 SpanNear 10.09 (7.1%) 10.71 (6.8%) 6.1% ( -7% - 21%) 0.006 FilteredAnd3Terms 138.60 (15.6%) 147.87 (14.2%) 6.7% ( -19% - 43%) 0.156 Wildcard 99.31 (14.0%) 106.77 (13.8%) 7.5% ( -17% - 40%) 0.087 IntSet 205.88 (10.7%) 222.75 (17.1%) 8.2% ( -17% - 40%) 0.069 TermDayOfYearSort 82.67 (12.6%) 89.69 (13.9%) 8.5% ( -16% - 40%) 0.043 AndHighMed 98.55 (18.8%) 107.12 (13.1%) 8.7% ( -19% - 50%) 0.090 AndStopWords 41.09 (24.1%) 44.89 (17.4%) 9.3% ( -25% - 66%) 0.164 Term10K 412.49 (18.8%) 458.77 (13.1%) 11.2% ( -17% - 53%) 0.029 TermB1M 357.58 (17.5%) 401.24 (15.1%) 12.2% ( -17% - 54%) 0.018 FilteredAndHighMed 98.32 (15.3%) 110.66 (14.8%) 12.6% ( -15% - 50%) 0.008 AndHighHigh 43.61 (23.7%) 50.86 (21.4%) 16.6% ( -23% - 80%) 0.020 ``` with `-searchConcurrency 0` and set `taskCountPerCat` to 5 ``` TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value FilteredOrMany 5.09 (1.8%) 4.91 (2.1%) -3.5% ( -7% - 0%) 0.000 FilteredTerm 90.26 (2.7%) 87.55 (2.4%) -3.0% ( -7% - 2%) 0.009 FilteredOrHighHigh 18.29 (1.4%) 17.77 (1.0%) -2.9% ( -5% - 0%) 0.000 FilteredOrHighMed 54.74 (1.3%) 53.58 (1.5%) -2.1% ( -4% - 0%) 0.001 FilteredOr3Terms 59.74 (1.2%) 58.48 (2.3%) -2.1% ( -5% - 1%) 0.010 CountAndHighMed 91.94 (2.3%) 90.00 (3.6%) -2.1% ( -7% - 3%) 0.115 TermMonthSort 2557.83 (4.7%) 2510.13 (2.6%) -1.9% ( -8% - 5%) 0.268 FilteredPhrase 13.08 (1.2%) 12.84 (1.3%) -1.8% ( -4% - 0%) 0.001 CountFilteredPhrase 11.95 (2.2%) 11.76 (2.5%) -1.6% ( -6% - 3%) 0.129 FilteredOr2Terms2StopWords 70.67 (1.0%) 69.58 (1.7%) -1.5% ( -4% - 1%) 0.016 FilteredOrStopWords 11.07 (1.4%) 10.90 (1.5%) -1.5% ( -4% - 1%) 0.015 Phrase 10.02 (1.6%) 9.90 (2.4%) -1.3% ( -5% - 2%) 0.169 CombinedAndHighHigh 6.26 (2.5%) 6.19 (3.7%) -1.2% ( -7% - 5%) 0.376 CountFilteredOrHighHigh 25.50 (1.8%) 25.24 (1.0%) -1.0% ( -3% - 1%) 0.115 FilteredAndHighHigh 12.87 (6.2%) 12.75 (5.9%) -1.0% ( -12% - 11%) 0.716 CombinedOrHighHigh 6.32 (2.6%) 6.26 (3.0%) -0.9% ( -6% - 4%) 0.450 CountFilteredOrHighMed 30.05 (2.0%) 29.83 (1.7%) -0.7% ( -4% - 2%) 0.375 CountFilteredIntNRQ 22.54 (3.0%) 22.37 (1.9%) -0.7% ( -5% - 4%) 0.521 CombinedAndHighMed 25.48 (2.7%) 25.32 (4.5%) -0.6% ( -7% - 6%) 0.705 CountAndHighHigh 61.76 (1.5%) 61.44 (1.4%) -0.5% ( -3% - 2%) 0.412 Wildcard 59.06 (3.8%) 58.77 (4.0%) -0.5% ( -7% - 7%) 0.779 Respell 44.94 (3.2%) 44.74 (2.4%) -0.5% ( -5% - 5%) 0.713 CountOrHighMed 96.16 (1.6%) 95.73 (3.0%) -0.5% ( -5% - 4%) 0.676 CountFilteredOrMany 6.10 (1.8%) 6.08 (2.2%) -0.4% ( -4% - 3%) 0.644 CountOrHighHigh 64.03 (1.8%) 63.78 (2.0%) -0.4% ( -4% - 3%) 0.650 FilteredPrefix3 94.25 (3.2%) 93.94 (3.2%) -0.3% ( -6% - 6%) 0.816 CountOrMany 6.85 (2.1%) 6.82 (2.3%) -0.3% ( -4% - 4%) 0.741 FilteredAndStopWords 10.20 (5.7%) 10.17 (5.3%) -0.3% ( -10% - 11%) 0.901 AndHighOrMedMed 18.21 (3.4%) 18.17 (3.5%) -0.2% ( -6% - 6%) 0.889 Fuzzy2 46.42 (4.4%) 46.37 (2.8%) -0.1% ( -6% - 7%) 0.954 IntervalsOrdered 2.99 (3.9%) 2.98 (2.9%) -0.0% ( -6% - 6%) 0.982 Prefix3 100.89 (3.0%) 100.88 (2.9%) -0.0% ( -5% - 6%) 0.994 Fuzzy1 51.43 (5.6%) 51.50 (3.8%) 0.1% ( -8% - 10%) 0.951 TermTitleSort 72.49 (3.7%) 72.61 (3.0%) 0.2% ( -6% - 7%) 0.914 CombinedOrHighMed 25.51 (3.1%) 25.56 (3.8%) 0.2% ( -6% - 7%) 0.900 IntSet 388.85 (5.6%) 389.85 (4.4%) 0.3% ( -9% - 10%) 0.909 PKLookup 186.01 (7.2%) 186.58 (8.4%) 0.3% ( -14% - 17%) 0.931 SpanNear 3.02 (5.0%) 3.03 (5.8%) 0.3% ( -9% - 11%) 0.899 FilteredIntNRQ 48.19 (2.5%) 48.35 (2.0%) 0.3% ( -4% - 5%) 0.758 OrHighRare 120.88 (5.5%) 121.28 (5.5%) 0.3% ( -10% - 11%) 0.892 CountPhrase 3.31 (3.8%) 3.32 (3.1%) 0.4% ( -6% - 7%) 0.794 CombinedTerm 13.88 (3.2%) 13.94 (4.9%) 0.4% ( -7% - 8%) 0.822 IntNRQ 48.37 (2.5%) 48.58 (1.8%) 0.4% ( -3% - 4%) 0.661 TermDayOfYearSort 381.01 (2.6%) 383.39 (1.5%) 0.6% ( -3% - 4%) 0.514 TermDTSort 181.20 (2.2%) 182.78 (2.5%) 0.9% ( -3% - 5%) 0.414 CountTerm 7148.05 (7.0%) 7230.16 (5.2%) 1.1% ( -10% - 14%) 0.676 SloppyPhrase 1.50 (4.2%) 1.54 (2.9%) 2.6% ( -4% - 10%) 0.109 OrMany 5.28 (12.6%) 5.42 (9.1%) 2.6% ( -16% - 27%) 0.599 AndMedOrHighHigh 19.49 (7.9%) 19.99 (6.5%) 2.6% ( -10% - 18%) 0.419 FilteredAndHighMed 36.56 (11.4%) 37.78 (7.5%) 3.4% ( -14% - 25%) 0.438 DismaxTerm 588.96 (11.4%) 610.47 (9.5%) 3.7% ( -15% - 27%) 0.437 DismaxOrHighHigh 42.39 (13.9%) 43.95 (11.0%) 3.7% ( -18% - 33%) 0.512 DismaxOrHighMed 61.95 (12.7%) 64.44 (10.8%) 4.0% ( -17% - 31%) 0.446 FilteredAnd2Terms2StopWords 64.02 (13.8%) 66.89 (9.4%) 4.5% ( -16% - 32%) 0.395 Term1M 525.96 (15.8%) 551.13 (13.3%) 4.8% ( -21% - 40%) 0.464 TermB1M1P 525.44 (16.0%) 552.18 (13.4%) 5.1% ( -21% - 41%) 0.442 Term10K 525.63 (15.7%) 552.56 (13.3%) 5.1% ( -20% - 40%) 0.431 Term 526.88 (16.2%) 554.16 (13.0%) 5.2% ( -20% - 40%) 0.430 FilteredAnd3Terms 76.81 (18.4%) 80.97 (13.8%) 5.4% ( -22% - 46%) 0.456 TermB1M 525.09 (15.7%) 554.15 (13.1%) 5.5% ( -20% - 40%) 0.392 Or2Terms2StopWords 68.38 (16.1%) 72.24 (11.8%) 5.7% ( -19% - 39%) 0.371 Term100 522.93 (15.7%) 553.49 (13.4%) 5.8% ( -20% - 41%) 0.370 Or3Terms 68.30 (20.0%) 72.34 (14.1%) 5.9% ( -23% - 49%) 0.444 And2Terms2StopWords 69.76 (15.6%) 74.00 (11.2%) 6.1% ( -17% - 38%) 0.317 OrStopWords 8.15 (21.2%) 8.66 (16.3%) 6.1% ( -25% - 55%) 0.468 AndStopWords 7.81 (18.5%) 8.29 (13.3%) 6.2% ( -21% - 46%) 0.391 And3Terms 72.63 (19.4%) 77.74 (13.9%) 7.0% ( -22% - 50%) 0.352 OrHighMed 76.89 (23.2%) 83.35 (18.4%) 8.4% ( -26% - 65%) 0.370 OrHighHigh 21.09 (24.9%) 23.01 (19.8%) 9.1% ( -28% - 71%) 0.366 AndHighMed 58.17 (22.6%) 63.81 (18.0%) 9.7% ( -25% - 64%) 0.288 AndHighHigh 21.72 (25.2%) 24.07 (20.8%) 10.8% ( -28% - 75%) 0.295 ``` ### Findings Initially, I was playing around with `findNextGEQ` and `AdvanceBenchmark`, try to figure out if there is any space for optimization, and finnaly found I can increase the step size and manually expand the loop. In my machine, expand the loop from 1 to 4 will produce different resut (4 is best) ``` AdvanceBenchmark.vectorUtilSearch thrpt 15 752.037 ± 33.398 ops/ms (no expand, current implementation) AdvanceBenchmark.vectorUtilSearch thrpt 15 625.892 ± 20.849 ops/ms (expand 2, proposed implementation by this PR) AdvanceBenchmark.vectorUtilSearch thrpt 15 893.295 ± 17.879 ops/ms (expand 3) AdvanceBenchmark.vectorUtilSearch thrpt 15 955.528 ± 20.036 ops/ms (expand 4) ``` so I modify the code and ran luceneutil, but the result is not good as `AdvanceBenchmark` showed. At first, I just run the expand-3/4, because only 3/4 have better performance than the current implementation. This might indicate the `AdvanceBenchmark` have some gap with real word search path. To be honest, the result is not ***THAT*** reliable to me, because the `p-value` is still far from 0, but on the other hand, this modification actually paid more cost when the target is close, because we need to check two vectors now, instead of one, so the high `p-value` might be acceptable? Please let me know if I got anything wrong. :) <!-- If this is your first contribution to Lucene, please make sure you have reviewed the contribution guide. https://github.com/apache/lucene/blob/main/CONTRIBUTING.md --> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org