zacharymorn commented on pull request #81: URL: https://github.com/apache/lucene/pull/81#issuecomment-819252139
Hi Adrien,

I've pushed two additional commits with different changes, and run luceneutil twice against each to get multiple benchmark results:

---

Commit: https://github.com/apache/lucene/pull/81/commits/13ce57b3715eed4fc56c249bb1442c35b93a1aeb

Net changes compared with baseline:

1. `DisjunctionSumScorer` uses `MaxScoreSumPropagator` for block-max related logic
2. `DisjunctionSumScorer` is used instead of `WANDScorer` for pure disjunctions of term queries

Benchmark result 1:

```
Task                      QPS baseline   StdDev QPS my_modified_version   StdDev Pct diff                p-value
OrHighLow                       685.16   (7.5%)                  661.24   (6.8%)    -3.5% ( -16% -   11%)  0.122
OrHighMed                       206.80   (4.1%)                  201.18   (4.8%)    -2.7% ( -11% -    6%)  0.054
Fuzzy1                           71.65  (11.6%)                   69.85  (12.0%)    -2.5% ( -23% -   23%)  0.502
BrowseDayOfYearSSDVFacets        27.19   (6.9%)                   26.78   (8.2%)    -1.5% ( -15% -   14%)  0.536
Wildcard                        217.30   (5.6%)                  214.22   (7.2%)    -1.4% ( -13% -   12%)  0.489
Fuzzy2                           23.01   (3.3%)                   22.71   (2.7%)    -1.3% (  -7% -    4%)  0.172
OrNotHighLow                   1081.94   (4.4%)                 1070.52   (3.7%)    -1.1% (  -8% -    7%)  0.415
MedTerm                        1473.41   (3.0%)                 1458.42   (4.1%)    -1.0% (  -7% -    6%)  0.367
PKLookup                        214.52   (3.4%)                  212.73   (3.0%)    -0.8% (  -7% -    5%)  0.412
LowTerm                        2037.07   (4.8%)                 2020.31   (3.9%)    -0.8% (  -9% -    8%)  0.551
Prefix3                         271.77   (3.3%)                  269.54   (4.7%)    -0.8% (  -8% -    7%)  0.522
HighTerm                       1249.12   (4.3%)                 1238.91   (4.7%)    -0.8% (  -9% -    8%)  0.565
OrHighHigh                       45.71   (3.0%)                   45.37   (2.5%)    -0.7% (  -5% -    4%)  0.397
BrowseMonthTaxoFacets            13.06   (2.8%)                   12.99   (2.8%)    -0.6% (  -6% -    5%)  0.513
OrHighNotHigh                   650.00   (5.5%)                  646.61   (4.4%)    -0.5% (  -9% -    9%)  0.740
BrowseDateTaxoFacets             10.85   (3.1%)                   10.80   (3.5%)    -0.5% (  -6% -    6%)  0.635
AndHighHigh                     122.39   (3.3%)                  121.81   (3.2%)    -0.5% (  -6% -    6%)  0.648
Respell                          79.32   (1.7%)                   79.00   (1.2%)    -0.4% (  -3% -    2%)  0.396
AndHighLow                      792.61   (5.1%)                  789.43   (4.5%)    -0.4% (  -9% -    9%)  0.792
BrowseDayOfYearTaxoFacets        10.87   (3.0%)                   10.83   (3.3%)    -0.4% (  -6% -    6%)  0.694
LowSpanNear                      76.74   (1.4%)                   76.47   (1.6%)    -0.3% (  -3% -    2%)  0.463
HighSloppyPhrase                 34.99   (2.6%)                   34.93   (3.2%)    -0.2% (  -5% -    5%)  0.840
OrHighNotLow                    721.14   (4.8%)                  719.84   (5.4%)    -0.2% (  -9% -   10%)  0.911
OrNotHighHigh                   622.07   (3.5%)                  621.05   (3.8%)    -0.2% (  -7% -    7%)  0.888
HighTermTitleBDVSort            327.87  (16.8%)                  327.34  (17.5%)    -0.2% ( -29% -   41%)  0.976
HighSpanNear                     49.03   (2.8%)                   48.96   (2.9%)    -0.2% (  -5% -    5%)  0.859
HighIntervalsOrdered             44.35   (1.8%)                   44.28   (1.3%)    -0.2% (  -3% -    3%)  0.757
LowSloppyPhrase                  24.15   (2.4%)                   24.13   (3.1%)    -0.1% (  -5% -    5%)  0.914
AndHighMed                      297.73   (3.6%)                  297.71   (2.7%)    -0.0% (  -6% -    6%)  0.995
MedSpanNear                      64.27   (1.7%)                   64.28   (1.5%)     0.0% (  -3% -    3%)  0.988
MedSloppyPhrase                  36.84   (1.9%)                   36.84   (2.8%)     0.0% (  -4% -    4%)  0.986
BrowseMonthSSDVFacets            31.09   (4.9%)                   31.11   (2.3%)     0.1% (  -6% -    7%)  0.954
IntNRQ                          188.61   (1.5%)                  188.91   (1.7%)     0.2% (  -2% -    3%)  0.750
HighPhrase                      386.99   (2.9%)                  388.05   (3.4%)     0.3% (  -5% -    6%)  0.787
OrHighNotMed                    729.88   (5.5%)                  732.50   (5.7%)     0.4% ( -10% -   12%)  0.840
MedPhrase                       380.97   (3.1%)                  383.02   (3.1%)     0.5% (  -5% -    6%)  0.579
HighTermDayOfYearSort           332.45  (13.1%)                  336.61  (13.1%)     1.2% ( -22% -   31%)  0.763
TermDTSort                      364.87  (10.7%)                  369.51  (12.9%)     1.3% ( -20% -   27%)  0.733
LowPhrase                       403.73   (2.9%)                  409.07   (3.2%)     1.3% (  -4% -    7%)  0.177
OrNotHighMed                    611.71   (5.4%)                  621.18   (4.0%)     1.5% (  -7% -   11%)  0.303
HighTermMonthSort               165.61   (9.8%)                  171.22  (11.1%)     3.4% ( -15% -   26%)  0.306
```

Benchmark result 2:

```
Task                      QPS baseline   StdDev QPS my_modified_version   StdDev Pct diff                p-value
OrHighHigh                       46.46   (1.4%)                   34.06   (2.2%)   -26.7% ( -29% -  -23%)  0.000
OrHighLow                       620.75   (6.1%)                  567.45   (8.2%)    -8.6% ( -21% -    6%)  0.000
OrHighMed                       170.06   (4.3%)                  162.65   (5.3%)    -4.4% ( -13% -    5%)  0.004
Prefix3                         524.59   (4.3%)                  512.98   (5.1%)    -2.2% ( -11% -    7%)  0.136
Fuzzy1                           91.31   (6.0%)                   89.71   (9.0%)    -1.8% ( -15% -   14%)  0.466
OrNotHighLow                   1133.39   (4.5%)                 1115.07   (4.9%)    -1.6% ( -10% -    8%)  0.277
OrHighNotHigh                   667.51   (5.2%)                  660.80   (4.6%)    -1.0% ( -10% -    9%)  0.518
AndHighLow                      910.68   (3.6%)                  901.61   (3.4%)    -1.0% (  -7% -    6%)  0.372
HighTerm                       1511.60   (4.0%)                 1501.51   (4.1%)    -0.7% (  -8% -    7%)  0.600
OrNotHighHigh                   655.21   (4.0%)                  651.54   (5.6%)    -0.6% (  -9% -    9%)  0.718
LowTerm                        2193.29   (5.6%)                 2183.75   (4.9%)    -0.4% ( -10% -   10%)  0.795
OrNotHighMed                    789.32   (4.6%)                  786.91   (4.2%)    -0.3% (  -8% -    8%)  0.826
OrHighNotLow                    822.79   (7.0%)                  820.98   (5.6%)    -0.2% ( -11% -   13%)  0.912
PKLookup                        219.75   (4.8%)                  219.28   (3.1%)    -0.2% (  -7% -    8%)  0.868
BrowseDateTaxoFacets             11.14   (3.8%)                   11.13   (3.3%)    -0.1% (  -6% -    7%)  0.923
BrowseMonthTaxoFacets            13.50   (3.0%)                   13.49   (3.0%)    -0.1% (  -5% -    6%)  0.937
BrowseDayOfYearTaxoFacets        11.16   (3.9%)                   11.16   (3.4%)    -0.1% (  -7% -    7%)  0.958
LowSpanNear                     309.94   (2.4%)                  309.93   (2.9%)    -0.0% (  -5% -    5%)  0.997
LowSloppyPhrase                  49.18   (4.3%)                   49.18   (4.1%)     0.0% (  -7% -    8%)  0.995
IntNRQ                          160.83   (1.1%)                  160.84   (1.0%)     0.0% (  -2% -    2%)  0.975
HighSloppyPhrase                 49.88   (4.2%)                   49.93   (4.0%)     0.1% (  -7% -    8%)  0.942
AndHighMed                      290.03   (3.3%)                  290.40   (3.4%)     0.1% (  -6% -    7%)  0.904
MedPhrase                       373.01   (2.7%)                  373.57   (2.9%)     0.2% (  -5% -    5%)  0.865
MedSpanNear                      84.29   (2.2%)                   84.43   (2.4%)     0.2% (  -4% -    4%)  0.817
HighIntervalsOrdered              6.08   (1.3%)                    6.10   (1.0%)     0.2% (  -2% -    2%)  0.552
AndHighHigh                      73.50   (4.3%)                   73.67   (3.8%)     0.2% (  -7% -    8%)  0.854
Wildcard                        234.24   (2.1%)                  234.80   (1.6%)     0.2% (  -3% -    3%)  0.680
BrowseDayOfYearSSDVFacets        28.39   (5.4%)                   28.48   (5.3%)     0.3% (  -9% -   11%)  0.846
HighPhrase                      284.75   (2.0%)                  285.69   (2.7%)     0.3% (  -4% -    5%)  0.662
MedSloppyPhrase                  75.00   (2.4%)                   75.33   (3.2%)     0.4% (  -5% -    6%)  0.628
Respell                          95.48   (2.6%)                   96.01   (1.7%)     0.6% (  -3% -    4%)  0.415
LowPhrase                       129.78   (2.5%)                  130.56   (1.8%)     0.6% (  -3% -    5%)  0.385
OrHighNotMed                    726.46   (5.3%)                  732.29   (4.6%)     0.8% (  -8% -   11%)  0.607
HighSpanNear                    308.76   (2.6%)                  312.61   (2.1%)     1.2% (  -3% -    6%)  0.098
MedTerm                        1689.86   (5.7%)                 1715.31   (5.1%)     1.5% (  -8% -   13%)  0.379
TermDTSort                      377.99   (6.6%)                  387.69   (4.6%)     2.6% (  -8% -   14%)  0.156
BrowseMonthSSDVFacets            31.65   (7.5%)                   32.50   (2.3%)     2.7% (  -6% -   13%)  0.127
Fuzzy2                           60.11  (14.4%)                   61.79  (15.2%)     2.8% ( -23% -   37%)  0.552
HighTermDayOfYearSort           317.71  (13.5%)                  329.57  (13.7%)     3.7% ( -20% -   35%)  0.385
HighTermMonthSort               304.28  (14.4%)                  316.36  (14.9%)     4.0% ( -22% -   38%)  0.392
HighTermTitleBDVSort            194.65  (13.4%)                  202.51  (16.2%)     4.0% ( -22% -   38%)  0.391
```

Overall, this seems to actually degrade performance for certain OrHighHigh queries (-26.7% in the second run, though only -0.7% in the first). However, given that the original hypothesis was that BMM would be more effective than BMW for disjunction queries with many (5+) clauses, I'm wondering if the benchmark query set needs to be augmented to test this scenario, as most of the queries seem to have only 1 or 2 clauses?

---

Commit: https://github.com/apache/lucene/pull/81/commits/1d9255aecff22cd61f0cf3756f706acef3f24459

Net changes compared with baseline:

1. `DisjunctionSumScorer` uses `MaxScoreSumPropagator` for block-max related logic

Benchmark result 1:

```
Task                      QPS baseline   StdDev QPS my_modified_version   StdDev Pct diff                p-value
OrNotHighMed                    723.31   (7.1%)                  702.22   (6.5%)    -2.9% ( -15% -   11%)  0.176
LowTerm                        1675.55   (8.4%)                 1634.89   (6.1%)    -2.4% ( -15% -   13%)  0.298
MedTerm                        1470.24   (6.0%)                 1438.45   (5.5%)    -2.2% ( -12% -    9%)  0.234
LowPhrase                       319.76   (3.7%)                  314.75   (4.0%)    -1.6% (  -8% -    6%)  0.195
AndHighMed                      271.94   (4.5%)                  267.76   (2.7%)    -1.5% (  -8% -    5%)  0.189
AndHighLow                      845.23   (5.5%)                  832.50   (3.4%)    -1.5% (  -9% -    7%)  0.301
IntNRQ                          180.83   (4.9%)                  178.25   (5.0%)    -1.4% ( -10% -    8%)  0.366
BrowseMonthSSDVFacets            31.16   (1.5%)                   30.77   (5.4%)    -1.2% (  -8% -    5%)  0.318
OrHighNotHigh                   628.85   (4.6%)                  623.53   (6.2%)    -0.8% ( -11% -   10%)  0.622
OrNotHighLow                   1071.07   (2.8%)                 1062.75   (4.0%)    -0.8% (  -7% -    6%)  0.479
OrHighNotMed                    692.64   (5.6%)                  688.05   (5.7%)    -0.7% ( -11% -   11%)  0.712
OrHighMed                       129.14   (4.4%)                  128.33   (3.5%)    -0.6% (  -8% -    7%)  0.620
Respell                          70.87   (5.2%)                   70.48   (4.2%)    -0.6% (  -9% -    9%)  0.711
OrHighLow                       482.16   (5.5%)                  479.66   (4.7%)    -0.5% ( -10% -   10%)  0.750
OrNotHighHigh                   779.58   (5.6%)                  775.63   (5.8%)    -0.5% ( -11% -   11%)  0.777
Wildcard                        210.85   (5.4%)                  209.91   (3.3%)    -0.4% (  -8% -    8%)  0.752
AndHighHigh                     169.04   (4.5%)                  168.37   (1.8%)    -0.4% (  -6% -    6%)  0.713
HighSpanNear                     31.43   (2.3%)                   31.32   (2.7%)    -0.4% (  -5% -    4%)  0.655
BrowseDayOfYearSSDVFacets        27.49   (2.5%)                   27.41   (1.8%)    -0.3% (  -4% -    4%)  0.671
MedPhrase                        48.70   (3.8%)                   48.57   (3.8%)    -0.3% (  -7% -    7%)  0.830
LowSpanNear                     107.69   (2.3%)                  107.46   (1.5%)    -0.2% (  -3% -    3%)  0.735
PKLookup                        209.57   (3.9%)                  209.19   (3.0%)    -0.2% (  -6% -    6%)  0.869
BrowseDayOfYearTaxoFacets        10.77   (2.7%)                   10.76   (2.8%)    -0.1% (  -5% -    5%)  0.907
Prefix3                         180.75   (4.3%)                  180.57   (5.3%)    -0.1% (  -9% -    9%)  0.948
HighIntervalsOrdered             47.54   (1.7%)                   47.51   (1.9%)    -0.1% (  -3% -    3%)  0.926
BrowseDateTaxoFacets             10.75   (2.7%)                   10.75   (2.8%)     0.0% (  -5% -    5%)  0.999
LowSloppyPhrase                  18.25   (3.7%)                   18.26   (3.7%)     0.1% (  -7% -    7%)  0.954
OrHighNotLow                    676.54   (6.0%)                  677.12   (5.9%)     0.1% ( -11% -   12%)  0.963
Fuzzy1                           77.57   (9.9%)                   77.68   (5.1%)     0.1% ( -13% -   16%)  0.955
MedSloppyPhrase                 221.01   (4.5%)                  221.37   (3.0%)     0.2% (  -7% -    8%)  0.895
HighTerm                        997.87   (5.9%)                 1000.31   (4.0%)     0.2% (  -9% -   10%)  0.878
BrowseMonthTaxoFacets            12.93   (4.1%)                   12.96   (3.9%)     0.3% (  -7% -    8%)  0.837
MedSpanNear                     254.30   (3.1%)                  255.22   (3.6%)     0.4% (  -6% -    7%)  0.735
HighPhrase                      262.59   (4.1%)                  263.83   (4.2%)     0.5% (  -7% -    9%)  0.719
OrHighHigh                       63.98   (3.6%)                   64.34   (1.7%)     0.6% (  -4% -    6%)  0.526
HighSloppyPhrase                 20.17   (5.8%)                   20.29   (6.5%)     0.6% ( -11% -   13%)  0.746
TermDTSort                      431.77  (12.0%)                  437.67  (14.7%)     1.4% ( -22% -   31%)  0.747
HighTermTitleBDVSort            196.74  (14.0%)                  201.01  (17.3%)     2.2% ( -25% -   38%)  0.663
HighTermDayOfYearSort           210.49  (10.4%)                  215.84  (12.2%)     2.5% ( -18% -   28%)  0.478
HighTermMonthSort               191.43  (11.2%)                  196.72  (11.9%)     2.8% ( -18% -   29%)  0.451
Fuzzy2                           80.25  (22.5%)                   84.12  (19.1%)     4.8% ( -30% -   59%)  0.464
```

Benchmark result 2:

```
Task                      QPS baseline   StdDev QPS my_modified_version   StdDev Pct diff                p-value
TermDTSort                      470.25  (12.5%)                  458.13  (12.3%)    -2.6% ( -24% -   25%)  0.511
OrNotHighHigh                   665.88   (5.0%)                  654.36   (5.5%)    -1.7% ( -11% -    9%)  0.298
MedTerm                        1607.39   (3.2%)                 1581.79   (3.6%)    -1.6% (  -8% -    5%)  0.137
BrowseDateTaxoFacets             11.20   (3.6%)                   11.08   (4.0%)    -1.1% (  -8% -    6%)  0.380
BrowseDayOfYearTaxoFacets        11.22   (3.6%)                   11.10   (4.0%)    -1.0% (  -8% -    6%)  0.409
MedSloppyPhrase                  70.48   (3.4%)                   69.80   (3.1%)    -1.0% (  -7% -    5%)  0.344
HighTerm                       1973.72   (5.5%)                 1956.05   (4.2%)    -0.9% ( -10% -    9%)  0.565
OrNotHighMed                    671.95   (3.7%)                  666.09   (3.5%)    -0.9% (  -7% -    6%)  0.447
HighSloppyPhrase                 16.48   (4.9%)                   16.38   (5.3%)    -0.6% ( -10% -   10%)  0.710
BrowseMonthTaxoFacets            13.17   (2.6%)                   13.12   (2.8%)    -0.4% (  -5% -    5%)  0.630
OrHighMed                       126.70   (2.6%)                  126.20   (3.1%)    -0.4% (  -5% -    5%)  0.665
HighIntervalsOrdered            107.19   (2.8%)                  106.94   (3.2%)    -0.2% (  -6% -    5%)  0.808
PKLookup                        214.60   (2.8%)                  214.20   (2.5%)    -0.2% (  -5% -    5%)  0.826
MedPhrase                       311.36   (2.8%)                  311.17   (2.8%)    -0.1% (  -5% -    5%)  0.947
LowTerm                        1678.94   (3.6%)                 1678.44   (3.0%)    -0.0% (  -6% -    6%)  0.977
LowPhrase                       175.97   (2.8%)                  175.99   (2.2%)     0.0% (  -4% -    5%)  0.991
HighPhrase                      111.89   (2.4%)                  111.92   (2.3%)     0.0% (  -4% -    4%)  0.977
IntNRQ                          118.64   (1.8%)                  118.72   (1.2%)     0.1% (  -2% -    3%)  0.890
HighTermTitleBDVSort            357.95  (13.7%)                  358.35  (10.0%)     0.1% ( -20% -   27%)  0.977
OrHighNotHigh                   757.69   (5.2%)                  758.65   (3.9%)     0.1% (  -8% -    9%)  0.930
HighSpanNear                    110.48   (2.5%)                  110.62   (2.8%)     0.1% (  -5% -    5%)  0.874
BrowseMonthSSDVFacets            31.78   (6.1%)                   31.83   (4.8%)     0.2% ( -10% -   11%)  0.928
OrHighHigh                       44.44   (2.6%)                   44.51   (2.1%)     0.2% (  -4% -    4%)  0.818
LowSloppyPhrase                 160.78   (2.6%)                  161.10   (2.9%)     0.2% (  -5% -    5%)  0.824
LowSpanNear                     203.85   (2.5%)                  204.30   (2.6%)     0.2% (  -4% -    5%)  0.782
Respell                          83.36   (2.6%)                   83.57   (1.9%)     0.2% (  -4% -    4%)  0.731
OrHighNotLow                    748.89   (5.7%)                  750.79   (5.3%)     0.3% ( -10% -   11%)  0.884
BrowseDayOfYearSSDVFacets        28.38   (1.6%)                   28.45   (1.6%)     0.3% (  -2% -    3%)  0.605
OrNotHighLow                    812.05   (5.4%)                  814.66   (4.5%)     0.3% (  -9% -   10%)  0.837
HighTermMonthSort               269.01  (10.3%)                  270.01   (9.5%)     0.4% ( -17% -   22%)  0.906
OrHighNotMed                    789.92   (4.1%)                  793.28   (5.8%)     0.4% (  -9% -   10%)  0.789
HighTermDayOfYearSort           242.21  (15.3%)                  243.29  (14.5%)     0.4% ( -25% -   35%)  0.925
Prefix3                         183.78   (3.0%)                  184.64   (2.4%)     0.5% (  -4% -    6%)  0.588
MedSpanNear                      50.84   (2.7%)                   51.10   (2.8%)     0.5% (  -4% -    6%)  0.568
AndHighHigh                     112.79   (3.1%)                  113.39   (3.2%)     0.5% (  -5% -    7%)  0.595
AndHighLow                     1030.25   (5.1%)                 1036.81   (3.9%)     0.6% (  -7% -   10%)  0.658
Wildcard                        173.40   (2.2%)                  174.60   (1.9%)     0.7% (  -3% -    4%)  0.294
OrHighLow                       525.51   (6.2%)                  533.08   (5.4%)     1.4% (  -9% -   13%)  0.431
AndHighMed                      395.02   (3.2%)                  402.82   (2.9%)     2.0% (  -4% -    8%)  0.041
Fuzzy1                           62.59  (11.0%)                   64.03  (11.3%)     2.3% ( -17% -   27%)  0.514
Fuzzy2                           56.62  (15.7%)                   58.05  (13.7%)     2.5% ( -23% -   37%)  0.586
```

Overall there's no significant improvement, presumably because with changes only in `DisjunctionSumScorer`, `WANDScorer` still handles all disjunction queries that use `ScoreMode.TOP_SCORES`.

What do you think about the changes and the benchmark results? Is there anything else I can try next?
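
One way to probe the many-clause hypothesis raised above might be to generate extra pure-disjunction tasks with 5 or more terms. The Python sketch below is only an illustration: it assumes the luceneutil tasks file accepts lines of the form `Category: term1 term2 ...`, and the `OrMany` category name, the `make_disjunction_tasks` helper, and the sample vocabulary are all hypothetical, not part of luceneutil or of this PR.

```python
import random
from typing import List, Sequence


def make_disjunction_tasks(
    terms: Sequence[str],
    num_clauses: int,
    num_tasks: int,
    category: str = "OrMany",  # hypothetical category name, not a luceneutil built-in
    seed: int = 0,
) -> List[str]:
    """Build task lines, each a pure disjunction of `num_clauses` distinct terms.

    Assumption: one task per line, written as "<Category>: <space-separated terms>".
    """
    unique_terms = sorted(set(terms))
    if num_clauses > len(unique_terms):
        raise ValueError("not enough distinct terms for the requested clause count")
    rng = random.Random(seed)  # fixed seed so the generated task set is reproducible
    lines = []
    for _ in range(num_tasks):
        clause_terms = rng.sample(unique_terms, num_clauses)
        lines.append(f"{category}{num_clauses}: {' '.join(clause_terms)}")
    return lines


if __name__ == "__main__":
    # Placeholder vocabulary; real tasks would draw on terms from the indexed corpus.
    vocabulary = ["several", "following", "year", "state", "city",
                  "school", "music", "river", "history", "team"]
    for line in make_disjunction_tasks(vocabulary, num_clauses=5, num_tasks=3):
        print(line)
```

Appending the generated lines to a copy of the tasks file and re-running both commits would show whether the OrHighHigh regression carries over to, or reverses for, wider disjunctions. Drawing the terms from different document-frequency bands would be a natural refinement.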