zacharymorn commented on PR #972: URL: https://github.com/apache/lucene/pull/972#issuecomment-1162613846
Hi @jpountz, I have adapted the original BMM PR https://github.com/apache/lucene/pull/101 to the latest codebase and run further experiments on using it for 2-clause disjunctions. The results look both encouraging and strange :D

When I run `python3 src/python/localrun.py -source wikimedium10m` with only the `OrHighLow`, `OrHighHigh` and `OrHighMed` tasks from `tasks/wikimedium.10M.nostopwords.tasks` (by removing the other tasks), I get a pretty impressive speedup on average:

```
Task                        QPS baseline   StdDev  QPS my_modified_version    StdDev  Pct diff                  p-value
PKLookup                          173.31  (24.6%)                   181.79   (26.8%)      4.9%  (-37% - 74%)    0.547
OrHighLow                         166.70  (62.8%)                   385.94  (101.5%)    131.5%  (-20% - 794%)   0.000
OrHighHigh                          9.27  (48.9%)                    23.44   (85.9%)    152.9%  (12% - 562%)    0.000
OrHighMed                          18.45  (61.3%)                    55.92  (137.3%)    203.0%  (2% - 1037%)    0.000
```

However, when I run all the tasks, `OrHighLow`, `OrHighHigh` and `OrHighMed` show only a moderate speedup on average and are sometimes even slightly negatively impacted:

```
Task                        QPS baseline   StdDev  QPS my_modified_version    StdDev  Pct diff                  p-value
OrHighHigh                         35.23   (7.2%)                    23.86    (7.0%)    -32.3%  (-43% - -19%)   0.000
OrHighLow                         898.97   (4.4%)                   788.65    (4.2%)    -12.3%  (-20% - -3%)    0.000
BrowseDateSSDVFacets                2.62  (27.0%)                     2.43   (18.8%)     -7.4%  (-41% - 52%)    0.312
HighSpanNear                       21.86   (6.4%)                    21.00    (6.1%)     -4.0%  (-15% - 9%)     0.045
Fuzzy2                             94.11  (12.4%)                    90.59    (9.8%)     -3.7%  (-23% - 21%)    0.290
LowSloppyPhrase                    65.63   (8.2%)                    63.99    (8.6%)     -2.5%  (-17% - 15%)    0.347
HighSloppyPhrase                   17.25   (5.3%)                    16.84    (5.3%)     -2.4%  (-12% - 8%)     0.154
TermDTSort                        160.18   (8.2%)                   156.49    (9.9%)     -2.3%  (-18% - 17%)    0.423
HighTermDayOfYearSort             164.86   (6.8%)                   161.77   (10.1%)     -1.9%  (-17% - 16%)    0.490
OrHighMedDayTaxoFacets             11.05   (7.1%)                    10.86    (7.3%)     -1.7%  (-15% - 13%)    0.465
AndHighLow                       1482.47   (4.0%)                  1459.63   (10.6%)     -1.5%  (-15% - 13%)    0.544
MedSpanNear                        27.77   (7.2%)                    27.49    (6.1%)     -1.0%  (-13% - 13%)    0.628
HighTermTitleBDVSort              197.53   (7.4%)                   195.53    (6.3%)     -1.0%  (-13% - 13%)    0.640
AndHighMedDayTaxoFacets            43.61   (8.7%)                    43.19   (10.1%)     -1.0%  (-18% - 19%)    0.745
HighIntervalsOrdered               17.38   (8.7%)                    17.26    (7.5%)     -0.7%  (-15% - 16%)    0.782
HighPhrase                        454.15   (5.0%)                   451.67    (8.7%)     -0.5%  (-13% - 13%)    0.807
BrowseRandomLabelSSDVFacets        15.40   (8.1%)                    15.32    (7.3%)     -0.5%  (-14% - 16%)    0.837
AndHighHighDayTaxoFacets           16.94   (7.0%)                    16.87    (6.6%)     -0.5%  (-13% - 14%)    0.834
LowSpanNear                         9.08   (4.8%)                     9.05    (4.3%)     -0.3%  (-9% - 9%)      0.838
Wildcard                           55.15  (11.3%)                    55.01   (12.0%)     -0.2%  (-21% - 26%)    0.947
MedPhrase                         976.56   (2.8%)                   977.29    (3.3%)      0.1%  (-5% - 6%)      0.939
MedTermDayTaxoFacets               77.21   (8.6%)                    77.46    (8.7%)      0.3%  (-15% - 19%)    0.908
OrNotHighLow                     1187.34   (5.1%)                  1191.80    (5.3%)      0.4%  (-9% - 11%)     0.819
OrHighNotHigh                    1556.42   (4.4%)                  1566.26    (4.5%)      0.6%  (-7% - 9%)      0.654
LowIntervalsOrdered               158.96   (6.4%)                   160.03    (8.9%)      0.7%  (-13% - 17%)    0.785
OrNotHighHigh                    1427.22   (3.8%)                  1436.97    (5.0%)      0.7%  (-7% - 9%)      0.628
Fuzzy1                            116.55  (11.4%)                   117.41    (9.4%)      0.7%  (-18% - 24%)    0.823
LowTerm                          3470.46   (5.9%)                  3500.25    (5.9%)      0.9%  (-10% - 13%)    0.644
HighTermMonthSort                 169.22  (10.4%)                   170.68   (14.9%)      0.9%  (-22% - 29%)    0.832
IntNRQ                            115.77  (22.6%)                   116.95   (21.3%)      1.0%  (-34% - 57%)    0.883
MedTerm                          3042.06   (4.5%)                  3080.17    (5.4%)      1.3%  (-8% - 11%)     0.427
HighTerm                         2407.19   (5.5%)                  2440.56    (4.1%)      1.4%  (-7% - 11%)     0.369
Prefix3                           396.92  (10.2%)                   403.19    (8.6%)      1.6%  (-15% - 22%)    0.595
OrNotHighMed                     1695.31   (3.6%)                  1722.43    (5.5%)      1.6%  (-7% - 11%)     0.274
MedSloppyPhrase                    13.19   (4.5%)                    13.40    (5.0%)      1.6%  (-7% - 11%)     0.283
OrHighNotLow                     1473.94   (6.7%)                  1500.95    (6.6%)      1.8%  (-10% - 16%)    0.383
AndHighMed                        201.69   (4.5%)                   205.65    (9.1%)      2.0%  (-11% - 16%)    0.387
PKLookup                          247.69  (11.3%)                   253.24    (9.6%)      2.2%  (-16% - 26%)    0.499
MedIntervalsOrdered                30.40   (8.1%)                    31.13    (7.7%)      2.4%  (-12% - 19%)    0.338
OrHighNotMed                     1534.55   (4.5%)                  1571.83    (3.9%)      2.4%  (-5% - 11%)     0.068
Respell                            90.55   (7.9%)                    92.75    (8.8%)      2.4%  (-13% - 20%)    0.359
AndHighHigh                        65.14   (7.1%)                    67.16    (8.3%)      3.1%  (-11% - 19%)    0.206
BrowseDayOfYearSSDVFacets          20.96   (9.7%)                    21.65   (11.1%)      3.3%  (-15% - 26%)    0.320
LowPhrase                          63.71   (6.9%)                    65.86    (9.2%)      3.4%  (-11% - 20%)    0.191
BrowseMonthSSDVFacets              22.49  (13.6%)                    23.62   (14.8%)      5.0%  (-20% - 38%)    0.263
BrowseMonthTaxoFacets              26.25  (43.5%)                    34.10   (40.2%)     29.9%  (-37% - 200%)   0.024
BrowseDateTaxoFacets               22.04  (40.4%)                    29.87   (63.1%)     35.5%  (-48% - 233%)   0.034
BrowseDayOfYearTaxoFacets          22.07  (39.3%)                    31.04   (64.1%)     40.6%  (-45% - 236%)   0.016
OrHighMed                          59.30   (9.3%)                    84.18   (20.1%)     41.9%  (11% - 78%)     0.000
BrowseRandomLabelTaxoFacets        20.38  (52.4%)                    30.77   (88.6%)     50.9%  (-59% - 403%)   0.027
```

This seems to suggest that the tasks in a run may interfere with each other rather than being independent? Do you have any suggestions on where I can look next to confirm the performance impact of this change?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
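
To line up the three disjunction tasks across the two runs, a small sketch like the one below may help. It is not part of luceneutil: `RUNS`, `pct_diff` and `side_by_side` are hypothetical names, and the QPS numbers are copied by hand from the two tables above. Because it recomputes the relative change from the rounded QPS means, the last digit can differ slightly from the reported `Pct diff` column.

```python
# Hypothetical helper, not part of luceneutil.
# (baseline QPS, candidate QPS) per task, copied from the two tables above.
RUNS = {
    "Or* tasks only": {
        "OrHighLow": (166.70, 385.94),
        "OrHighHigh": (9.27, 23.44),
        "OrHighMed": (18.45, 55.92),
    },
    "all tasks": {
        "OrHighLow": (898.97, 788.65),
        "OrHighHigh": (35.23, 23.86),
        "OrHighMed": (59.30, 84.18),
    },
}


def pct_diff(baseline_qps, candidate_qps):
    """Relative QPS change of the candidate versus the baseline, in percent."""
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


def side_by_side(runs):
    """Return {task: {run_name: pct_diff}} so one task can be compared across runs."""
    out = {}
    for run_name, rows in runs.items():
        for task, (base, cand) in rows.items():
            out.setdefault(task, {})[run_name] = round(pct_diff(base, cand), 1)
    return out


if __name__ == "__main__":
    for task, per_run in side_by_side(RUNS).items():
        print(task, per_run)
```

Printing the result puts each task's change in the reduced run next to its change in the full run. The input rows also show that the baseline QPS for the same task differs between the two runs (for example `OrHighLow` at 166.70 versus 898.97).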