txwei commented on PR #16240: URL: https://github.com/apache/lucene/pull/16240#issuecomment-4678389314
I haven't had the chance to go through the PR, but I like the idea of setting a budget. I benchmarked this PR using a luceneutil [branch](https://github.com/mikemccand/luceneutil/compare/main...txwei:luceneutil:leading-wildcard-query?expand=1) that tests the query `WildcardLeadingAndMissing: +body:/.*qmzxwvbb.*/ +body:zzznomatchqqq`, and it looks like this would regress this specific query pattern by ~10x:

```
Task                         QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff                 p-value
WildcardLeadingAndMissing         2176.19  (5.2%)                   227.14  (0.6%)    -89.6% ( -90% -  -88%)   0.000
BrowseRandomLabelSSDVFacets         20.19  (6.8%)                    19.31  (7.6%)     -4.4% ( -17% -   10%)   0.056
Prefix3                            578.41  (6.4%)                   560.19  (7.1%)     -3.2% ( -15% -   11%)   0.140
OrNotHighLow                      2565.61 (10.6%)                  2493.25  (6.8%)     -2.8% ( -18% -   16%)   0.315
Wildcard                            42.33  (5.1%)                    41.32  (4.6%)     -2.4% ( -11% -    7%)   0.116
MedIntervalsOrdered                471.34  (9.2%)                   460.28  (9.2%)     -2.3% ( -18% -   17%)   0.418
HighTermMonthSort                 1958.36  (6.4%)                  1915.67  (4.6%)     -2.2% ( -12% -    9%)   0.212
BrowseRandomLabelTaxoFacets         43.93 (38.9%)                    43.03 (38.4%)     -2.0% ( -57% -  123%)   0.867
HighTermTitleSort                  176.28  (7.0%)                   172.68  (8.9%)     -2.0% ( -16% -   14%)   0.419
HighPhrase                         354.24  (9.0%)                   347.98  (8.7%)     -1.8% ( -17% -   17%)   0.527
LowPhrase                          427.22  (4.6%)                   420.55  (5.0%)     -1.6% ( -10% -    8%)   0.305
BrowseDayOfYearSSDVFacets           26.24  (5.8%)                    25.87 (10.8%)     -1.4% ( -17% -   16%)   0.613
AndHighMed                        1398.22  (6.4%)                  1381.80  (6.4%)     -1.2% ( -13% -   12%)   0.563
HighIntervalsOrdered                40.96  (5.8%)                    40.55  (4.3%)     -1.0% ( -10% -    9%)   0.530
HighSpanNear                        78.67  (4.0%)                    77.88  (3.5%)     -1.0% (  -8% -    6%)   0.396
AndHighHighDayTaxoFacets            41.10  (2.9%)                    40.75  (3.0%)     -0.9% (  -6% -    5%)   0.351
AndHighMedDayTaxoFacets            478.24  (5.4%)                   475.57  (4.1%)     -0.6% (  -9% -    9%)   0.715
OrHighNotMed                      1064.49  (9.5%)                  1058.78 (11.5%)     -0.5% ( -19% -   22%)   0.872
BrowseDateTaxoFacets                38.09 (26.1%)                    37.89 (25.2%)     -0.5% ( -41% -   68%)   0.948
BrowseDayOfYearTaxoFacets           38.41 (26.0%)                    38.27 (25.2%)     -0.4% ( -40% -   68%)   0.965
LowTerm                           3128.73  (9.0%)                  3120.17  (7.6%)     -0.3% ( -15% -   17%)   0.917
OrNotHighMed                       554.45  (4.8%)                   554.48  (6.9%)      0.0% ( -11% -   12%)   0.998
IntNRQ                             464.05  (4.3%)                   464.07  (3.4%)      0.0% (  -7% -    8%)   0.996
HighTermTitleBDVSort               106.22  (6.5%)                   106.25  (4.1%)      0.0% (  -9% -   11%)   0.987
MedTermDayTaxoFacets               148.27  (2.5%)                   148.46  (2.7%)      0.1% (  -4% -    5%)   0.880
TermDTSort                         489.91  (6.4%)                   490.53  (6.1%)      0.1% ( -11% -   13%)   0.949
MedPhrase                          176.31  (3.2%)                   176.75  (3.6%)      0.3% (  -6% -    7%)   0.816
PKLookup                           538.63  (3.3%)                   540.20  (3.8%)      0.3% (  -6% -    7%)   0.794
OrHighNotHigh                      405.04  (5.9%)                   406.59  (7.6%)      0.4% ( -12% -   14%)   0.858
HighSloppyPhrase                   105.80  (3.5%)                   106.29  (4.3%)      0.5% (  -7% -    8%)   0.715
Fuzzy2                              76.25  (4.1%)                    76.64  (3.3%)      0.5% (  -6% -    8%)   0.656
LowSloppyPhrase                    205.94  (5.2%)                   207.09  (5.3%)      0.6% (  -9% -   11%)   0.735
LowSpanNear                        657.60  (4.8%)                   661.44  (3.6%)      0.6% (  -7% -    9%)   0.664
AndHighLow                        2736.69  (6.3%)                  2755.18  (6.6%)      0.7% ( -11% -   14%)   0.739
MedSpanNear                        174.07  (5.0%)                   175.33  (3.6%)      0.7% (  -7% -    9%)   0.602
OrHighMedDayTaxoFacets              33.88  (3.8%)                    34.14  (3.2%)      0.8% (  -6% -    8%)   0.484
AndMissingHigh                    4860.78  (7.6%)                  4901.35  (7.4%)      0.8% ( -13% -   17%)   0.725
Respell                             92.41  (2.8%)                    93.35  (1.5%)      1.0% (  -3% -    5%)   0.149
BrowseMonthTaxoFacets               37.21 (30.4%)                    37.59 (28.0%)      1.0% ( -43% -   85%)   0.912
AndHighHigh                        500.91  (8.0%)                   506.25  (8.1%)      1.1% ( -13% -   18%)   0.674
HighTermDayOfYearSort              432.94  (8.9%)                   437.70  (4.2%)      1.1% ( -11% -   15%)   0.617
Fuzzy1                             130.60  (3.6%)                   132.06  (2.8%)      1.1% (  -5% -    7%)   0.269
MedTerm                           2285.10  (9.4%)                  2311.66 (10.8%)      1.2% ( -17% -   23%)   0.717
range                             4578.14  (8.1%)                  4637.05  (7.0%)      1.3% ( -12% -   17%)   0.591
OrHighHigh                         614.12 (10.2%)                   625.85  (7.6%)      1.9% ( -14% -   21%)   0.503
BrowseMonthSSDVFacets               27.99  (9.4%)                    28.53 (13.8%)      1.9% ( -19% -   27%)   0.607
MedSloppyPhrase                    295.99  (8.5%)                   302.46  (6.2%)      2.2% ( -11% -   18%)   0.352
OrNotHighHigh                      866.57  (6.6%)                   886.73  (6.7%)      2.3% ( -10% -   16%)   0.270
OrHighMed                         1165.36  (7.5%)                  1193.98  (6.0%)      2.5% ( -10% -   17%)   0.254
LowIntervalsOrdered                 38.88  (5.7%)                    39.89  (4.3%)      2.6% (  -7% -   13%)   0.104
HighTerm                          1803.41 (12.6%)                  1854.46 (15.5%)      2.8% ( -22% -   35%)   0.526
OrHighLow                         1736.76  (6.3%)                  1789.56  (7.4%)      3.0% (  -9% -   17%)   0.160
OrHighNotLow                      1861.53  (8.5%)                  1930.09  (7.4%)      3.7% ( -11% -   21%)   0.144
IntSet                            1334.72 (12.0%)                  1428.07 (14.3%)      7.0% ( -17% -   37%)   0.094
BrowseDateSSDVFacets                 4.55 (17.7%)                     4.90 (17.2%)      7.6% ( -23% -   51%)   0.168
```

Can we consider lowering the visit budget?
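For reference, the "~10x" figure for WildcardLeadingAndMissing can be checked directly from the two QPS columns of the table. The sketch below assumes luceneutil's Pct diff column is the plain relative change in mean QPS (an assumption, though it does reproduce the reported -89.6% for this row); the class and method names are hypothetical and not part of luceneutil.

```java
import java.util.Locale;

// Sketch: relate luceneutil's "Pct diff" column to a slowdown factor.
// Assumption: Pct diff is the plain relative change in QPS, which reproduces
// the -89.6% reported for WildcardLeadingAndMissing. Names are hypothetical.
final class QpsRegression {

    /** Relative change in QPS from baseline to candidate, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    /** How many times slower the candidate is: baseline QPS / candidate QPS. */
    static double slowdownFactor(double baselineQps, double candidateQps) {
        return baselineQps / candidateQps;
    }

    public static void main(String[] args) {
        double baseline = 2176.19;  // QPS baseline, WildcardLeadingAndMissing row
        double candidate = 227.14;  // QPS my_modified_version, same row
        // Locale.ROOT keeps '.' as the decimal separator regardless of JVM locale.
        System.out.println(String.format(Locale.ROOT, "pct diff: %.1f%%", pctDiff(baseline, candidate)));
        System.out.println(String.format(Locale.ROOT, "slowdown: %.2fx", slowdownFactor(baseline, candidate)));
    }
}
```

Running it prints `pct diff: -89.6%` and `slowdown: 9.58x`, so "~10x" is a rounded 9.6x: the candidate serves roughly a tenth of the baseline QPS for this query.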
