Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1825228925
After some tweaking and tinkering I was finally able to get the benchmark running the way I wanted in luceneutil! @mikemccand Unfortunately luceneutil out of the box doesn't work for my case...

```
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
Wildcard                       56.20   (2.0%)      7.30   (0.2%)   -87.0% ( -87% -  -86%)  0.000
Respell                        44.00   (1.7%)     14.46   (0.6%)   -67.1% ( -68% -  -65%)  0.000
Fuzzy1                         54.11   (1.2%)     20.86   (0.8%)   -61.4% ( -62% -  -60%)  0.000
Prefix3                       103.61   (0.8%)     41.37   (0.5%)   -60.1% ( -60% -  -59%)  0.000
Fuzzy2                         42.43   (1.2%)     20.26   (0.9%)   -52.3% ( -53% -  -50%)  0.000
HighTermTitleSort             127.60   (1.7%)    114.65   (1.6%)   -10.1% ( -13% -   -6%)  0.000
HighTermMonthSort            2549.22   (2.8%)   2435.74   (3.8%)    -4.5% ( -10% -    2%)  0.037
AndHighLow                    708.99   (2.3%)    678.15   (2.1%)    -4.4% (  -8% -    0%)  0.001
LowTerm                       369.16   (5.5%)    358.15   (3.1%)    -3.0% ( -10% -    5%)  0.287
OrNotHighLow                  258.51   (1.8%)    252.11   (2.2%)    -2.5% (  -6% -    1%)  0.054
OrHighLow                     348.58   (1.1%)    340.11   (2.5%)    -2.4% (  -5% -    1%)  0.046
OrNotHighHigh                 141.31   (6.7%)    138.54   (3.2%)    -2.0% ( -11% -    8%)  0.554
IntNRQ                         17.93   (1.8%)     17.62   (3.7%)    -1.7% (  -7% -    3%)  0.349
MedSloppyPhrase                26.73   (0.8%)     26.28   (1.3%)    -1.7% (  -3% -    0%)  0.017
HighIntervalsOrdered            3.88   (2.6%)      3.82   (2.3%)    -1.6% (  -6% -    3%)  0.314
HighTerm                      290.77   (7.3%)    286.20   (6.9%)    -1.6% ( -14% -   13%)  0.727
OrHighNotHigh                 296.51   (6.9%)    291.90   (4.9%)    -1.6% ( -12% -   11%)  0.682
LowIntervalsOrdered            15.48   (1.1%)     15.26   (1.3%)    -1.4% (  -3% -    0%)  0.057
MedTerm                       405.54   (6.3%)    399.77   (5.9%)    -1.4% ( -12% -   11%)  0.713
LowSloppyPhrase                44.64   (0.5%)     44.04   (2.1%)    -1.3% (  -3% -    1%)  0.171
HighTermDayOfYearSort         177.36   (1.6%)    175.08   (1.8%)    -1.3% (  -4% -    2%)  0.234
OrHighMed                      79.44   (3.7%)     78.48   (3.0%)    -1.2% (  -7% -    5%)  0.572
OrNotHighMed                  268.82   (4.3%)    265.74   (3.1%)    -1.1% (  -8% -    6%)  0.632
BrowseMonthTaxoFacets           3.89   (0.4%)      3.85   (1.2%)    -0.9% (  -2% -    0%)  0.134
AndHighMedDayTaxoFacets        44.33   (0.7%)     43.96   (1.1%)    -0.8% (  -2% -    0%)  0.145
MedSpanNear                    30.67   (1.1%)     30.42   (2.3%)    -0.8% (  -4% -    2%)  0.481
LowSpanNear                     4.56   (0.8%)      4.53   (2.1%)    -0.6% (  -3% -    2%)  0.576
HighSpanNear                    8.52   (1.4%)      8.47   (2.1%)    -0.5% (  -3% -    2%)  0.636
OrHighNotLow                  236.32   (6.6%)    235.28   (4.4%)    -0.4% ( -10% -   11%)  0.901
AndHighHighDayTaxoFacets        4.31   (0.6%)      4.29   (0.8%)    -0.4% (  -1% -    1%)  0.446
LowPhrase                      69.90   (1.2%)     69.67   (2.7%)    -0.3% (  -4% -    3%)  0.807
AndHighMed                     41.87   (0.6%)     41.75   (2.7%)    -0.3% (  -3% -    3%)  0.828
OrHighHigh                     45.86   (7.4%)     45.77   (7.9%)    -0.2% ( -14% -   16%)  0.969
TermDTSort                    101.90   (2.3%)    101.84   (2.0%)    -0.1% (  -4% -    4%)  0.966
HighSloppyPhrase                0.48   (2.2%)      0.48   (2.6%)    -0.0% (  -4% -    4%)  0.995
BrowseDateSSDVFacets            1.00   (3.7%)      1.00   (5.3%)    -0.0% (  -8% -    9%)  1.000
HighTermTitleBDVSort            4.48   (5.4%)      4.49   (4.4%)     0.0% (  -9% -   10%)  0.990
BrowseDayOfYearSSDVFacets       3.51  (13.4%)      3.52  (13.3%)     0.1% ( -23% -   30%)  0.990
BrowseMonthSSDVFacets           3.38   (0.7%)      3.39   (0.6%)     0.1% (  -1% -    1%)  0.740
BrowseDayOfYearTaxoFacets       3.85   (0.3%)      3.85   (0.5%)     0.2% (   0% -    0%)  0.469
MedIntervalsOrdered             1.65   (1.5%)      1.65   (1.5%)     0.2% (  -2% -    3%)  0.802
BrowseDateTaxoFacets            3.82   (0.3%)      3.83   (0.4%)     0.3% (   0% -    0%)  0.264
BrowseRandomLabelSSDVFacets     2.34   (1.0%)      2.35   (0.5%)     0.3% (  -1% -    1%)  0.494
OrHighNotMed                  296.34   (7.6%)    297.38   (4.6%)     0.4% ( -11% -   13%)  0.930
BrowseRandomLabelTaxoFacets     3.29   (0.6%)      3.31   (0.7%)     0.5% (   0% -    1%)  0.275
MedPhrase                       9.84   (2.1%)      9.90   (4.1%)     0.6% (  -5% -    6%)  0.787
AndHighHigh                    33.50   (1.2%)     33.73   (3.4%)     0.7% (  -3% -    5%)  0.669
HighPhrase                     15.15   (2.3%)     15.28   (4.1%)     0.9% (  -5% -    7%)  0.679
MedTermDayTaxoFacets           13.36   (1.8%)     13.55   (2.0%)     1.4% (  -2% -    5%)  0.240
OrHighMedDayTaxoFacets          3.73   (1.4%)      3.83   (2.2%)     2.6% (   0% -    6%)  0.026
PKLookup                      147.84   (1.6%)    157.37   (1.5%)     6.4% (   3% -    9%)  0.000
```

### Observations:

#### PKLookup improved
This is reasonable as the terms index (FST) holds all the terms.

#### Fuzzy/Wildcard/Prefix queries got *much slower*
This is also expected because currently I used the default implementation provided by `TermsEnum` which does not take advantage of the FST.
With an optimized implementation I expect it to be at least on par, and possibly slightly better, because the FST holds information about all terms, whereas the current BlockTreeTerms only holds prefixes.

#### `HighTermTitleSort` and `HighTermMonthSort` got about 4.5% ~ 10% less throughput
I don't quite understand why term lookup could

#### `AndHighLow` got slower
Am I missing some optimization opportunity for low-frequency terms?
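
For anyone double-checking the numbers quoted in the observations, the `Pct diff` column in the table above can be approximately recomputed from the two QPS columns. This is a minimal sketch, assuming the column is the relative change of the modified version's mean QPS against the baseline; the helper name `pct_diff` is made up for illustration and is not part of luceneutil. Because the printed QPS values are rounded, the last digit can differ slightly from the table.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of candidate vs. baseline, in percent.

    Hypothetical helper for illustration; not a luceneutil function.
    """
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


# (task, baseline QPS, modified-version QPS), copied from the table above.
ROWS = [
    ("Wildcard", 56.20, 7.30),
    ("Prefix3", 103.61, 41.37),
    ("PKLookup", 147.84, 157.37),
]

if __name__ == "__main__":
    for task, base, cand in ROWS:
        # Negative values mean the modified version has lower throughput.
        print(f"{task:<10} {pct_diff(base, cand):+.1f}%")
```

The sign convention matches the table: negative means the modified version is slower, positive means it is faster.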