gf2121 opened a new pull request, #14333: URL: https://github.com/apache/lucene/pull/14333
**Context**

* #12631 introduced an MSBVLong format to encode the first fp of the FST output. It was the first time we benefited from output sharing in blocktree. The change reduced the tip size by ~13%, but in turn caused a performance regression when accumulating output bytes (#12659). https://github.com/apache/lucene/pull/12722 then introduced a complex and tricky OutputAccumulator to win some of the performance back, though it is still slower than having no output prefix sharing.
* In https://github.com/apache/lucene/pull/12722/files we disabled suffix sharing, as we found that very few suffixes get shared in block tree.

**Proposal**

Before the PRs mentioned above, the FST in block tree was almost like a trie: no output prefix sharing and little suffix sharing. This makes me wonder if I can simply implement a trie that is specially designed for the block tree index.

This is still a draft, but the numbers look promising.

**Storage**

The last row is the sum of the rows above it.

Baseline | Candidate | diff | diff pct
-- | -- | -- | --
4425601 | 4707557 | 281956 | 6.37%
4458107 | 4781487 | 323380 | 7.25%
4791217 | 5167556 | 376339 | 7.85%
4832497 | 5148499 | 316002 | 6.54%
4807799 | 5128645 | 320846 | 6.67%
720343 | 689832 | -30511 | -4.24%
721438 | 686372 | -35066 | -4.86%
694205 | 663963 | -30242 | -4.36%
688145 | 660344 | -27801 | -4.04%
819804 | 762105 | -57699 | -7.04%
142276 | 117948 | -24328 | -17.10%
125578 | 102954 | -22624 | -18.02%
109982 | 90819 | -19163 | -17.42%
113266 | 93290 | -19976 | -17.64%
104672 | 85504 | -19168 | -18.31%
27554930 | 28886875 | 1331945 | 4.83%

**Search**

```
Task                          QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff                p-value
IntNRQ                              353.03  (4.4%)                   347.96  (3.6%)     -1.4%  ( -9% -  6%)  0.360
TermDTSort                          145.72  (6.7%)                   143.80  (8.9%)     -1.3%  (-15% - 15%)  0.669
And3Terms                            97.28  (3.5%)                    96.13  (4.3%)     -1.2%  ( -8% -  6%)  0.442
CountPhrase                           3.62  (3.1%)                     3.60  (4.0%)     -0.7%  ( -7% -  6%)  0.629
AndHighHigh                          15.87  (3.6%)                    15.77  (2.7%)     -0.6%  ( -6% -  5%)  0.614
FilteredIntNRQ                       31.51  (1.5%)                    31.33  (3.0%)     -0.6%  ( -4% -  3%)  0.533
AndHighMed                           66.62  (2.0%)                    66.27  (2.6%)     -0.5%  ( -5% -  4%)  0.564
OrHighHigh                           14.93  (4.1%)                    14.85  (3.8%)     -0.5%  ( -8% -  7%)  0.742
CountAndHighMed                      75.81  (1.8%)                    75.48  (1.1%)     -0.4%  ( -3% -  2%)  0.451
CountAndHighHigh                     64.20  (2.1%)                    63.92  (1.9%)     -0.4%  ( -4% -  3%)  0.585
OrHighMed                            36.58  (5.2%)                    36.42  (5.0%)     -0.4%  (-10% - 10%)  0.833
CombinedAndHighHigh                   6.11  (1.5%)                     6.08  (1.3%)     -0.4%  ( -3% -  2%)  0.459
AndMedOrHighHigh                      8.44  (2.9%)                     8.41  (3.1%)     -0.4%  ( -6% -  5%)  0.752
TermGroup10K                          4.72  (3.3%)                     4.71  (3.8%)     -0.3%  ( -7% -  7%)  0.823
AndStopWords                          4.89  (3.2%)                     4.88  (2.0%)     -0.2%  ( -5% -  5%)  0.834
SloppyPhrase                          0.71  (2.2%)                     0.71  (2.4%)     -0.2%  ( -4% -  4%)  0.816
CombinedTerm                         13.57  (1.3%)                    13.55  (1.5%)     -0.2%  ( -3% -  2%)  0.725
TermBGroup1M                         11.17  (3.2%)                    11.15  (4.0%)     -0.2%  ( -7% -  7%)  0.910
TermBGroup1M1P                       16.02  (3.1%)                    15.99  (2.4%)     -0.1%  ( -5% -  5%)  0.893
FilteredAnd2Terms2StopWords          25.21  (2.5%)                    25.18  (2.7%)     -0.1%  ( -5% -  5%)  0.888
SpanNear                              3.68  (2.6%)                     3.67  (3.3%)     -0.1%  ( -5% -  5%)  0.927
FilteredAndHighMed                   77.81  (3.9%)                    77.88  (4.2%)      0.1%  ( -7% -  8%)  0.960
CountFilteredIntNRQ                  25.37  (1.3%)                    25.39  (1.6%)      0.1%  ( -2% -  3%)  0.887
TermGroup100                         10.35  (3.6%)                    10.36  (3.6%)      0.1%  ( -6% -  7%)  0.949
FilteredAndHighHigh                  19.56  (1.6%)                    19.59  (1.5%)      0.1%  ( -2% -  3%)  0.832
FilteredOrStopWords                  15.56  (1.9%)                    15.59  (1.8%)      0.1%  ( -3% -  3%)  0.842
CombinedAndHighMed                   43.35  (1.1%)                    43.42  (1.6%)      0.2%  ( -2% -  2%)  0.778
DismaxOrHighHigh                     65.95  (3.3%)                    66.08  (3.0%)      0.2%  ( -5% -  6%)  0.877
AndHighOrMedMed                      28.09  (1.7%)                    28.16  (2.4%)      0.2%  ( -3% -  4%)  0.771
FilteredOrHighMed                    22.87  (1.7%)                    22.93  (2.1%)      0.3%  ( -3% -  4%)  0.729
CountFilteredOrHighMed               23.19  (1.2%)                    23.27  (1.2%)      0.3%  ( -1% -  2%)  0.453
IntervalsOrdered                      4.11  (4.2%)                     4.12  (3.9%)      0.4%  ( -7% -  8%)  0.819
And2Terms2StopWords                  71.04  (1.8%)                    71.30  (2.1%)      0.4%  ( -3% -  4%)  0.629
CountFilteredOrMany                   5.76  (2.5%)                     5.78  (1.6%)      0.4%  ( -3% -  4%)  0.650
CountFilteredPhrase                   7.67  (3.0%)                     7.71  (2.2%)      0.4%  ( -4% -  5%)  0.696
FilteredOrHighHigh                   19.07  (1.6%)                    19.19  (1.8%)      0.6%  ( -2% -  4%)  0.378
DismaxOrHighMed                      54.23  (2.5%)                    54.56  (1.9%)      0.6%  ( -3% -  5%)  0.481
FilteredOrMany                        8.96  (1.8%)                     9.03  (2.4%)      0.7%  ( -3% -  5%)  0.390
CountFilteredOrHighHigh              18.52  (2.2%)                    18.66  (1.4%)      0.8%  ( -2% -  4%)  0.278
OrMany                                5.77  (3.0%)                     5.81  (2.1%)      0.8%  ( -4% -  6%)  0.439
TermGroup1M                           9.17  (3.3%)                     9.24  (3.4%)      0.8%  ( -5% -  7%)  0.539
CountOrMany                           6.99  (4.0%)                     7.04  (3.1%)      0.8%  ( -6% -  8%)  0.564
FilteredAndStopWords                 10.58  (1.7%)                    10.67  (1.2%)      0.8%  ( -2% -  3%)  0.144
CombinedOrHighHigh                    6.94  (2.0%)                     7.00  (1.4%)      0.9%  ( -2% -  4%)  0.195
FilteredTerm                         67.90  (2.7%)                    68.51  (1.9%)      0.9%  ( -3% -  5%)  0.330
CountOrHighHigh                      51.84  (4.0%)                    52.31  (2.9%)      0.9%  ( -5% -  8%)  0.503
Or3Terms                             79.18  (4.6%)                    79.96  (4.9%)      1.0%  ( -8% - 10%)  0.599
CountOrHighMed                      111.42  (3.9%)                   112.54  (3.7%)      1.0%  ( -6% -  9%)  0.504
TermDayOfYearSort                   411.94  (3.8%)                   416.25  (5.0%)      1.0%  ( -7% - 10%)  0.548
CombinedOrHighMed                    40.83  (1.2%)                    41.27  (1.3%)      1.1%  ( -1% -  3%)  0.033
FilteredAnd3Terms                   334.76  (2.7%)                   338.60  (3.7%)      1.1%  ( -5% -  7%)  0.366
FilteredPhrase                        9.50  (2.6%)                     9.61  (1.7%)      1.2%  ( -3% -  5%)  0.183
TermTitleSort                       164.18  (4.4%)                   166.12  (5.0%)      1.2%  ( -7% - 11%)  0.526
OrStopWords                          23.77  (6.8%)                    24.07  (4.4%)      1.2%  ( -9% - 13%)  0.581
Phrase                                4.47  (4.7%)                     4.53  (3.1%)      1.3%  ( -6% -  9%)  0.395
FilteredOr3Terms                     44.71  (1.5%)                    45.31  (1.9%)      1.3%  ( -2% -  4%)  0.049
Or2Terms2StopWords                  186.96  (3.6%)                   189.45  (3.1%)      1.3%  ( -5% -  8%)  0.310
FilteredOr2Terms2StopWords          136.80  (2.9%)                   139.19  (2.5%)      1.8%  ( -3% -  7%)  0.100
Term                                504.18  (2.7%)                   514.27  (1.9%)      2.0%  ( -2% -  6%)  0.030
FilteredPrefix3                     117.37  (3.8%)                   119.85  (3.4%)      2.1%  ( -4% -  9%)  0.139
OrHighRare                           53.99  (4.0%)                    55.17  (4.8%)      2.2%  ( -6% - 11%)  0.208
TermMonthSort                      1925.90  (4.4%)                  1973.70  (7.8%)      2.5%  ( -9% - 15%)  0.316
DismaxTerm                          491.73  (3.3%)                   505.93  (4.0%)      2.9%  ( -4% - 10%)  0.043
Prefix3                             133.57  (3.8%)                   137.51  (3.2%)      3.0%  ( -3% - 10%)  0.032
Wildcard                             40.03  (3.7%)                    42.46  (2.5%)      6.1%  (  0% - 12%)  0.000
Fuzzy1                               62.01  (2.7%)                    65.91  (2.4%)      6.3%  (  1% - 11%)  0.000
Fuzzy2                               56.16  (3.1%)                    59.85  (2.6%)      6.6%  (  0% - 12%)  0.000
Respell                              44.52  (1.1%)                    48.16  (1.3%)      8.2%  (  5% - 10%)  0.000
CountTerm                          5092.84  (7.7%)                  5699.48  (9.0%)     11.9%  ( -4% - 30%)  0.000
PKLookup                            181.08  (3.3%)                   224.65  (3.9%)     24.1%  ( 16% - 32%)  0.000
```
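As background for the MSBVLong format mentioned under Context, here is a minimal sketch of a most-significant-byte-first variable-length long. It assumes the format stores 7-bit groups from most to least significant, with a continuation bit on every byte except the last. The class and method names are hypothetical and this is not the code from #12631. It only illustrates why nearby file pointers end up sharing leading bytes, which is what makes output prefix sharing possible.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch only; names are hypothetical and this is not Lucene's implementation. */
final class MsbVLongSketch {

  /** Encodes a non-negative long as 7-bit groups, most significant group first. */
  static byte[] encode(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("value must be non-negative");
    }
    int bits = Long.SIZE - Long.numberOfLeadingZeros(value); // 0 when value == 0
    int groups = Math.max(1, (bits + 6) / 7);
    ByteArrayOutputStream out = new ByteArrayOutputStream(groups);
    for (int i = groups - 1; i > 0; i--) {
      // The continuation bit (0x80) marks every byte except the last one.
      out.write((int) ((value >>> (7 * i)) & 0x7F) | 0x80);
    }
    out.write((int) (value & 0x7F));
    return out.toByteArray();
  }

  /** Decodes a value produced by {@link #encode(long)}. */
  static long decode(byte[] bytes) {
    long value = 0;
    for (byte b : bytes) {
      value = (value << 7) | (b & 0x7F);
      if ((b & 0x80) == 0) {
        break; // last byte has no continuation bit
      }
    }
    return value;
  }

  /** Number of leading bytes two encodings have in common. */
  static int sharedPrefixLength(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    byte[] a = encode(0x7FFFL); // [0x81, 0xFF, 0x7F]
    byte[] b = encode(0x7FFEL); // [0x81, 0xFF, 0x7E]
    System.out.println("shared prefix bytes: " + sharedPrefixLength(a, b));
  }
}
```

With a least-significant-byte-first encoding the two values above would already differ in their first byte, so no prefix could be shared.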