gf2121 opened a new pull request, #14333:
URL: https://github.com/apache/lucene/pull/14333

   **Context**
   
   * #12631 introduced the MSBVLong format to encode the first fp of the FST 
output. It was the first time we benefited from output sharing in blocktree. 
The change reduced the tip size by ~13%, but in turn caused a performance 
regression when accumulating output bytes (#12659). 
https://github.com/apache/lucene/pull/12722 then introduced a complex and 
tricky OutputAccumulator that recovered some of the performance, though it is 
still slower than having no output prefix sharing.
   
   * In https://github.com/apache/lucene/pull/12722/files we disabled suffix 
sharing, as we found that very few suffixes get shared in block tree.
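
To illustrate why an MSB-first variable-length encoding helps output sharing, here is a minimal sketch. It is not the code from #12631: the class `MsbVLongSketch` and its methods are hypothetical names, and "fp" is assumed here to mean a block file pointer. The point it shows is that writing the most significant 7-bit group first lets numerically close values start with the same bytes, which a prefix-sharing structure can then store once.

```java
/** Hypothetical illustration of an MSB-first VLong; not Lucene's implementation. */
final class MsbVLongSketch {

    /** Encodes a non-negative long as 7-bit groups, most significant group first. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        int bits = Long.SIZE - Long.numberOfLeadingZeros(value);
        int groups = Math.max(1, (bits + 6) / 7);
        byte[] out = new byte[groups];
        for (int i = 0; i < groups; i++) {
            int shift = 7 * (groups - 1 - i);
            int group = (int) ((value >>> shift) & 0x7F);
            // Every byte except the last carries a continuation bit.
            out[i] = (byte) (i < groups - 1 ? (group | 0x80) : group);
        }
        return out;
    }

    /** Decodes bytes produced by {@link #encode(long)}. */
    static long decode(byte[] bytes) {
        long value = 0L;
        for (byte b : bytes) {
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                break; // last group reached
            }
        }
        return value;
    }

    /** Number of leading bytes two encodings have in common. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Two made-up, nearby file pointers.
        byte[] first = encode(1_000_000L);
        byte[] second = encode(1_000_050L);
        System.out.println("shared leading bytes: " + commonPrefixLength(first, second));
    }
}
```

With a conventional least-significant-group-first VLong, the first byte holds the low 7 bits, so nearby values usually differ from the very first byte and nothing can be shared.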
   
   **Proposal**
   
   Before the PRs mentioned above, the FST in block tree was almost like a 
trie: no output prefix sharing and little suffix sharing. This makes me wonder 
if we can simply implement a trie that is designed specifically for the block 
tree index. 
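
As a rough picture of what "trie-like" means here, the sketch below stores one output per node and shares neither output prefixes nor suffixes. It is an in-memory illustration under assumed names (`BlockTrieSketch`, `put`, `longestPrefixBlock`, and the `blockFp` field are all hypothetical) and says nothing about the on-disk layout used in this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the data structure from this PR. */
final class BlockTrieSketch {
    private static final long NO_BLOCK = -1L;

    private static final class Node {
        final Map<Byte, Node> children = new HashMap<>();
        long blockFp = NO_BLOCK; // assumed: file pointer of the block for this prefix
    }

    private final Node root = new Node();

    /** Associates a term prefix with a block file pointer. */
    void put(byte[] prefix, long blockFp) {
        Node node = root;
        for (byte b : prefix) {
            node = node.children.computeIfAbsent(b, k -> new Node());
        }
        node.blockFp = blockFp;
    }

    /** Returns the output of the longest stored prefix of the term, or -1 if none. */
    long longestPrefixBlock(byte[] term) {
        Node node = root;
        long best = root.blockFp;
        for (byte b : term) {
            node = node.children.get(b);
            if (node == null) {
                break; // no deeper prefix is stored
            }
            if (node.blockFp != NO_BLOCK) {
                best = node.blockFp;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BlockTrieSketch trie = new BlockTrieSketch();
        trie.put("a".getBytes(StandardCharsets.UTF_8), 10L);  // made-up file pointers
        trie.put("ab".getBytes(StandardCharsets.UTF_8), 20L);
        System.out.println(trie.longestPrefixBlock("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because each node holds its complete output, a lookup never has to accumulate output bytes along the path, which is the cost the Context section attributes to output prefix sharing.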
   
   This is still a draft, but the numbers look promising.
   
   **Storage**
   
   Baseline | Candidate | Diff | Diff %
   -- | -- | -- | --
   4425601 | 4707557 | 281956 | 6.37%
   4458107 | 4781487 | 323380 | 7.25%
   4791217 | 5167556 | 376339 | 7.85%
   4832497 | 5148499 | 316002 | 6.54%
   4807799 | 5128645 | 320846 | 6.67%
   720343 | 689832 | -30511 | -4.24%
   721438 | 686372 | -35066 | -4.86%
   694205 | 663963 | -30242 | -4.36%
   688145 | 660344 | -27801 | -4.04%
   819804 | 762105 | -57699 | -7.04%
   142276 | 117948 | -24328 | -17.10%
   125578 | 102954 | -22624 | -18.02%
   109982 | 90819 | -19163 | -17.42%
   113266 | 93290 | -19976 | -17.64%
   104672 | 85504 | -19168 | -18.31%
   27554930 | 28886875 | 1331945 | 4.83%
   
   The last row is the sum of the 15 rows above it.
   
   **Search**
   
   ```
                               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                             IntNRQ      353.03      (4.4%)      347.96      (3.6%)   -1.4% (  -9% -    6%) 0.360
                         TermDTSort      145.72      (6.7%)      143.80      (8.9%)   -1.3% ( -15% -   15%) 0.669
                          And3Terms       97.28      (3.5%)       96.13      (4.3%)   -1.2% (  -8% -    6%) 0.442
                        CountPhrase        3.62      (3.1%)        3.60      (4.0%)   -0.7% (  -7% -    6%) 0.629
                        AndHighHigh       15.87      (3.6%)       15.77      (2.7%)   -0.6% (  -6% -    5%) 0.614
                     FilteredIntNRQ       31.51      (1.5%)       31.33      (3.0%)   -0.6% (  -4% -    3%) 0.533
                         AndHighMed       66.62      (2.0%)       66.27      (2.6%)   -0.5% (  -5% -    4%) 0.564
                         OrHighHigh       14.93      (4.1%)       14.85      (3.8%)   -0.5% (  -8% -    7%) 0.742
                    CountAndHighMed       75.81      (1.8%)       75.48      (1.1%)   -0.4% (  -3% -    2%) 0.451
                   CountAndHighHigh       64.20      (2.1%)       63.92      (1.9%)   -0.4% (  -4% -    3%) 0.585
                          OrHighMed       36.58      (5.2%)       36.42      (5.0%)   -0.4% ( -10% -   10%) 0.833
                CombinedAndHighHigh        6.11      (1.5%)        6.08      (1.3%)   -0.4% (  -3% -    2%) 0.459
                   AndMedOrHighHigh        8.44      (2.9%)        8.41      (3.1%)   -0.4% (  -6% -    5%) 0.752
                       TermGroup10K        4.72      (3.3%)        4.71      (3.8%)   -0.3% (  -7% -    7%) 0.823
                       AndStopWords        4.89      (3.2%)        4.88      (2.0%)   -0.2% (  -5% -    5%) 0.834
                       SloppyPhrase        0.71      (2.2%)        0.71      (2.4%)   -0.2% (  -4% -    4%) 0.816
                       CombinedTerm       13.57      (1.3%)       13.55      (1.5%)   -0.2% (  -3% -    2%) 0.725
                       TermBGroup1M       11.17      (3.2%)       11.15      (4.0%)   -0.2% (  -7% -    7%) 0.910
                     TermBGroup1M1P       16.02      (3.1%)       15.99      (2.4%)   -0.1% (  -5% -    5%) 0.893
        FilteredAnd2Terms2StopWords       25.21      (2.5%)       25.18      (2.7%)   -0.1% (  -5% -    5%) 0.888
                           SpanNear        3.68      (2.6%)        3.67      (3.3%)   -0.1% (  -5% -    5%) 0.927
                 FilteredAndHighMed       77.81      (3.9%)       77.88      (4.2%)    0.1% (  -7% -    8%) 0.960
                CountFilteredIntNRQ       25.37      (1.3%)       25.39      (1.6%)    0.1% (  -2% -    3%) 0.887
                       TermGroup100       10.35      (3.6%)       10.36      (3.6%)    0.1% (  -6% -    7%) 0.949
                FilteredAndHighHigh       19.56      (1.6%)       19.59      (1.5%)    0.1% (  -2% -    3%) 0.832
                FilteredOrStopWords       15.56      (1.9%)       15.59      (1.8%)    0.1% (  -3% -    3%) 0.842
                 CombinedAndHighMed       43.35      (1.1%)       43.42      (1.6%)    0.2% (  -2% -    2%) 0.778
                   DismaxOrHighHigh       65.95      (3.3%)       66.08      (3.0%)    0.2% (  -5% -    6%) 0.877
                    AndHighOrMedMed       28.09      (1.7%)       28.16      (2.4%)    0.2% (  -3% -    4%) 0.771
                  FilteredOrHighMed       22.87      (1.7%)       22.93      (2.1%)    0.3% (  -3% -    4%) 0.729
             CountFilteredOrHighMed       23.19      (1.2%)       23.27      (1.2%)    0.3% (  -1% -    2%) 0.453
                   IntervalsOrdered        4.11      (4.2%)        4.12      (3.9%)    0.4% (  -7% -    8%) 0.819
                And2Terms2StopWords       71.04      (1.8%)       71.30      (2.1%)    0.4% (  -3% -    4%) 0.629
                CountFilteredOrMany        5.76      (2.5%)        5.78      (1.6%)    0.4% (  -3% -    4%) 0.650
                CountFilteredPhrase        7.67      (3.0%)        7.71      (2.2%)    0.4% (  -4% -    5%) 0.696
                 FilteredOrHighHigh       19.07      (1.6%)       19.19      (1.8%)    0.6% (  -2% -    4%) 0.378
                    DismaxOrHighMed       54.23      (2.5%)       54.56      (1.9%)    0.6% (  -3% -    5%) 0.481
                     FilteredOrMany        8.96      (1.8%)        9.03      (2.4%)    0.7% (  -3% -    5%) 0.390
            CountFilteredOrHighHigh       18.52      (2.2%)       18.66      (1.4%)    0.8% (  -2% -    4%) 0.278
                             OrMany        5.77      (3.0%)        5.81      (2.1%)    0.8% (  -4% -    6%) 0.439
                        TermGroup1M        9.17      (3.3%)        9.24      (3.4%)    0.8% (  -5% -    7%) 0.539
                        CountOrMany        6.99      (4.0%)        7.04      (3.1%)    0.8% (  -6% -    8%) 0.564
               FilteredAndStopWords       10.58      (1.7%)       10.67      (1.2%)    0.8% (  -2% -    3%) 0.144
                 CombinedOrHighHigh        6.94      (2.0%)        7.00      (1.4%)    0.9% (  -2% -    4%) 0.195
                       FilteredTerm       67.90      (2.7%)       68.51      (1.9%)    0.9% (  -3% -    5%) 0.330
                    CountOrHighHigh       51.84      (4.0%)       52.31      (2.9%)    0.9% (  -5% -    8%) 0.503
                           Or3Terms       79.18      (4.6%)       79.96      (4.9%)    1.0% (  -8% -   10%) 0.599
                     CountOrHighMed      111.42      (3.9%)      112.54      (3.7%)    1.0% (  -6% -    9%) 0.504
                  TermDayOfYearSort      411.94      (3.8%)      416.25      (5.0%)    1.0% (  -7% -   10%) 0.548
                  CombinedOrHighMed       40.83      (1.2%)       41.27      (1.3%)    1.1% (  -1% -    3%) 0.033
                  FilteredAnd3Terms      334.76      (2.7%)      338.60      (3.7%)    1.1% (  -5% -    7%) 0.366
                     FilteredPhrase        9.50      (2.6%)        9.61      (1.7%)    1.2% (  -3% -    5%) 0.183
                      TermTitleSort      164.18      (4.4%)      166.12      (5.0%)    1.2% (  -7% -   11%) 0.526
                        OrStopWords       23.77      (6.8%)       24.07      (4.4%)    1.2% (  -9% -   13%) 0.581
                             Phrase        4.47      (4.7%)        4.53      (3.1%)    1.3% (  -6% -    9%) 0.395
                   FilteredOr3Terms       44.71      (1.5%)       45.31      (1.9%)    1.3% (  -2% -    4%) 0.049
                 Or2Terms2StopWords      186.96      (3.6%)      189.45      (3.1%)    1.3% (  -5% -    8%) 0.310
         FilteredOr2Terms2StopWords      136.80      (2.9%)      139.19      (2.5%)    1.8% (  -3% -    7%) 0.100
                               Term      504.18      (2.7%)      514.27      (1.9%)    2.0% (  -2% -    6%) 0.030
                    FilteredPrefix3      117.37      (3.8%)      119.85      (3.4%)    2.1% (  -4% -    9%) 0.139
                         OrHighRare       53.99      (4.0%)       55.17      (4.8%)    2.2% (  -6% -   11%) 0.208
                      TermMonthSort     1925.90      (4.4%)     1973.70      (7.8%)    2.5% (  -9% -   15%) 0.316
                         DismaxTerm      491.73      (3.3%)      505.93      (4.0%)    2.9% (  -4% -   10%) 0.043
                            Prefix3      133.57      (3.8%)      137.51      (3.2%)    3.0% (  -3% -   10%) 0.032
                           Wildcard       40.03      (3.7%)       42.46      (2.5%)    6.1% (   0% -   12%) 0.000
                             Fuzzy1       62.01      (2.7%)       65.91      (2.4%)    6.3% (   1% -   11%) 0.000
                             Fuzzy2       56.16      (3.1%)       59.85      (2.6%)    6.6% (   0% -   12%) 0.000
                            Respell       44.52      (1.1%)       48.16      (1.3%)    8.2% (   5% -   10%) 0.000
                          CountTerm     5092.84      (7.7%)     5699.48      (9.0%)   11.9% (  -4% -   30%) 0.000
                           PKLookup      181.08      (3.3%)      224.65      (3.9%)   24.1% (  16% -   32%) 0.000
   ```

