Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

via GitHub Sat, 09 Mar 2024 15:49:11 -0800


stefanvodita commented on PR #12966:
URL: https://github.com/apache/lucene/pull/12966#issuecomment-1987014253


   Thank you all for reviewing! I confirmed that the performance impact was 
from result collection, not from the aggregations themselves, and I've managed 
to claw back the performance hit. Most of the improvement comes from the 
changes to `getTopChildrenForPath`, which no longer uses intermediary 
`Number`s. I've also integrated the performance-related suggestions from 
@epotyom (thank you for those!). I'll address the rest of the comments too, 
just wanted to get this out while it's fresh to see if you all have more 
feedback on the performance front.
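
For readers unfamiliar with why intermediary `Number`s cost anything during result collection, here is a minimal sketch of the general idea. It is not the actual Lucene code: the class `TopChildSketch` and both method names are hypothetical, and the real `getTopChildrenForPath` is more involved. It only contrasts wrapping each candidate value in a `Number` with comparing primitives directly.

```java
public class TopChildSketch {

    // Boxed variant (hypothetical): each candidate value is wrapped in a Number
    // before it is compared, which can allocate an object per candidate.
    static int topChildBoxed(int[] counts) {
        int bestOrd = -1;
        Number best = null;
        for (int ord = 0; ord < counts.length; ord++) {
            Number value = Integer.valueOf(counts[ord]);
            if (best == null || value.intValue() > best.intValue()) {
                best = value;
                bestOrd = ord;
            }
        }
        return bestOrd;
    }

    // Primitive variant (hypothetical): same selection logic, but the running
    // best value stays an int, so the loop does no boxing at all.
    static int topChildPrimitive(int[] counts) {
        int bestOrd = -1;
        int best = 0;
        for (int ord = 0; ord < counts.length; ord++) {
            int value = counts[ord];
            if (bestOrd == -1 || value > best) {
                best = value;
                bestOrd = ord;
            }
        }
        return bestOrd;
    }

    public static void main(String[] args) {
        int[] counts = {3, 500, 42, 500, 7};
        System.out.println(topChildBoxed(counts));
        System.out.println(topChildPrimitive(counts));
    }
}
```

Both variants return the ordinal of the first maximum, or -1 for an empty array. The difference is only in how the intermediate values are held.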
   
   `python3 src/python/localrun.py -source wikimediumall`
   
   ```
                               Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
               BrowseDateSSDVFacets        1.24      (6.6%)        1.21      (9.6%)   -2.5% ( -17% -   14%) 0.334
        BrowseRandomLabelTaxoFacets        3.76      (3.7%)        3.69      (3.5%)   -1.8% (  -8% -    5%) 0.120
                          MedPhrase       11.46      (2.8%)       11.30      (2.6%)   -1.3% (  -6% -    4%) 0.112
                  HighTermMonthSort     2290.51      (4.4%)     2262.12      (4.2%)   -1.2% (  -9% -    7%) 0.360
                       OrHighNotMed      327.20      (3.3%)      323.36      (3.2%)   -1.2% (  -7% -    5%) 0.252
                       OrHighNotLow      318.99      (3.7%)      315.45      (4.2%)   -1.1% (  -8% -    7%) 0.377
                          LowPhrase        4.74      (3.1%)        4.69      (3.0%)   -1.0% (  -6% -    5%) 0.310
                      OrNotHighHigh      244.33      (3.1%)      242.52      (3.0%)   -0.7% (  -6% -    5%) 0.443
                      OrHighNotHigh      227.54      (2.9%)      225.86      (3.2%)   -0.7% (  -6% -    5%) 0.438
                       OrNotHighMed      333.78      (2.6%)      331.35      (2.8%)   -0.7% (  -5% -    4%) 0.391
                         HighPhrase       70.04      (3.2%)       69.53      (3.3%)   -0.7% (  -6% -    5%) 0.478
                        AndHighHigh       23.27      (7.9%)       23.11      (7.1%)   -0.7% ( -14% -   15%) 0.777
                           Wildcard       51.02      (4.3%)       50.71      (4.2%)   -0.6% (  -8% -    8%) 0.652
                        MedSpanNear       29.20      (3.0%)       29.05      (2.5%)   -0.5% (  -5% -    5%) 0.561
                           HighTerm      475.59      (4.1%)      473.22      (4.7%)   -0.5% (  -8% -    8%) 0.721
                           PKLookup      176.36      (3.0%)      175.50      (2.7%)   -0.5% (  -6% -    5%) 0.589
                       HighSpanNear       10.52      (2.7%)       10.47      (2.2%)   -0.4% (  -5% -    4%) 0.612
                            MedTerm      470.14      (4.4%)      468.33      (5.4%)   -0.4% (  -9% -    9%) 0.804
          BrowseDayOfYearSSDVFacets        4.08      (3.9%)        4.06      (4.2%)   -0.4% (  -8% -    8%) 0.775
                       OrNotHighLow      322.80      (2.9%)      321.71      (2.4%)   -0.3% (  -5% -    5%) 0.692
               HighIntervalsOrdered        3.60      (4.8%)        3.59      (4.8%)   -0.3% (  -9% -    9%) 0.868
                         AndHighMed       83.14      (3.5%)       82.93      (3.9%)   -0.2% (  -7% -    7%) 0.833
          BrowseDayOfYearTaxoFacets        4.69      (4.5%)        4.68      (4.4%)   -0.2% (  -8% -    9%) 0.902
               BrowseDateTaxoFacets        4.61      (4.5%)        4.60      (4.3%)   -0.1% (  -8% -    9%) 0.937
                            Respell       53.50      (2.2%)       53.46      (1.8%)   -0.1% (  -3% -    4%) 0.902
            AndHighMedDayTaxoFacets       43.57      (1.5%)       43.54      (1.6%)   -0.1% (  -3% -    3%) 0.891
                             Fuzzy1       66.17      (2.4%)       66.20      (2.0%)    0.0% (  -4% -    4%) 0.951
                         AndHighLow      525.57      (2.6%)      525.90      (4.2%)    0.1% (  -6% -    7%) 0.955
                          OrHighMed       76.00      (3.2%)       76.05      (3.9%)    0.1% (  -6% -    7%) 0.953
               HighTermTitleBDVSort        6.93      (7.3%)        6.94      (6.8%)    0.2% ( -13% -   15%) 0.943
                MedIntervalsOrdered        2.77      (3.6%)        2.78      (3.2%)    0.2% (  -6% -    7%) 0.883
                             Fuzzy2       43.83      (1.9%)       43.90      (1.7%)    0.2% (  -3% -    3%) 0.770
                        LowSpanNear        6.13      (2.1%)        6.14      (1.9%)    0.2% (  -3% -    4%) 0.785
                   HighSloppyPhrase        5.52      (3.4%)        5.53      (3.7%)    0.2% (  -6% -    7%) 0.851
              BrowseMonthSSDVFacets        4.34      (5.1%)        4.35      (4.7%)    0.2% (  -9% -   10%) 0.891
                            Prefix3       68.56      (4.6%)       68.70      (6.0%)    0.2% (  -9% -   11%) 0.899
                LowIntervalsOrdered       18.33      (2.8%)       18.38      (2.5%)    0.3% (  -4% -    5%) 0.737
                    LowSloppyPhrase       20.67      (2.2%)       20.73      (1.9%)    0.3% (  -3% -    4%) 0.627
           AndHighHighDayTaxoFacets        7.57      (2.3%)        7.59      (2.5%)    0.3% (  -4% -    5%) 0.669
              HighTermDayOfYearSort      206.91      (2.9%)      207.68      (2.6%)    0.4% (  -5% -    6%) 0.670
                  HighTermTitleSort      140.79      (1.6%)      141.32      (2.0%)    0.4% (  -3% -    3%) 0.508
                            LowTerm      438.67      (7.1%)      441.44      (7.9%)    0.6% ( -13% -   16%) 0.790
                    MedSloppyPhrase       21.78      (3.1%)       21.95      (3.4%)    0.8% (  -5% -    7%) 0.454
               MedTermDayTaxoFacets       21.51      (2.2%)       21.71      (1.6%)    0.9% (  -2% -    4%) 0.122
                         TermDTSort      118.13      (3.0%)      119.30      (3.4%)    1.0% (  -5% -    7%) 0.329
              BrowseMonthTaxoFacets        9.58      (8.6%)        9.68      (8.8%)    1.1% ( -14% -   20%) 0.691
        BrowseRandomLabelSSDVFacets        2.88      (2.3%)        2.91      (1.8%)    1.1% (  -2% -    5%) 0.093
                         OrHighHigh       33.81      (7.6%)       34.24      (8.4%)    1.3% ( -13% -   18%) 0.618
                          OrHighLow      319.44      (6.2%)      323.88      (3.9%)    1.4% (  -8% -   12%) 0.393
                             IntNRQ       27.52      (5.2%)       27.96      (5.9%)    1.6% (  -8% -   13%) 0.360
             OrHighMedDayTaxoFacets        2.83      (3.3%)        2.88      (5.2%)    1.6% (  -6% -   10%) 0.243
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

