[ 
https://issues.apache.org/jira/browse/LUCENE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419349#comment-17419349
 ] 

Greg Miller commented on LUCENE-10062:
--------------------------------------

I re-ran the {{luceneutil}} benchmarks ({{wikimedium10m}}) now that 
[~mikemccand] has added new faceting tasks (thanks Mike!). This change shows 
a nice improvement on the new faceting tasks as well, with no regressions 
anywhere else that I can see.

I was waiting to iterate on my PR until I was able to run these new 
benchmarking tasks, but it seems like there's enough benefit to this change to 
pick it back up.


{noformat}
                           Task QPS baseline    StdDev QPS candidate      StdDev                Pct diff p-value
           HighTermDayOfYearSort       70.02     (13.7%)       68.45      (9.7%)   -2.2% ( -22% -   24%) 0.551
                         MedTerm     1300.90      (5.5%)     1275.97      (6.7%)   -1.9% ( -13% -   10%) 0.324
                        HighTerm     1953.46      (5.8%)     1925.79      (7.9%)   -1.4% ( -14% -   13%) 0.518
            HighTermTitleBDVSort      122.35     (15.6%)      120.86     (14.9%)   -1.2% ( -27% -   34%) 0.801
                      TermDTSort      133.47      (8.7%)      131.86      (7.4%)   -1.2% ( -15% -   16%) 0.637
                         LowTerm     1636.13      (5.5%)     1622.34      (7.4%)   -0.8% ( -12% -   12%) 0.682
                         Prefix3       25.69      (6.0%)       25.48      (6.3%)   -0.8% ( -12% -   12%) 0.676
                     LowSpanNear      118.02      (2.1%)      117.31      (1.8%)   -0.6% (  -4% -    3%) 0.326
               HighTermMonthSort      140.17      (9.8%)      139.47      (9.9%)   -0.5% ( -18% -   21%) 0.872
                     AndHighHigh       49.17      (3.1%)       48.92      (2.7%)   -0.5% (  -6% -    5%) 0.584
                    HighSpanNear       25.54      (2.7%)       25.41      (2.2%)   -0.5% (  -5% -    4%) 0.529
                      AndHighLow      556.68      (5.8%)      554.80      (5.4%)   -0.3% ( -10% -   11%) 0.848
       BrowseDayOfYearSSDVFacets       16.53      (2.5%)       16.47      (2.4%)   -0.3% (  -5% -    4%) 0.674
                          IntNRQ       87.76      (2.0%)       87.49      (2.1%)   -0.3% (  -4% -    3%) 0.634
                     MedSpanNear       31.11      (2.2%)       31.04      (1.6%)   -0.2% (  -3% -    3%) 0.714
                    OrNotHighLow      765.10      (4.5%)      763.60      (5.4%)   -0.2% (  -9% -   10%) 0.901
                       MedPhrase      160.05      (3.1%)      159.83      (2.9%)   -0.1% (  -5% -    6%) 0.885
                HighSloppyPhrase       27.67      (3.1%)       27.64      (3.0%)   -0.1% (  -6% -    6%) 0.915
                       LowPhrase       61.12      (3.2%)       61.05      (3.2%)   -0.1% (  -6% -    6%) 0.921
                       OrHighMed       71.85      (2.9%)       71.82      (2.1%)   -0.0% (  -4% -    5%) 0.963
                      HighPhrase       29.40      (2.3%)       29.39      (2.8%)   -0.0% (  -5% -    5%) 0.971
                          Fuzzy2       32.58      (4.3%)       32.57      (6.1%)   -0.0% (  -9% -   10%) 0.992
             LowIntervalsOrdered      150.30      (1.9%)      150.28      (1.9%)   -0.0% (  -3% -    3%) 0.986
                      AndHighMed      151.32      (3.9%)      151.31      (4.1%)   -0.0% (  -7% -    8%) 0.993
                      OrHighHigh       23.90      (2.3%)       23.91      (1.9%)    0.0% (  -4% -    4%) 0.970
                    OrHighNotLow      579.17      (5.1%)      579.35      (6.4%)    0.0% ( -10% -   12%) 0.986
             MedIntervalsOrdered       86.93      (1.7%)       86.98      (1.9%)    0.1% (  -3% -    3%) 0.913
                   OrHighNotHigh      536.17      (5.6%)      536.57      (6.6%)    0.1% ( -11% -   12%) 0.969
                   OrNotHighHigh      787.07      (6.5%)      787.96      (8.1%)    0.1% ( -13% -   15%) 0.961
                    OrNotHighMed      687.97      (4.7%)      688.77      (6.9%)    0.1% ( -10% -   12%) 0.950
                 MedSloppyPhrase       68.62      (2.8%)       68.74      (2.7%)    0.2% (  -5% -    5%) 0.838
                 LowSloppyPhrase      130.37      (2.6%)      130.62      (2.2%)    0.2% (  -4% -    5%) 0.797
                       OrHighLow      440.44      (4.1%)      441.33      (4.1%)    0.2% (  -7% -    8%) 0.877
                        Wildcard      122.01      (5.2%)      122.35      (5.3%)    0.3% (  -9% -   11%) 0.867
            HighIntervalsOrdered       14.24      (2.2%)       14.34      (2.1%)    0.6% (  -3% -    5%) 0.350
                         Respell       52.04      (2.2%)       52.48      (2.0%)    0.8% (  -3% -    5%) 0.209
                    OrHighNotMed      674.76      (4.8%)      680.97      (8.0%)    0.9% ( -11% -   14%) 0.659
                        PKLookup      153.45      (4.3%)      155.13      (3.8%)    1.1% (  -6% -    9%) 0.394
                          Fuzzy1       56.57      (9.1%)       57.76      (6.7%)    2.1% ( -12% -   19%) 0.406
           BrowseMonthSSDVFacets       19.59     (10.4%)       20.03      (6.7%)    2.3% ( -13% -   21%) 0.413
        AndHighHighDayTaxoFacets       19.22      (1.6%)       22.13      (2.2%)   15.1% (  11% -   19%) 0.000
         AndHighMedDayTaxoFacets       25.62      (1.5%)       29.93      (2.2%)   16.8% (  12% -   20%) 0.000
            MedTermDayTaxoFacets       12.96      (2.2%)       18.99      (3.4%)   46.5% (  39% -   53%) 0.000
          OrHighMedDayTaxoFacets        3.97      (2.0%)        5.81      (4.3%)   46.5% (  39% -   53%) 0.000
           BrowseMonthTaxoFacets        2.59     (10.9%)       11.16     (35.8%)  330.4% ( 255% -  423%) 0.000
            BrowseDateTaxoFacets        2.44      (9.7%)       13.12     (51.8%)  438.1% ( 343% -  553%) 0.000
       BrowseDayOfYearTaxoFacets        2.44      (9.7%)       13.13     (51.7%)  438.2% ( 343% -  552%) 0.000
{noformat}


> Explore using SORTED_NUMERIC doc values to encode taxonomy ordinals for 
> faceting
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10062
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10062
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Greg Miller
>            Assignee: Greg Miller
>            Priority: Minor
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We currently encode taxonomy ordinals using varint style packing in a binary 
> doc values field. I suspect there have been a number of improvements to 
> SortedNumericDocValues since taxonomy faceting was first introduced, and I 
> plan to explore replacing the custom binary format we have today with a 
> SORTED_NUMERIC type dv field instead.
> I'll report benchmark results and index size impact here.
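
For readers unfamiliar with the term, here is a minimal sketch of what "varint 
style packing" of ordinals into a single binary value could look like. It 
assumes the ordinals are non-negative, sorted, delta-encoded, and then written 
as 7-bit variable-length ints. The class and method names ({{VarintOrdinals}}, 
{{encode}}, {{decode}}) are hypothetical and this is not Lucene's actual facet 
code; it only illustrates the general technique.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical illustration only; not Lucene's actual facet encoding code. */
final class VarintOrdinals {

  /** Packs non-negative ordinals as sorted deltas, each written as a 7-bit varint. */
  public static byte[] encode(int[] ordinals) {
    int[] sorted = ordinals.clone();
    Arrays.sort(sorted);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int previous = 0;
    for (int ord : sorted) {
      int delta = ord - previous; // non-negative because the input is sorted
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
        delta >>>= 7;
      }
      out.write(delta); // final byte, continuation bit clear
      previous = ord;
    }
    return out.toByteArray();
  }

  /** Reverses encode(): reads varint deltas and accumulates them into ordinals. */
  public static int[] decode(byte[] packed) {
    int[] result = new int[packed.length]; // upper bound: at least one byte per ordinal
    int count = 0;
    int previous = 0;
    int value = 0;
    int shift = 0;
    for (byte b : packed) {
      value |= (b & 0x7F) << shift;
      if ((b & 0x80) != 0) {
        shift += 7; // more bytes follow for this delta
      } else {
        previous += value;
        result[count++] = previous;
        value = 0;
        shift = 0;
      }
    }
    return Arrays.copyOf(result, count);
  }

  public static void main(String[] args) {
    int[] ordinals = {3, 130, 131, 20000};
    byte[] packed = encode(ordinals);
    System.out.println(packed.length + " bytes: " + Arrays.toString(decode(packed)));
  }
}
```

The proposal in this issue would replace this kind of hand-rolled packing with 
a SORTED_NUMERIC doc values field, leaving the compression to the codec.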



