[jira] [Commented] (LUCENE-10438) Leverage Weight#count in lucene/facets

Greg Miller (Jira) Thu, 24 Feb 2022 10:28:07 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497616#comment-17497616 ]


Greg Miller commented on LUCENE-10438:
--------------------------------------

I experimented with this a bit for taxo- and ssdv-faceting but didn't get 
particularly far. I think the optimization opportunity is in the 
{{Facets#getSpecificValue}} code path, but I quickly discovered that 
{{luceneutil}} doesn't seem to exercise it. To optimize that path, though, I 
had to defer counting to an "on demand" approach instead of counting during 
initialization. The good news is that this change doesn't seem to have 
regressed the existing benchmark tasks (see below).
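
To make the "on demand" idea concrete, here is a minimal, self-contained sketch of the shape I have in mind. It is not the actual patch and uses no Lucene classes: {{OnDemandFacetCounts}}, {{fastCounter}}, {{fullCounter}} and {{fullCountRuns}} are hypothetical names for illustration only. The one borrowed convention is that the fast path returns -1 when it has no cheap answer, mirroring how {{Weight#count}} signals that it cannot produce a count in sub-linear time.

```java
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

/** Hypothetical sketch of deferred ("on demand") facet counting; not Lucene code. */
final class OnDemandFacetCounts {
    // Expensive path: counts every ordinal in one pass (assumed to be supplied by the caller).
    private final Supplier<int[]> fullCounter;
    // Cheap path: returns a count for one ordinal, or -1 if no shortcut is available.
    private final IntUnaryOperator fastCounter;
    // Stays null until some caller actually needs the full counts.
    private int[] counts;
    // Exposed only so the sketch can show how often the expensive path ran.
    int fullCountRuns = 0;

    OnDemandFacetCounts(Supplier<int[]> fullCounter, IntUnaryOperator fastCounter) {
        this.fullCounter = fullCounter;
        this.fastCounter = fastCounter;
    }

    /** Count for a single ordinal: try the cheap path first, fall back to full counting. */
    int getSpecificValue(int ordinal) {
        int fast = fastCounter.applyAsInt(ordinal);
        if (fast >= 0) {
            return fast;
        }
        return allCounts()[ordinal];
    }

    /** Full counts, computed at most once and only when first requested. */
    int[] allCounts() {
        if (counts == null) {
            counts = fullCounter.get();
            fullCountRuns++;
        }
        return counts;
    }

    public static void main(String[] args) {
        OnDemandFacetCounts facets = new OnDemandFacetCounts(
            () -> new int[] {5, 7, 9},        // stand-in for counting over all hits
            ord -> ord == 0 ? 42 : -1);       // stand-in shortcut that only knows ordinal 0
        System.out.println(facets.getSpecificValue(0)); // answered by the cheap path
        System.out.println(facets.fullCountRuns);       // full counting has not run yet
        System.out.println(facets.getSpecificValue(2)); // falls back to full counting
    }
}
```

The point of the structure is that nothing is counted in the constructor, so a caller that only ever asks for one value never pays for the full pass when the shortcut applies. Callers that need top-N results still trigger the full count, just later than before.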

I think the next step here is to augment {{luceneutil}} to exercise 
{{getSpecificValue}} so we can measure the impact. I'll see if I can find some 
time to poke into that, but if anyone else is interested in getting involved, 
feel free to jump in!

 
{code:java}
                            Task    QPS baseline      StdDev   QPS candidate      StdDev                Pct diff p-value
           BrowseMonthSSDVFacets       16.42     (27.7%)       15.16     (24.5%)   -7.7% ( -46% -   61%) 0.354
          OrHighMedDayTaxoFacets        6.38      (6.7%)        6.28      (6.4%)   -1.5% ( -13% -   12%) 0.463
                      TermDTSort       93.64     (12.5%)       92.45     (11.9%)   -1.3% ( -22% -   26%) 0.742
            HighTermTitleBDVSort      142.12     (14.2%)      140.36     (13.0%)   -1.2% ( -24% -   30%) 0.773
            MedTermDayTaxoFacets       38.39      (4.2%)       37.92      (4.1%)   -1.2% (  -9% -    7%) 0.356
                      OrHighHigh       42.40      (4.6%)       42.04      (3.5%)   -0.9% (  -8% -    7%) 0.510
               HighTermMonthSort      104.42     (18.0%)      103.57     (17.0%)   -0.8% ( -30% -   41%) 0.882
                         Prefix3      270.23      (7.9%)      268.54     (11.0%)   -0.6% ( -18% -   19%) 0.837
                       OrHighMed       79.38      (4.5%)       79.00      (3.6%)   -0.5% (  -8% -    7%) 0.709
                    HighSpanNear       18.50      (2.4%)       18.43      (2.4%)   -0.4% (  -5% -    4%) 0.586
                          IntNRQ      135.21      (0.5%)      134.77      (1.6%)   -0.3% (  -2% -    1%) 0.371
                    OrNotHighLow     1056.43      (2.7%)     1055.39      (3.2%)   -0.1% (  -5% -    5%) 0.916
                        PKLookup      169.34      (3.5%)      169.19      (3.6%)   -0.1% (  -6% -    7%) 0.937
         AndHighMedDayTaxoFacets       34.87      (1.8%)       34.85      (1.9%)   -0.0% (  -3% -    3%) 0.939
                    OrNotHighMed      930.52      (3.9%)      930.70      (4.0%)    0.0% (  -7% -    8%) 0.988
                        Wildcard       93.02      (4.9%)       93.05      (6.7%)    0.0% ( -10% -   12%) 0.984
                         LowTerm     1992.53      (5.1%)     1993.41      (4.3%)    0.0% (  -8% -    9%) 0.976
                     AndHighHigh       52.14      (4.9%)       52.17      (4.0%)    0.1% (  -8% -    9%) 0.969
                HighSloppyPhrase       27.70      (4.0%)       27.72      (3.8%)    0.1% (  -7% -    8%) 0.933
           HighTermDayOfYearSort       82.23     (13.3%)       82.35     (14.7%)    0.2% ( -24% -   32%) 0.973
                   OrNotHighHigh      923.35      (3.6%)      925.08      (4.8%)    0.2% (  -7% -    8%) 0.889
        AndHighHighDayTaxoFacets       19.09      (2.3%)       19.16      (1.9%)    0.3% (  -3% -    4%) 0.622
                 LowSloppyPhrase       28.20      (2.4%)       28.31      (2.6%)    0.4% (  -4% -    5%) 0.624
                     LowSpanNear       11.96      (3.9%)       12.01      (2.5%)    0.4% (  -5% -    7%) 0.666
                       LowPhrase      241.84      (4.3%)      242.98      (4.0%)    0.5% (  -7% -    9%) 0.721
                     MedSpanNear       22.00      (3.3%)       22.11      (2.0%)    0.5% (  -4% -    6%) 0.568
       BrowseDayOfYearSSDVFacets       12.00     (15.6%)       12.06     (14.4%)    0.5% ( -25% -   36%) 0.909
                       MedPhrase       20.64      (4.9%)       20.75      (4.4%)    0.6% (  -8% -   10%) 0.709
                          Fuzzy2       60.95      (1.7%)       61.29      (1.8%)    0.6% (  -2% -    4%) 0.304
                      HighPhrase       19.65      (4.8%)       19.77      (4.3%)    0.6% (  -8% -   10%) 0.678
                 MedSloppyPhrase       30.43      (2.3%)       30.63      (2.3%)    0.7% (  -3% -    5%) 0.354
                          Fuzzy1       67.61      (1.6%)       68.07      (2.0%)    0.7% (  -2% -    4%) 0.246
                    OrHighNotMed     1150.70      (3.7%)     1159.51      (3.7%)    0.8% (  -6% -    8%) 0.516
                       OrHighLow      745.90      (2.9%)      751.76      (1.7%)    0.8% (  -3% -    5%) 0.292
                   OrHighNotHigh      898.58      (4.1%)      906.01      (4.7%)    0.8% (  -7% -    9%) 0.551
                    OrHighNotLow     1349.46      (3.4%)     1361.12      (4.0%)    0.9% (  -6% -    8%) 0.463
                         Respell       46.64      (1.9%)       47.06      (2.0%)    0.9% (  -2% -    4%) 0.152
                      AndHighMed      164.73      (5.7%)      166.39      (4.4%)    1.0% (  -8% -   11%) 0.531
            BrowseDateSSDVFacets        2.40      (7.2%)        2.43      (7.7%)    1.1% ( -12% -   17%) 0.643
     BrowseRandomLabelSSDVFacets        9.15      (2.4%)        9.25      (2.6%)    1.1% (  -3% -    6%) 0.150
                      AndHighLow      873.95      (4.0%)      885.61      (2.2%)    1.3% (  -4% -    7%) 0.192
           BrowseMonthTaxoFacets       28.32     (24.4%)       28.70     (24.5%)    1.4% ( -38% -   66%) 0.860
            HighIntervalsOrdered        7.13      (4.9%)        7.24      (3.1%)    1.6% (  -6% -   10%) 0.219
             LowIntervalsOrdered      116.22      (4.2%)      118.09      (2.7%)    1.6% (  -5% -    8%) 0.148
                        HighTerm     2742.56      (6.1%)     2794.10      (4.1%)    1.9% (  -7% -   12%) 0.251
             MedIntervalsOrdered       55.87      (5.3%)       56.93      (3.4%)    1.9% (  -6% -   11%) 0.173
                         MedTerm     1679.85      (6.3%)     1718.62      (5.2%)    2.3% (  -8% -   14%) 0.206
     BrowseRandomLabelTaxoFacets       18.04     (17.3%)       18.59     (17.9%)    3.0% ( -27% -   46%) 0.585
            BrowseDateTaxoFacets       21.38     (21.2%)       22.94     (22.7%)    7.3% ( -30% -   65%) 0.295
       BrowseDayOfYearTaxoFacets       21.40     (21.5%)       23.01     (23.2%)    7.5% ( -30% -   66%) 0.286
{code}
 

> Leverage Weight#count in lucene/facets
> --------------------------------------
>
>                 Key: LUCENE-10438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10438
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/facet
>            Reporter: Adrien Grand
>            Assignee: Greg Miller
>            Priority: Minor
>
> The facet module could leverage Weight#count in order to give fast counts for 
> the browsing use-case?



