[jira] [Commented] (LUCENE-9125) Improve Automaton.step() with binary search and introduce Automaton.next()

Bruno Roustant (Jira) Fri, 17 Jan 2020 04:53:23 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017978#comment-17017978 ]


Bruno Roustant commented on LUCENE-9125:
----------------------------------------

In the benchmark above I mistakenly used wikimedium10k (I edited the comment to mention that).

Here is the benchmark for wikimediumall:

                     Task  QPS trunk  StdDev  QPS patch  StdDev               Pct diff
            OrHighNotHigh     769.84  (4.8%)     756.84  (5.0%)  -1.7% ( -10% -    8%)
             OrNotHighLow     664.03  (4.2%)     653.64  (3.4%)  -1.6% (  -8% -    6%)
             OrNotHighMed     574.56  (3.0%)     566.90  (2.5%)  -1.3% (  -6% -    4%)
                  MedTerm    1373.80  (3.9%)    1359.30  (5.1%)  -1.1% (  -9% -    8%)
              AndHighHigh      19.84  (3.6%)      19.67  (2.9%)  -0.9% (  -7% -    5%)
               AndHighLow     474.49  (2.9%)     470.36  (3.6%)  -0.9% (  -7% -    5%)
                   Fuzzy1      69.27 (10.7%)      68.75 (11.0%)  -0.7% ( -20% -   23%)
            OrNotHighHigh     569.30  (3.4%)     565.26  (5.0%)  -0.7% (  -8% -    7%)
                MedPhrase      36.97  (2.4%)      36.76  (2.7%)  -0.6% (  -5% -    4%)
                 HighTerm    1133.65  (4.2%)    1128.30  (4.3%)  -0.5% (  -8% -    8%)
                OrHighLow     227.08  (2.9%)     226.24  (3.3%)  -0.4% (  -6% -    6%)
               OrHighHigh      24.17  (2.6%)      24.08  (2.4%)  -0.4% (  -5% -    4%)
                  Prefix3      25.30  (3.8%)      25.22  (3.7%)  -0.3% (  -7% -    7%)
                OrHighMed      48.26  (3.1%)      48.11  (3.1%)  -0.3% (  -6% -    6%)
                  LowTerm    1087.75  (3.4%)    1084.44  (3.3%)  -0.3% (  -6% -    6%)
               AndHighMed      69.62  (3.9%)      69.44  (4.1%)  -0.3% (  -7% -    7%)
         HighSloppyPhrase      15.11  (2.6%)      15.08  (2.6%)  -0.2% (  -5% -    5%)
                  Respell      43.34  (2.0%)      43.28  (2.3%)  -0.1% (  -4% -    4%)
             OrHighNotLow     666.79  (3.4%)     665.98  (4.9%)  -0.1% (  -8% -    8%)
             HighSpanNear       8.21  (1.8%)       8.20  (2.0%)  -0.1% (  -3% -    3%)
     HighIntervalsOrdered      14.46  (1.2%)      14.45  (1.4%)  -0.1% (  -2% -    2%)
               HighPhrase     333.99  (3.3%)     333.74  (3.9%)  -0.1% (  -7% -    7%)
              MedSpanNear      12.08  (1.8%)      12.07  (2.0%)  -0.1% (  -3% -    3%)
                LowPhrase     481.10  (2.5%)     481.14  (3.4%)   0.0% (  -5% -    6%)
          MedSloppyPhrase       6.78  (2.9%)       6.78  (2.9%)   0.0% (  -5% -    6%)
                 PKLookup     157.80  (2.5%)     157.83  (2.5%)   0.0% (  -4% -    5%)
              LowSpanNear      21.48  (2.1%)      21.48  (2.3%)   0.0% (  -4% -    4%)
             OrHighNotMed     590.59  (3.9%)     591.21  (3.8%)   0.1% (  -7% -    8%)
    BrowseMonthTaxoFacets       1.06  (1.1%)       1.06  (0.9%)   0.1% (  -1% -    2%)
          LowSloppyPhrase      40.57  (2.1%)      40.63  (2.2%)   0.1% (  -4% -    4%)
                   IntNRQ     124.31  (4.2%)     124.53  (4.9%)   0.2% (  -8% -    9%)
     BrowseDateTaxoFacets       1.00  (1.0%)       1.00  (0.7%)   0.2% (  -1% -    1%)
BrowseDayOfYearTaxoFacets       0.99  (0.9%)       1.00  (0.7%)   0.2% (  -1% -    1%)
    HighTermDayOfYearSort      18.57  (6.2%)      18.62  (6.0%)   0.3% ( -11% -   13%)
    BrowseMonthSSDVFacets       4.38  (1.0%)       4.40  (0.9%)   0.4% (  -1% -    2%)
BrowseDayOfYearSSDVFacets       3.92  (0.7%)       3.94  (0.7%)   0.5% (   0% -    1%)
                 Wildcard      52.17  (4.0%)      52.47  (5.0%)   0.6% (  -8% -    9%)
                   Fuzzy2      57.57  (9.5%)      58.32  (9.3%)   1.3% ( -16% -   22%)
        HighTermMonthSort      40.51 (14.2%)      41.47 (13.9%)   2.4% ( -22% -   35%)

 
{quote}There's an option for lucene-util to format the output for JIRA
{quote}
The last time I used this option, Jira interpreted some of the tags, and the
resulting display was no better than this basic one.
{quote}Looking at the results you posted, the optimization seems fairly 
invisible
{quote}
Yes. The change only optimizes the construction of the CompiledAutomaton, which
is a tiny part of fuzzy query execution.
{quote}that's 4.7% of "noise"
{quote}
Yes, there is noise. Running baseline vs. baseline produced the same amount of
noise. With wikimediumall this time, there may be less noise.

> Improve Automaton.step() with binary search and introduce Automaton.next()
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-9125
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9125
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>             Fix For: 8.5
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Implement the existing TODO in Automaton.step() (look up a transition from a
> source state for a given label) to use binary search, since the transitions
> are sorted.
> Introduce a new method, Automaton.next(), to optimize iteration and lookup
> over all the transitions of a state. This will be used in the RunAutomaton
> constructor and in MinimizationOperations.minimize().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
