epotyom opened a new pull request, #13559:
URL: https://github.com/apache/lucene/pull/13559

   In `SparseFixedBitSet#firstDoc`, instead of iterating through the entire 
`indices` array until a non-zero value is found, keep track of the max updated 
index so the scan can stop there.
   
   Use case where it improves performance:
   1. A `SparseFixedBitSet` is created with a large length, e.g. max doc in a 
segment.
   2. `#nextSetBit` is called (in a loop) on a bit set that is still being 
built, i.e. some of the next bits have been `#set`, but the rest of the bit set 
is still empty.
   3. Once there are no further set bits, the `#firstDoc` call made by 
`#nextSetBit` iterates through the rest of the `indices` array.
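
To make the three steps above concrete, here is a minimal sketch of the idea. It is a toy model, not Lucene's `SparseFixedBitSet`: the class `ToySparseBitSet`, its field `maxUpdatedWord`, and the `wordsScanned` counter are hypothetical names used only for illustration, and the real class uses a two-level `indices`/`bits` layout rather than a flat `long[]`.

```java
/** Toy model only: a flat long[] stands in for the `indices` array. */
final class ToySparseBitSet {
    static final int NO_MORE_BITS = Integer.MAX_VALUE;

    private final long[] words;      // one long per 64 bits
    private int maxUpdatedWord = -1; // highest word index ever written by set()
    long wordsScanned = 0;           // instrumentation for the demo, not part of the idea

    ToySparseBitSet(int length) {
        words = new long[(length + 63) >>> 6];
    }

    void set(int i) {
        int w = i >>> 6;
        words[w] |= 1L << i; // long shifts use only the low 6 bits of i
        maxUpdatedWord = Math.max(maxUpdatedWord, w);
    }

    /** Baseline behaviour: scan until the end of the array. */
    int nextSetBitFullScan(int from) {
        return scan(from, words.length - 1);
    }

    /** Sketch of the proposed behaviour: stop at the highest word ever updated. */
    int nextSetBitBounded(int from) {
        return scan(from, maxUpdatedWord);
    }

    private int scan(int from, int lastWord) {
        int w = from >>> 6;
        if (w > lastWord) {
            return NO_MORE_BITS;
        }
        long word = words[w] & (-1L << from); // drop bits below `from`
        while (true) {
            wordsScanned++;
            if (word != 0) {
                return (w << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++w > lastWord) {
                return NO_MORE_BITS;
            }
            word = words[w];
        }
    }

    public static void main(String[] args) {
        ToySparseBitSet bits = new ToySparseBitSet(1_000_000);
        bits.set(3);
        bits.set(70);

        bits.nextSetBitFullScan(71);
        System.out.println("full scan visited words: " + bits.wordsScanned);

        bits.wordsScanned = 0;
        bits.nextSetBitBounded(71);
        System.out.println("bounded scan visited words: " + bits.wordsScanned);
    }
}
```

In this toy model the cost of the "no more set bits" case depends on how far `#set` has progressed rather than on the declared length. The trade-off is one extra comparison and write on every `set` call.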
   
   In my case, we use `SparseFixedBitSet` to track and iterate over child hits 
found in `ToParentBlockJoinQuery`. Iterating through empty `indices` elements 
becomes expensive when we do it for each parent docID.
   
   The luceneutil performance test results might not be great though, so maybe 
there is a better way to achieve a similar effect?
   
   ```
   python3 src/python/localrun.py -source wikimediumall
   ...
   
                               Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
               BrowseDateTaxoFacets        1.65      (9.1%)        1.59      (0.5%)   -3.6% ( -12% -    6%) 0.081
          BrowseDayOfYearTaxoFacets        1.67      (9.2%)        1.61      (0.6%)   -3.5% ( -12% -    6%) 0.088
               MedTermDayTaxoFacets        9.40      (6.2%)        9.19      (5.0%)   -2.2% ( -12% -    9%) 0.206
        BrowseRandomLabelTaxoFacets        1.29      (4.6%)        1.27      (1.0%)   -1.9% (  -7% -    3%) 0.070
                            Prefix3      543.93      (5.7%)      535.20      (4.6%)   -1.6% ( -11% -    9%) 0.326
                         AndHighLow      780.30      (3.9%)      771.77      (4.1%)   -1.1% (  -8% -    7%) 0.383
                         AndHighMed      199.79      (2.3%)      197.77      (3.0%)   -1.0% (  -6% -    4%) 0.233
                    MedSloppyPhrase       61.79      (4.1%)       61.24      (4.1%)   -0.9% (  -8% -    7%) 0.488
                        AndHighHigh       84.66      (6.6%)       83.92      (7.6%)   -0.9% ( -14% -   14%) 0.699
                           PKLookup      143.72      (1.9%)      142.48      (2.1%)   -0.9% (  -4% -    3%) 0.171
                             Fuzzy1       56.85      (1.5%)       56.36      (2.0%)   -0.8% (  -4% -    2%) 0.122
               BrowseDateSSDVFacets        0.43     (16.7%)        0.43     (16.1%)   -0.8% ( -28% -   38%) 0.873
                           Wildcard      159.45      (2.7%)      158.30      (4.0%)   -0.7% (  -7% -    6%) 0.505
                             Fuzzy2       56.79      (1.2%)       56.38      (1.8%)   -0.7% (  -3% -    2%) 0.139
                         HighPhrase       20.07      (4.4%)       19.94      (5.8%)   -0.6% ( -10% -    9%) 0.701
                        MedSpanNear       15.66      (1.8%)       15.60      (2.2%)   -0.4% (  -4% -    3%) 0.537
                       OrNotHighMed      211.86      (3.1%)      211.03      (2.7%)   -0.4% (  -5% -    5%) 0.670
               HighTermTitleBDVSort       16.31      (2.8%)       16.25      (2.7%)   -0.4% (  -5% -    5%) 0.661
                          MedPhrase      154.39      (2.7%)      154.01      (3.3%)   -0.2% (  -6% -    5%) 0.800
                          OrHighMed      184.54      (2.5%)      184.21      (2.0%)   -0.2% (  -4% -    4%) 0.797
                          LowPhrase       72.18      (3.6%)       72.06      (4.3%)   -0.2% (  -7% -    8%) 0.893
                      OrHighNotHigh      229.39      (4.4%)      229.05      (4.5%)   -0.1% (  -8% -    9%) 0.915
                    LowSloppyPhrase       98.92      (1.5%)       98.84      (2.1%)   -0.1% (  -3% -    3%) 0.897
                        LowSpanNear       53.22      (1.0%)       53.21      (0.8%)   -0.0% (  -1% -    1%) 0.932
                            Respell       34.18      (1.9%)       34.18      (2.4%)   -0.0% (  -4% -    4%) 0.986
                       HighSpanNear        5.05      (3.1%)        5.06      (3.0%)    0.1% (  -5% -    6%) 0.929
            AndHighMedDayTaxoFacets       16.78      (1.6%)       16.79      (1.7%)    0.1% (  -3% -    3%) 0.850
                          OrHighLow      381.10      (3.4%)      381.52      (3.0%)    0.1% (  -6% -    6%) 0.914
                   HighSloppyPhrase       12.20      (3.4%)       12.22      (4.1%)    0.1% (  -7% -    7%) 0.902
                  HighTermMonthSort     1059.29      (4.7%)     1061.27      (4.8%)    0.2% (  -8% -   10%) 0.901
           AndHighHighDayTaxoFacets       13.53      (1.6%)       13.56      (1.7%)    0.2% (  -3% -    3%) 0.703
                       OrNotHighLow      664.93      (3.1%)      666.40      (3.6%)    0.2% (  -6% -    7%) 0.835
                            MedTerm      330.39      (8.8%)      331.13      (6.3%)    0.2% ( -13% -   16%) 0.927
                       OrHighNotLow      305.23      (5.3%)      306.22      (5.1%)    0.3% (  -9% -   11%) 0.844
        BrowseRandomLabelSSDVFacets        1.64      (4.1%)        1.65      (4.5%)    0.4% (  -7% -    9%) 0.754
                      OrNotHighHigh      170.52      (6.8%)      171.39      (6.2%)    0.5% ( -11% -   14%) 0.804
             OrHighMedDayTaxoFacets        3.03      (4.3%)        3.04      (3.1%)    0.6% (  -6% -    8%) 0.632
                            LowTerm      367.63      (5.1%)      370.21      (4.0%)    0.7% (  -7% -   10%) 0.629
                         TermDTSort       40.64      (4.8%)       41.01      (2.9%)    0.9% (  -6% -    9%) 0.467
                           HighTerm      321.26      (8.6%)      324.29      (6.8%)    0.9% ( -13% -   17%) 0.702
                         OrHighHigh      103.68      (8.6%)      104.72      (7.8%)    1.0% ( -14% -   19%) 0.699
                LowIntervalsOrdered       89.24      (4.8%)       90.40      (5.1%)    1.3% (  -8% -   11%) 0.409
                MedIntervalsOrdered       14.36      (5.8%)       14.59      (5.7%)    1.6% (  -9% -   13%) 0.386
               HighIntervalsOrdered       20.94      (4.8%)       21.28      (5.0%)    1.6% (  -7% -   11%) 0.299
                  HighTermTitleSort       65.54      (2.8%)       66.62      (3.0%)    1.6% (  -4% -    7%) 0.073
                       OrHighNotMed      326.83      (3.5%)      332.80      (4.0%)    1.8% (  -5% -    9%) 0.127
              HighTermDayOfYearSort      313.09      (3.0%)      318.88      (4.2%)    1.8% (  -5% -    9%) 0.108
          BrowseDayOfYearSSDVFacets        2.58      (6.4%)        2.63      (6.3%)    1.9% ( -10% -   15%) 0.337
              BrowseMonthSSDVFacets        2.64      (6.8%)        2.71      (6.5%)    2.3% ( -10% -   16%) 0.268
              BrowseMonthTaxoFacets        1.81     (10.6%)        1.88     (10.3%)    3.6% ( -15% -   27%) 0.278
                             IntNRQ      134.47     (20.5%)      139.69     (16.0%)    3.9% ( -27% -   50%) 0.505
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

