Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

via GitHub Sat, 15 Jun 2024 13:03:11 -0700


original-brownbear commented on PR #13472:
URL: https://github.com/apache/lucene/pull/13472#issuecomment-2170618011


   Here are luceneutil benchmark results for this change. To get an idea of 
the impact, this branch ran with one thread fewer than main (credit to 
@jpountz and @javanna for the idea):
   
   ```
                               Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                             Fuzzy1      105.06      (3.1%)      103.22      (3.6%)   -1.7% (  -8% -    5%) 0.103
          BrowseDayOfYearTaxoFacets       14.80      (1.0%)       14.55      (4.5%)   -1.7% (  -7% -    3%) 0.096
             OrHighMedDayTaxoFacets        6.60      (3.3%)        6.49      (2.1%)   -1.6% (  -6% -    3%) 0.062
                            Respell       52.96      (2.2%)       52.56      (1.9%)   -0.8% (  -4% -    3%) 0.243
               BrowseDateTaxoFacets       14.91      (1.2%)       14.86      (3.9%)   -0.4% (  -5% -    4%) 0.695
        BrowseRandomLabelSSDVFacets        3.73      (0.5%)        3.73      (0.5%)    0.1% (   0% -    1%) 0.714
              BrowseMonthSSDVFacets        5.58      (2.0%)        5.59      (2.0%)    0.2% (  -3% -    4%) 0.763
          BrowseDayOfYearSSDVFacets        7.61      (0.6%)        7.62      (0.6%)    0.2% (   0% -    1%) 0.276
               MedTermDayTaxoFacets       25.46      (0.7%)       25.52      (0.9%)    0.3% (  -1% -    1%) 0.328
           AndHighHighDayTaxoFacets       15.24      (0.7%)       15.28      (0.5%)    0.3% (  -1% -    1%) 0.183
            AndHighMedDayTaxoFacets       17.92      (0.7%)       17.99      (0.5%)    0.4% (   0% -    1%) 0.023
        BrowseRandomLabelTaxoFacets       11.95      (1.7%)       12.00      (1.2%)    0.4% (  -2% -    3%) 0.331
              BrowseMonthTaxoFacets       12.37      (3.0%)       12.46      (1.7%)    0.7% (  -3% -    5%) 0.358
                  HighTermMonthSort      306.96     (16.4%)      309.25     (14.6%)    0.7% ( -26% -   38%) 0.879
               BrowseDateSSDVFacets        1.45      (1.0%)        1.48      (2.4%)    1.7% (  -1% -    5%) 0.004
                            Prefix3      223.49     (31.2%)      228.83     (13.7%)    2.4% ( -32% -   68%) 0.754
                             Fuzzy2       55.36     (20.9%)       58.92     (14.4%)    6.4% ( -23% -   52%) 0.256
                           PKLookup      176.48     (18.1%)      194.13     (13.2%)   10.0% ( -17% -   50%) 0.045
                       OrNotHighLow      472.02      (2.4%)      567.48     (26.2%)   20.2% (  -8% -   50%) 0.001
                   HighSloppyPhrase        3.06      (3.6%)        3.69      (7.1%)   20.4% (   9% -   32%) 0.000
                         AndHighLow      784.51     (24.4%)      959.85     (12.6%)   22.4% ( -11% -   78%) 0.000
                           Wildcard      124.97      (1.4%)      154.50      (2.5%)   23.6% (  19% -   27%) 0.000
                             IntNRQ       70.70      (1.2%)       87.67      (4.0%)   24.0% (  18% -   29%) 0.000
                         HighPhrase       94.06      (2.9%)      118.04      (5.3%)   25.5% (  16% -   34%) 0.000
                        AndHighHigh       53.83      (1.5%)       67.85      (2.0%)   26.1% (  22% -   30%) 0.000
                    LowSloppyPhrase       60.97      (2.4%)       77.49      (5.6%)   27.1% (  18% -   35%) 0.000
                          LowPhrase       20.56      (1.2%)       26.27      (2.9%)   27.7% (  23% -   32%) 0.000
                          MedPhrase       29.76      (1.7%)       39.75      (5.1%)   33.6% (  26% -   40%) 0.000
                LowIntervalsOrdered       15.55      (2.5%)       20.83      (4.1%)   33.9% (  26% -   41%) 0.000
                         AndHighMed       99.55      (2.7%)      135.12      (2.1%)   35.7% (  30% -   41%) 0.000
                        LowSpanNear        3.16      (1.8%)        4.30      (1.6%)   36.3% (  32% -   40%) 0.000
                          OrHighMed      117.00      (3.8%)      164.78      (4.2%)   40.8% (  31% -   50%) 0.000
                      OrHighNotHigh       89.87      (6.3%)      128.16     (36.4%)   42.6% (   0% -   91%) 0.000
                         OrHighHigh       38.70      (1.8%)       55.41      (8.0%)   43.2% (  32% -   53%) 0.000
                    MedSloppyPhrase        7.29      (3.5%)       10.68      (4.6%)   46.5% (  37% -   56%) 0.000
                       HighSpanNear        2.54      (2.1%)        3.77      (3.2%)   48.6% (  42% -   55%) 0.000
                            MedTerm      216.76     (15.6%)      324.89     (29.6%)   49.9% (   4% -  112%) 0.000
                  HighTermTitleSort       13.92      (9.3%)       23.43      (8.9%)   68.3% (  45% -   95%) 0.000
                         TermDTSort       68.68      (3.3%)      117.77     (12.2%)   71.5% (  54% -   90%) 0.000
                           HighTerm      220.46      (5.7%)      396.67     (14.8%)   79.9% (  56% -  106%) 0.000
                          OrHighLow      218.43     (26.1%)      400.99     (82.8%)   83.6% ( -20% -  260%) 0.000
               HighTermTitleBDVSort        4.45      (2.1%)        8.32      (2.1%)   86.8% (  80% -   92%) 0.000
                        MedSpanNear       22.62      (2.7%)       42.88      (5.8%)   89.6% (  78% -  100%) 0.000
                       OrHighNotLow      329.64     (22.4%)      672.19     (30.0%)  103.9% (  42% -  201%) 0.000
              HighTermDayOfYearSort       57.50      (3.8%)      125.18      (9.8%)  117.7% ( 100% -  136%) 0.000
                MedIntervalsOrdered       10.22      (4.1%)       22.48      (9.4%)  119.9% ( 102% -  139%) 0.000
               HighIntervalsOrdered        2.41      (6.1%)        5.39     (10.2%)  123.3% ( 100% -  148%) 0.000
                            LowTerm      251.06     (10.8%)      634.45      (7.9%)  152.7% ( 120% -  192%) 0.000
                       OrNotHighMed       74.81      (5.4%)      221.54     (14.8%)  196.1% ( 166% -  228%) 0.000
                      OrNotHighHigh       95.65      (7.1%)      314.65     (21.1%)  228.9% ( 187% -  276%) 0.000
                       OrHighNotMed       59.11      (6.5%)      206.56     (15.0%)  249.4% ( 214% -  289%) 0.000
   ```
   
   This is wikimediumall, with 3 threads for main and 2 threads for this 
branch. There are effectively no regressions, but there are some considerable 
speedups.
   The reason for this is the reduction in context switching. We go 
from this perf output for `main`:
   ```
   Performance counter stats for process id '157418':
   
      574,008,686,445      cycles
    1,130,739,465,717      instructions              #    1.97  insn per cycle
        2,599,704,747      cache-misses
              429,542      context-switches
   
         49.053969801 seconds time elapsed
   ```
   
   to this on this branch:
   
   ```
   Performance counter stats for process id '157292':
   
      526,556,069,563      cycles
    1,122,410,787,297      instructions              #    2.13  insn per cycle
        2,420,210,310      cache-misses
              385,991      context-switches
   
         41.044785986 seconds time elapsed
   ```
   
   -> Pretty much the same number of instructions is executed, but they run 
in fewer cycles and encounter fewer cache misses.
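
To put numbers on that comparison, here is a minimal sketch that derives instructions per cycle and the relative change of each counter from the two perf outputs above. The class and method names (`PerfCounterDelta`, `ipc`, `pctChange`) are made up for illustration; only the counter values come from the perf runs.

```java
import java.util.Locale;

class PerfCounterDelta {
    /** Instructions retired per CPU cycle. */
    static double ipc(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    /** Relative change from baseline to candidate in percent (negative means a reduction). */
    static double pctChange(long baseline, long candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        // Counter values copied from the perf output above: main first, this branch second.
        long[] cycles = {574_008_686_445L, 526_556_069_563L};
        long[] instructions = {1_130_739_465_717L, 1_122_410_787_297L};
        long[] cacheMisses = {2_599_704_747L, 2_420_210_310L};
        long[] contextSwitches = {429_542L, 385_991L};

        System.out.printf(Locale.ROOT, "IPC: main=%.2f branch=%.2f%n",
            ipc(instructions[0], cycles[0]), ipc(instructions[1], cycles[1]));
        System.out.printf(Locale.ROOT, "cycles:           %+.2f%%%n", pctChange(cycles[0], cycles[1]));
        System.out.printf(Locale.ROOT, "instructions:     %+.2f%%%n", pctChange(instructions[0], instructions[1]));
        System.out.printf(Locale.ROOT, "cache-misses:     %+.2f%%%n", pctChange(cacheMisses[0], cacheMisses[1]));
        System.out.printf(Locale.ROOT, "context-switches: %+.2f%%%n", pctChange(contextSwitches[0], contextSwitches[1]));
    }
}
```

The IPC values it computes should match the `insn per cycle` figures perf already prints; the percentage deltas are just a convenience for reading the two runs side by side.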
   
   This is also seen in the profile of where the CPU time goes:
   
   main looks like this:
   ```
   17.21%        328981        org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
   5.75%         109925        org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
   5.24%         100195        org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
   5.17%         98733         org.apache.lucene.util.packed.DirectMonotonicReader#get()
   4.11%         78637         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
   3.98%         76164         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
   2.57%         49115         org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#nextPosition()
   1.82%         34823         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
   1.73%         33136         jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw()
   1.63%         31172         java.util.concurrent.atomic.AtomicLong#incrementAndGet()
   ```
   
   while this branch looks as follows:
   
   ```
   10.79%        183254        org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
   5.89%         100099        org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
   5.62%         95387         org.apache.lucene.util.packed.DirectMonotonicReader#get()
   4.59%         77917         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
   4.48%         76145         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
   3.20%         54407         org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
   2.77%         47088         org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#nextPosition()
   2.06%         34965         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
   1.91%         32484         jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw()
   1.81%         30763         org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$BlockImpactsPostingsEnum#advance()
   1.71%         28966         org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
   1.66%         28206         org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#advance()
   ```
   
   -> A lot less time goes into `collect` (17.21% on main vs 10.79% here), 
which goes through contended counter increments.
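
For readers unfamiliar with what contended counter increments look like, here is a standalone sketch. It is not Lucene's collector code: it only has several threads call `AtomicLong#incrementAndGet` on one shared counter, which is the JDK method that shows up in the `main` profile above. The class name `SharedCounterContention` and the helper `incrementShared` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class SharedCounterContention {
    /**
     * Splits `total` increments of one shared counter evenly across `threads`
     * worker threads and returns the elapsed wall-clock time in nanoseconds.
     */
    static long incrementShared(AtomicLong counter, int threads, long total) {
        long perThread = total / threads; // assumes total is divisible by threads
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (long j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // every worker updates the same memory location
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long total = 6_000_000L; // divisible by 1, 2 and 3
        for (int threads = 1; threads <= 3; threads++) {
            AtomicLong counter = new AtomicLong();
            long nanos = incrementShared(counter, threads, total);
            System.out.printf("threads=%d count=%d elapsed=%.1f ms%n",
                threads, counter.get(), nanos / 1e6);
        }
    }
}
```

The final count is the same for every thread count; only the elapsed time differs. Timings from a loop like this depend heavily on the machine and on JIT warm-up, so treat them as a rough illustration and use a proper harness such as JMH for real measurements.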


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

