jpountz commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1770827026

   I did a first indexing run on wikibigall with the following merge policy, 
which I tried to make as lightweight as possible:
   
   ```
       BPIndexReorderer reorderer = new BPIndexReorderer();
       reorderer.setMinDocFreq(16384);
       reorderer.setMaxIters(3);
       reorderer.setMinPartitionSize(8192);
       mp = new BPReorderingMergePolicy(mp, reorderer, 131072);
   ```
   
   Indexing ran in 3402170 msec vs. 2610068 msec without reordering, ie. 30% 
slower. (This is when running with default params, ie. maxBufferedDocs=12119, 
SerialMergeScheduler, LogDocMergePolicy (wrapped within 
BPReorderingMergePolicy), etc.) Search was noticeably faster:
   
   ```
                              TaskQPS baseline      StdDevQPS 
my_modified_version      StdDev                Pct diff p-value
                           PKLookup      279.70      (1.6%)      247.15      
(2.8%)  -11.6% ( -15% -   -7%) 0.000
                           HighTerm      441.90      (7.8%)      414.88      
(4.4%)   -6.1% ( -17% -    6%) 0.002
                     CountOrHighMed       94.53     (16.3%)       92.77     
(15.9%)   -1.9% ( -29% -   36%) 0.715
                    CountOrHighHigh       60.96     (16.4%)       60.03     
(16.4%)   -1.5% ( -29% -   37%) 0.768
                  HighTermMonthSort     4314.65      (2.3%)     4274.31      
(2.8%)   -0.9% (  -5% -    4%) 0.248
                            Respell       64.51      (1.1%)       64.38      
(1.6%)   -0.2% (  -2% -    2%) 0.654
                        CountPhrase        3.54     (11.2%)        3.59      
(8.2%)    1.2% ( -16% -   23%) 0.700
                           Wildcard       71.79      (2.6%)       72.66      
(2.7%)    1.2% (  -3% -    6%) 0.143
                             Fuzzy2       89.57      (0.9%)       90.80      
(1.2%)    1.4% (   0% -    3%) 0.000
                            Prefix3      123.64      (3.4%)      125.34      
(2.8%)    1.4% (  -4% -    7%) 0.159
                          CountTerm    14193.65      (3.6%)    14589.84      
(2.9%)    2.8% (  -3% -    9%) 0.007
                             IntNRQ      289.67      (6.0%)      299.22      
(5.6%)    3.3% (  -7% -   15%) 0.074
                         HighPhrase        5.95      (7.6%)        6.16      
(9.3%)    3.6% ( -12% -   22%) 0.180
                             Fuzzy1      104.33      (0.9%)      108.08      
(1.2%)    3.6% (   1% -    5%) 0.000
                          LowPhrase       17.70      (3.3%)       18.74      
(4.9%)    5.9% (  -2% -   14%) 0.000
                            MedTerm      533.08      (7.8%)      568.40      
(4.5%)    6.6% (  -5% -   20%) 0.001
                         OrHighHigh       56.43      (5.8%)       60.45      
(7.0%)    7.1% (  -5% -   21%) 0.000
                    CountAndHighMed      124.71      (3.2%)      136.61      
(4.5%)    9.5% (   1% -   17%) 0.000
                          OrHighMed      212.88      (4.0%)      233.35      
(5.0%)    9.6% (   0% -   19%) 0.000
                          OrHighLow      604.12      (2.8%)      676.18      
(4.5%)   11.9% (   4% -   19%) 0.000
                         AndHighLow      933.07      (2.3%)     1046.85      
(2.7%)   12.2% (   7% -   17%) 0.000
                            LowTerm      947.45      (6.1%)     1091.11      
(4.7%)   15.2% (   4% -   27%) 0.000
                         AndHighMed      197.62      (3.1%)      232.11      
(3.1%)   17.5% (  10% -   24%) 0.000
                          MedPhrase       42.93      (2.7%)       50.47      
(5.4%)   17.6% (   9% -   26%) 0.000
                        AndHighHigh       52.74      (4.0%)       63.41      
(4.6%)   20.2% (  11% -   30%) 0.000
                   CountAndHighHigh       41.69      (3.5%)       50.60      
(5.6%)   21.4% (  11% -   31%) 0.000
              HighTermDayOfYearSort      444.76      (1.7%)      652.11      
(2.0%)   46.6% (  42% -   51%) 0.000
   ```
   
   I'll look into whether I can reduce the merge-time overhead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to