jpountz commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1770827026
I did a first indexing run on wikibigall with the following merge policy, which I tried to make as lightweight as possible:

```
BPIndexReorderer reorderer = new BPIndexReorderer();
reorderer.setMinDocFreq(16384);
reorderer.setMaxIters(3);
reorderer.setMinPartitionSize(8192);
mp = new BPReorderingMergePolicy(mp, reorderer, 131072);
```

Indexing ran in 3402170 msec vs. 2610068 msec without reordering, i.e. 30% slower. (This is when running with default params, i.e. maxBufferedDocs=12119, SerialMergeScheduler, LogDocMergePolicy (wrapped within BPReorderingMergePolicy), etc.)

Search was noticeably faster on most tasks:

```
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
PKLookup                    279.70   (1.6%)                   247.15   (2.8%)    -11.6%  ( -15% - -7%)  0.000
HighTerm                    441.90   (7.8%)                   414.88   (4.4%)     -6.1%  ( -17% - 6%)   0.002
CountOrHighMed               94.53  (16.3%)                    92.77  (15.9%)     -1.9%  ( -29% - 36%)  0.715
CountOrHighHigh              60.96  (16.4%)                    60.03  (16.4%)     -1.5%  ( -29% - 37%)  0.768
HighTermMonthSort          4314.65   (2.3%)                  4274.31   (2.8%)     -0.9%  ( -5% - 4%)    0.248
Respell                      64.51   (1.1%)                    64.38   (1.6%)     -0.2%  ( -2% - 2%)    0.654
CountPhrase                   3.54  (11.2%)                     3.59   (8.2%)      1.2%  ( -16% - 23%)  0.700
Wildcard                     71.79   (2.6%)                    72.66   (2.7%)      1.2%  ( -3% - 6%)    0.143
Fuzzy2                       89.57   (0.9%)                    90.80   (1.2%)      1.4%  ( 0% - 3%)     0.000
Prefix3                     123.64   (3.4%)                   125.34   (2.8%)      1.4%  ( -4% - 7%)    0.159
CountTerm                 14193.65   (3.6%)                 14589.84   (2.9%)      2.8%  ( -3% - 9%)    0.007
IntNRQ                      289.67   (6.0%)                   299.22   (5.6%)      3.3%  ( -7% - 15%)   0.074
HighPhrase                    5.95   (7.6%)                     6.16   (9.3%)      3.6%  ( -12% - 22%)  0.180
Fuzzy1                      104.33   (0.9%)                   108.08   (1.2%)      3.6%  ( 1% - 5%)     0.000
LowPhrase                    17.70   (3.3%)                    18.74   (4.9%)      5.9%  ( -2% - 14%)   0.000
MedTerm                     533.08   (7.8%)                   568.40   (4.5%)      6.6%  ( -5% - 20%)   0.001
OrHighHigh                   56.43   (5.8%)                    60.45   (7.0%)      7.1%  ( -5% - 21%)   0.000
CountAndHighMed             124.71   (3.2%)                   136.61   (4.5%)      9.5%  ( 1% - 17%)    0.000
OrHighMed                   212.88   (4.0%)                   233.35   (5.0%)      9.6%  ( 0% - 19%)    0.000
OrHighLow                   604.12   (2.8%)                   676.18   (4.5%)     11.9%  ( 4% - 19%)    0.000
AndHighLow                  933.07   (2.3%)                  1046.85   (2.7%)     12.2%  ( 7% - 17%)    0.000
LowTerm                     947.45   (6.1%)                  1091.11   (4.7%)     15.2%  ( 4% - 27%)    0.000
AndHighMed                  197.62   (3.1%)                   232.11   (3.1%)     17.5%  ( 10% - 24%)   0.000
MedPhrase                    42.93   (2.7%)                    50.47   (5.4%)     17.6%  ( 9% - 26%)    0.000
AndHighHigh                  52.74   (4.0%)                    63.41   (4.6%)     20.2%  ( 11% - 30%)   0.000
CountAndHighHigh             41.69   (3.5%)                    50.60   (5.6%)     21.4%  ( 11% - 31%)   0.000
HighTermDayOfYearSort       444.76   (1.7%)                   652.11   (2.0%)     46.6%  ( 42% - 51%)   0.000
```

I'll look into whether I can reduce the merge-time overhead.
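For anyone who wants to check the percentages quoted above, here is a minimal sketch of the arithmetic. The class and method names (`ReorderingOverhead`, `pctDiff`) are made up for illustration and are not part of Lucene or luceneutil; the formula assumes "Pct diff" is simply the relative change against the baseline.

```java
import java.util.Locale;

public class ReorderingOverhead {
    /** Relative change of candidate versus baseline, in percent (hypothetical helper). */
    public static double pctDiff(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Indexing wall-clock times quoted in the comment, in milliseconds.
        double withoutReordering = 2_610_068;
        double withReordering = 3_402_170;
        System.out.printf(Locale.ROOT, "Indexing time change: %+.1f%%%n",
                pctDiff(withoutReordering, withReordering));

        // QPS for HighTermDayOfYearSort (baseline, then modified), taken from the table.
        System.out.printf(Locale.ROOT, "HighTermDayOfYearSort QPS change: %+.1f%%%n",
                pctDiff(444.76, 652.11));
    }
}
```

With the numbers from the comment this gives roughly +30.3% for indexing time and +46.6% for HighTermDayOfYearSort QPS, which matches the "30% slower" figure and the last row of the table.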