jpountz opened a new pull request, #13605: URL: https://github.com/apache/lucene/pull/13605
It has been pointed out multiple times that one difference between Tantivy and Lucene is that Tantivy uses windows of 4,096 docs, whereas Lucene uses windows half that size, 2,048 docs, and that this might explain part of the performance difference. luceneutil suggests that bumping the window size to 4,096 does indeed improve performance for counting queries (see `CountOrHighMed` and `CountOrHighHigh` below), but not for top-k queries. I still suggest bumping the window size across the board to keep our disjunction scorers consistent.

```
Task                    QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
CountPhrase                     3.27  (11.6%)                     3.14   (8.0%)     -4.1% ( -21% -  17%)    0.189
HighTermMonthSort            3521.28   (3.5%)                  3481.74   (2.8%)     -1.1% (  -7% -   5%)    0.262
PKLookup                      289.42   (1.3%)                   286.47   (2.2%)     -1.0% (  -4% -   2%)    0.075
TermDTSort                    352.01   (6.5%)                   348.89   (5.6%)     -0.9% ( -12% -  11%)    0.642
Phrase                         11.85   (5.3%)                    11.76   (5.0%)     -0.8% ( -10% -   9%)    0.634
OrHighLow                     772.82   (2.4%)                   767.24   (2.1%)     -0.7% (  -5% -   3%)    0.313
CountAndHighMed               120.78   (2.3%)                   120.10   (2.5%)     -0.6% (  -5% -   4%)    0.449
HighTermDayOfYearSort         821.48   (3.5%)                   818.62   (2.7%)     -0.3% (  -6% -   6%)    0.724
HighTermTitleSort             148.84   (2.9%)                   148.33   (2.8%)     -0.3% (  -5% -   5%)    0.700
AndHighHigh                    62.36   (1.7%)                    62.17   (1.8%)     -0.3% (  -3% -   3%)    0.584
CountAndHighHigh               41.41   (2.5%)                    41.34   (2.6%)     -0.2% (  -5% -   5%)    0.836
Fuzzy1                         96.24   (1.0%)                    96.09   (1.2%)     -0.2% (  -2% -   2%)    0.667
AndHighLow                    827.59   (2.7%)                   826.89   (2.4%)     -0.1% (  -5% -   5%)    0.918
AndHighMed                     93.35   (1.6%)                    93.29   (1.7%)     -0.1% (  -3% -   3%)    0.903
HighTermTitleBDVSort           16.30   (4.2%)                    16.29   (6.7%)     -0.0% ( -10% -  11%)    0.984
OrHighMed                     153.42   (2.6%)                   153.41   (2.2%)     -0.0% (  -4% -   4%)    0.994
Respell                        46.72   (1.3%)                    46.72   (1.4%)      0.0% (  -2% -   2%)    0.975
And3Terms                     155.73   (2.2%)                   155.95   (1.4%)      0.1% (  -3% -   3%)    0.805
Fuzzy2                         58.66   (0.9%)                    58.77   (1.1%)      0.2% (  -1% -   2%)    0.566
OrHighHigh                     75.70   (2.6%)                    75.90   (2.3%)      0.3% (  -4% -   5%)    0.733
CountTerm                    9110.00   (4.3%)                  9142.10   (3.2%)      0.4% (  -6% -   8%)    0.768
AndStopWords                   29.47   (2.6%)                    29.57   (1.3%)      0.4% (  -3% -   4%)    0.579
And2Terms2StopWords           150.30   (2.1%)                   150.86   (1.1%)      0.4% (  -2% -   3%)    0.487
OrHighRare                    237.33   (5.7%)                   238.26   (6.2%)      0.4% ( -10% -  13%)    0.837
MedTerm                       553.55   (6.0%)                   555.97   (7.7%)      0.4% ( -12% -  15%)    0.841
Wildcard                       34.08   (3.2%)                    34.25   (3.4%)      0.5% (  -5% -   7%)    0.630
OrNotHighLow                  761.70   (3.2%)                   766.33   (2.6%)      0.6% (  -5% -   6%)    0.511
Or2Terms2StopWords            156.10   (3.2%)                   157.14   (1.8%)      0.7% (  -4% -   5%)    0.416
Or3Terms                      156.59   (3.0%)                   157.70   (1.9%)      0.7% (  -4% -   5%)    0.374
HighTerm                      440.27   (5.6%)                   443.89   (7.5%)      0.8% ( -11% -  14%)    0.695
LowTerm                       892.27   (5.2%)                   900.48   (6.8%)      0.9% ( -10% -  13%)    0.632
OrStopWords                    31.88   (4.7%)                    32.29   (2.6%)      1.3% (  -5% -   9%)    0.276
Prefix3                       214.22   (3.4%)                   217.48   (2.8%)      1.5% (  -4% -   8%)    0.124
OrHighNotHigh                 247.52   (4.8%)                   254.52   (5.1%)      2.8% (  -6% -  13%)    0.071
IntNRQ                        144.53  (17.2%)                   148.66  (17.9%)      2.9% ( -27% -  45%)    0.607
OrNotHighMed                  330.23   (6.5%)                   340.12   (5.4%)      3.0% (  -8% -  15%)    0.114
OrHighNotMed                  285.11   (5.2%)                   293.82   (6.2%)      3.1% (  -7% -  15%)    0.092
OrHighNotLow                  429.94   (5.4%)                   443.15   (6.8%)      3.1% (  -8% -  16%)    0.113
OrNotHighHigh                 189.30   (5.9%)                   195.25   (5.4%)      3.1% (  -7% -  15%)    0.079
CountOrHighMed                 99.90  (22.5%)                   121.78  (20.0%)     21.9% ( -16% -  83%)    0.001
CountOrHighHigh                53.76  (35.1%)                    70.24  (32.5%)     30.6% ( -27% - 151%)    0.004
```
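
For context on what the window size controls, here is a minimal Java sketch of windowed disjunction counting. It is an illustration only, not Lucene's actual scorer code: the class `WindowedDisjunctionCount`, its `count` method, and the in-memory `int[][]` postings are all hypothetical. Each pass collects matches from every clause into a per-window bitset and then counts the set bits, so a larger window means fewer passes over the clauses for the same doc ID space.

```java
// Hypothetical sketch of windowed disjunction counting; not Lucene code.
class WindowedDisjunctionCount {

  /**
   * Counts distinct doc IDs across sorted postings lists, one window at a time.
   * Assumes every doc ID is in [0, maxDoc) and each list is sorted ascending.
   */
  static int count(int[][] postings, int maxDoc, int windowSize) {
    if (windowSize <= 0 || windowSize % 64 != 0) {
      throw new IllegalArgumentException("windowSize must be a positive multiple of 64");
    }
    long[] bits = new long[windowSize >>> 6]; // one bit per doc in the current window
    int[] pos = new int[postings.length];     // cursor into each postings list
    int total = 0;
    for (int base = 0; base < maxDoc; base += windowSize) {
      int end = Math.min(base + windowSize, maxDoc);
      // Mark every match of every clause that falls into [base, end).
      for (int i = 0; i < postings.length; i++) {
        int[] p = postings[i];
        while (pos[i] < p.length && p[pos[i]] < end) {
          int local = p[pos[i]++] - base;
          bits[local >>> 6] |= 1L << local; // long shifts use the low 6 bits of the distance
        }
      }
      // Count the matches in this window, then reset the bitset for the next one.
      for (int w = 0; w < bits.length; w++) {
        total += Long.bitCount(bits[w]);
        bits[w] = 0L;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    int[][] postings = { {1, 5, 2047, 2048, 5000}, {5, 2048, 4095, 4096} };
    System.out.println(count(postings, 6000, 2048)); // prints 7
    System.out.println(count(postings, 6000, 4096)); // prints 7
  }
}
```

Both window sizes return the same count. Only the number of windows changes (3 vs. 2 for 6,000 docs here), which is where any per-window overhead would be saved.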