[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319923#comment-17319923 ]
Zach Chen commented on LUCENE-9335:
-----------------------------------

I made some further changes to move some block-max-related logic from DisjunctionMaxScorer to DisjunctionScorer, so that DisjunctionSumScorer can inherit it. I've published a WIP PR [https://github.com/apache/lucene/pull/81] with those changes for ease of review.

When I run luceneutil, I see further errors from the verifyScores section of the code, which may indicate bugs in my changes:
{code:java}
WARNING: cat=OrHighHigh: hit counts differ: 9870+ vs 2616+
Traceback (most recent call last):
  File "src/python/localrun.py", line 53, in <module>
    comp.benchmark("baseline_vs_patch")
  File "/Users/xichen/IdeaProjects/benchmarks/util/src/python/competition.py", line 455, in benchmark
    randomSeed = self.randomSeed)
  File "/Users/xichen/IdeaProjects/benchmarks/util/src/python/searchBench.py", line 196, in run
    raise RuntimeError('errors occurred: %s' % str(cmpDiffs))
RuntimeError: errors occurred: ([], ["query=body:second body:short filter=None sort=None groupField=None hitCount=9870+: hit 0 has wrong field/score value ([1444649], '5.0718417') vs ([5125], '4.224689')"], 1.0)
{code}

I then changed benchUtil.py to skip over verifyScores, so that I could see what benchmark results it would generate:
{code:java}
diff --git a/src/python/benchUtil.py b/src/python/benchUtil.py
index fb50033..c2faffc 100644
--- a/src/python/benchUtil.py
+++ b/src/python/benchUtil.py
@@ -1203,7 +1203,7 @@ class RunAlgs:
     cmpRawResults, heapCmp = parseResults(cmpLogFiles)
 
     # make sure they got identical results
-    cmpDiffs = compareHits(baseRawResults, cmpRawResults, self.verifyScores, self.verifyCounts)
+    cmpDiffs = compareHits(baseRawResults, cmpRawResults, False, False)
 
     baseResults = collateResults(baseRawResults)
     cmpResults = collateResults(cmpRawResults)
{code}

I then got the following benchmark results from multiple runs:
{code:java}
Task                      QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
OrHighMed                   186.44  (2.8%)    160.50   (4.5%)   -13.9% ( -20% -  -6%) 0.000
OrHighLow                   735.70  (7.5%)    696.89   (4.3%)    -5.3% ( -15% -   6%) 0.006
Fuzzy1                       75.85 (11.5%)     72.81  (14.0%)    -4.0% ( -26% -  24%) 0.323
TermDTSort                  237.49 (10.4%)    228.02  (10.6%)    -4.0% ( -22% -  18%) 0.230
HighTermMonthSort           280.82  (9.8%)    274.90  (10.8%)    -2.1% ( -20% -  20%) 0.518
Fuzzy2                       54.08 (12.5%)     53.04  (14.2%)    -1.9% ( -25% -  28%) 0.648
OrNotHighMed                672.83  (2.7%)    661.16   (4.7%)    -1.7% (  -8% -   5%) 0.153
HighTermTitleBDVSort        438.56 (14.4%)    431.81  (16.6%)    -1.5% ( -28% -  34%) 0.754
AndHighLow                  969.43  (5.2%)    957.49   (4.7%)    -1.2% ( -10% -   9%) 0.432
OrNotHighHigh               704.98  (3.4%)    700.72   (3.9%)    -0.6% (  -7% -   7%) 0.605
AndHighHigh                 109.77  (4.2%)    109.31   (4.7%)    -0.4% (  -9% -   8%) 0.767
BrowseMonthSSDVFacets        32.52  (2.1%)     32.40   (4.6%)    -0.4% (  -6% -   6%) 0.755
PKLookup                    219.90  (3.1%)    219.16   (3.2%)    -0.3% (  -6% -   6%) 0.734
Wildcard                    284.84  (1.9%)    284.18   (1.8%)    -0.2% (  -3% -   3%) 0.690
Prefix3                     361.00  (2.1%)    360.24   (2.0%)    -0.2% (  -4% -   4%) 0.750
HighIntervalsOrdered         28.68  (2.2%)     28.64   (1.7%)    -0.1% (  -3% -   3%) 0.819
BrowseMonthTaxoFacets        13.60  (2.9%)     13.59   (2.7%)    -0.1% (  -5% -   5%) 0.947
BrowseDayOfYearSSDVFacets    28.67  (4.8%)     28.66   (4.8%)    -0.0% (  -9% -  10%) 0.979
HighSpanNear                 79.29  (2.4%)     79.29   (2.2%)     0.0% (  -4% -   4%) 0.997
OrHighNotHigh               695.37  (5.5%)    696.65   (3.8%)     0.2% (  -8% -  10%) 0.903
MedTerm                    1478.47  (3.6%)   1481.54   (3.0%)     0.2% (  -6% -   7%) 0.843
HighTermDayOfYearSort       372.12 (14.1%)    373.08  (14.8%)     0.3% ( -25% -  33%) 0.955
IntNRQ                      125.36  (1.3%)    125.72   (0.7%)     0.3% (  -1% -   2%) 0.391
LowSpanNear                  52.82  (1.7%)     52.98   (2.0%)     0.3% (  -3% -   4%) 0.611
BrowseDayOfYearTaxoFacets    11.28  (3.1%)     11.31   (3.1%)     0.3% (  -5% -   6%) 0.756
LowSloppyPhrase             154.42  (2.9%)    154.91   (2.9%)     0.3% (  -5% -   6%) 0.731
MedPhrase                   143.27  (2.9%)    143.88   (2.5%)     0.4% (  -4% -   6%) 0.625
OrHighNotMed                760.65  (6.8%)    763.93   (5.4%)     0.4% ( -10% -  13%) 0.824
Respell                      86.71  (1.5%)     87.11   (2.1%)     0.5% (  -3% -   4%) 0.425
MedSpanNear                 210.43  (2.2%)    211.43   (1.4%)     0.5% (  -3% -   4%) 0.414
MedSloppyPhrase             220.29  (2.6%)    221.35   (2.2%)     0.5% (  -4% -   5%) 0.528
BrowseDateTaxoFacets         11.24  (3.0%)     11.30   (3.1%)     0.6% (  -5% -   6%) 0.529
LowPhrase                   174.98  (2.4%)    176.05   (2.0%)     0.6% (  -3% -   5%) 0.385
HighSloppyPhrase            100.25  (3.2%)    100.88   (3.0%)     0.6% (  -5% -   7%) 0.524
OrHighNotLow               1016.27  (7.7%)   1025.44   (6.9%)     0.9% ( -12% -  16%) 0.696
LowTerm                    1634.20  (3.8%)   1649.20   (2.6%)     0.9% (  -5% -   7%) 0.376
HighPhrase                  415.55  (3.0%)    419.76   (2.9%)     1.0% (  -4% -   7%) 0.282
OrNotHighLow                940.18  (5.4%)    952.61   (2.7%)     1.3% (  -6% -   9%) 0.328
HighTerm                   1163.01  (3.8%)   1178.47   (4.8%)     1.3% (  -6% -  10%) 0.329
AndHighMed                  365.15  (4.4%)    370.53   (3.1%)     1.5% (  -5% -   9%) 0.225
OrHighHigh                   80.20  (2.2%)    718.16 (158.9%)   795.4% ( 620% - 978%) 0.000
WARNING: cat=OrHighHigh: hit counts differ: 19022+ vs 1002+
WARNING: cat=OrHighMed: hit counts differ: 4321+ vs 4289+
{code}
{code:java}
Task                      QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
OrHighLow                   667.55  (4.3%)    649.15   (5.9%)    -2.8% ( -12% -   7%) 0.092
OrHighNotHigh               866.11  (5.2%)    851.54   (5.2%)    -1.7% ( -11% -   9%) 0.305
HighTermDayOfYearSort       293.23 (16.4%)    288.53  (15.0%)    -1.6% ( -28% -  35%) 0.747
Fuzzy1                       58.36 (10.3%)     57.55  (10.2%)    -1.4% ( -19% -  21%) 0.666
OrHighNotLow                709.78  (3.0%)    702.08   (3.9%)    -1.1% (  -7% -   5%) 0.322
LowTerm                    1816.04  (4.4%)   1797.54   (4.8%)    -1.0% (  -9% -   8%) 0.488
Fuzzy2                       50.30 (11.0%)     49.85  (11.7%)    -0.9% ( -21% -  24%) 0.802
OrHighHigh                   55.81  (2.4%)     55.35   (2.4%)    -0.8% (  -5% -   4%) 0.288
HighSpanNear                 15.16  (2.9%)     15.07   (3.0%)    -0.6% (  -6% -   5%) 0.547
MedSpanNear                  67.82  (3.1%)     67.47   (3.5%)    -0.5% (  -6% -   6%) 0.613
OrHighMed                   195.58  (2.7%)    194.60   (2.6%)    -0.5% (  -5% -   4%) 0.548
LowSpanNear                  36.88  (2.7%)     36.75   (2.9%)    -0.3% (  -5% -   5%) 0.690
BrowseMonthTaxoFacets        13.05  (3.3%)     13.01   (3.5%)    -0.3% (  -6% -   6%) 0.749
HighIntervalsOrdered         44.33  (1.3%)     44.18   (1.4%)    -0.3% (  -3% -   2%) 0.439
HighSloppyPhrase             17.92  (4.4%)     17.87   (4.3%)    -0.3% (  -8% -   8%) 0.821
MedPhrase                    78.25  (2.5%)     78.12   (2.1%)    -0.2% (  -4% -   4%) 0.823
BrowseDayOfYearSSDVFacets    27.73  (2.4%)     27.68   (2.0%)    -0.2% (  -4% -   4%) 0.817
PKLookup                    213.40  (2.9%)    213.09   (2.4%)    -0.1% (  -5% -   5%) 0.862
AndHighHigh                 100.38  (2.9%)    100.25   (2.9%)    -0.1% (  -5% -   5%) 0.891
BrowseDayOfYearTaxoFacets    10.79  (3.4%)     10.78   (3.7%)    -0.1% (  -6% -   7%) 0.912
AndHighLow                  778.66  (3.4%)    778.10   (3.1%)    -0.1% (  -6% -   6%) 0.945
Wildcard                    141.04  (2.1%)    141.00   (2.4%)    -0.0% (  -4% -   4%) 0.970
BrowseDateTaxoFacets         10.77  (3.3%)     10.77   (3.6%)    -0.0% (  -6% -   7%) 0.993
HighTermTitleBDVSort        222.22 (12.5%)    222.20  (11.7%)    -0.0% ( -21% -  27%) 0.998
LowSloppyPhrase              34.64  (3.6%)     34.64   (3.4%)    -0.0% (  -6% -   7%) 0.994
IntNRQ                      143.05  (0.6%)    143.25   (0.8%)     0.1% (  -1% -   1%) 0.546
BrowseMonthSSDVFacets        30.95  (5.2%)     31.00   (5.0%)     0.2% (  -9% -  10%) 0.922
Respell                      66.20  (2.2%)     66.36   (1.8%)     0.2% (  -3% -   4%) 0.719
AndHighMed                  300.10  (2.5%)    300.83   (2.9%)     0.2% (  -5% -   5%) 0.775
LowPhrase                    54.92  (2.6%)     55.07   (2.1%)     0.3% (  -4% -   5%) 0.701
MedTerm                    1522.58  (3.9%)   1529.48   (3.8%)     0.5% (  -7% -   8%) 0.711
OrNotHighLow                965.50  (5.2%)    972.89   (2.8%)     0.8% (  -6% -   9%) 0.564
HighPhrase                  163.18  (2.5%)    164.66   (2.6%)     0.9% (  -4% -   6%) 0.259
HighTerm                   1400.74  (3.8%)   1417.05   (3.9%)     1.2% (  -6% -   9%) 0.335
MedSloppyPhrase             146.41  (2.6%)    148.13   (2.8%)     1.2% (  -4% -   6%) 0.171
Prefix3                     355.38  (3.1%)    359.62   (3.4%)     1.2% (  -5% -   7%) 0.242
OrNotHighHigh               704.32  (4.2%)    713.26   (4.6%)     1.3% (  -7% -  10%) 0.363
TermDTSort                  227.11 (13.3%)    230.00  (13.5%)     1.3% ( -22% -  32%) 0.764
OrHighNotMed                724.29  (4.2%)    736.47   (3.8%)     1.7% (  -6% -  10%) 0.183
HighTermMonthSort           134.94 (10.6%)    137.37  (10.2%)     1.8% ( -17% -  25%) 0.583
OrNotHighMed                764.10  (3.7%)    778.89   (3.4%)     1.9% (  -4% -   9%) 0.082
{code}

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g.
> against Tantivy and PISA at [https://tantivy-search.github.io/bench/] or against research
> prototypes in Table 1 of
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for
> benchmarking, it would be nice to optimize this case a bit more, I suspect
> that we could make fewer per-document decisions by implementing a BulkScorer
> instead of a Scorer.