[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318412#comment-17318412 ]
Zach Chen commented on LUCENE-9335:
-----------------------------------

Hi [~jpountz], thanks for the additional paper! I gave it a read and found it to be a great summary of, and refresher on, the BMM & BMW algorithms, in addition to some other interesting ones! I also saw that the paper found BMM to be faster than BMW when a disjunction has 5+ clauses, which echoes your observation above as well. I explored this a bit in the code, and found one heuristics-based approach (using iterator cost similarity as a proxy / guesstimate for block max score similarity across scorers) that seems to give an interesting result:

Code changes:
{code:java}
diff --git a/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java b/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
index bdf085d4669..f71e2c2fdd8 100644
--- a/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
+++ b/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
@@ -238,11 +238,33 @@ final class Boolean2ScorerSupplier extends ScorerSupplier {
       //
       // However, as WANDScorer uses more complex algorithm and data structure, we would like to
       // still use DisjunctionSumScorer to handle exhaustive pure disjunctions, which may be faster
-      if (scoreMode == ScoreMode.TOP_SCORES || minShouldMatch > 1) {
+      boolean isPureDisjunction =
+          subs.get(Occur.FILTER).isEmpty()
+              && subs.get(Occur.MUST).isEmpty()
+              && subs.get(Occur.MUST_NOT).isEmpty();
+      // top-level boolean term query
+      boolean allTermScorers =
+          optionalScorers.stream().allMatch(scorer -> scorer instanceof TermScorer);
+
+      if (isPureDisjunction && allTermScorers && isSimilarCost(optionalScorers) && minShouldMatch <= 1) {
+        return new DisjunctionMaxScorer(weight, 0.01f, optionalScorers, scoreMode);
+      } else if (scoreMode == ScoreMode.TOP_SCORES || minShouldMatch > 1) {
         return new WANDScorer(weight, optionalScorers, minShouldMatch, scoreMode);
       } else {
         return new DisjunctionSumScorer(weight, optionalScorers, scoreMode);
       }
     }
   }
+
+  private boolean isSimilarCost(List<Scorer> optionalScorers) {
+    long minCost = Long.MAX_VALUE;
+    long maxCost = Long.MIN_VALUE;
+    for (Scorer scorer : optionalScorers) {
+      long cost = scorer.iterator().cost();
+      minCost = Math.min(minCost, cost);
+      maxCost = Math.max(maxCost, cost);
+    }
+
+    return maxCost / minCost < 2;
+  }
 }
{code}
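For illustration, here is a small standalone demo (my own, with hypothetical posting-list sizes, not numbers from the benchmark) of how the isSimilarCost cutoff above classifies a disjunction. Note that a TermScorer's iterator cost is its docFreq, which is at least 1 whenever the scorer exists, so on the guarded all-term-scorers path the integer division should be safe from dividing by zero:

{code:java}
import java.util.List;

// Standalone demo of the cost-similarity heuristic from the patch above.
// The costs below are made-up posting list sizes, not benchmark data.
public class CostSimilarityDemo {

  // Same cutoff as isSimilarCost in the patch: clauses count as "similar"
  // when the largest cost is less than twice the smallest (integer division).
  static boolean isSimilarCost(List<Long> costs) {
    long minCost = Long.MAX_VALUE;
    long maxCost = Long.MIN_VALUE;
    for (long cost : costs) {
      minCost = Math.min(minCost, cost);
      maxCost = Math.max(maxCost, cost);
    }
    return maxCost / minCost < 2;
  }

  public static void main(String[] args) {
    // Two common terms of comparable size: would take the BMM
    // (DisjunctionMaxScorer) path.
    System.out.println(isSimilarCost(List.of(950_000L, 1_200_000L))); // true
    // A rare term plus a common term: would keep the BMW (WANDScorer) path.
    System.out.println(isSimilarCost(List.of(3_000L, 1_200_000L))); // false
  }
}
{code}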
From this change, I have the following luceneutil benchmark results with source wikimedium5m, with the *searchBench.py* script also modified to not fail when hit counts differ between baseline and candidate:

{code:java}
Task                         QPS baseline       QPS my_modified_version    Pct diff                 p-value
HighTermDayOfYearSort        264.97 (16.9%)     248.25  (9.4%)             -6.3% ( -27% -  23%)     0.143
OrHighLow                    477.81  (5.7%)     456.76  (6.3%)             -4.4% ( -15% -   8%)     0.021
Fuzzy1                        82.43 (12.9%)      78.99 (11.0%)             -4.2% ( -24% -  22%)     0.271
HighTermTitleBDVSort         237.06 (10.7%)     227.86  (8.9%)             -3.9% ( -21% -  17%)     0.214
OrNotHighMed                 646.77  (4.1%)     631.45  (4.7%)             -2.4% ( -10% -   6%)     0.091
HighTerm                    1717.41  (5.4%)    1681.25  (5.4%)             -2.1% ( -12% -   9%)     0.219
PKLookup                     213.65  (4.0%)     210.55  (3.9%)             -1.4% (  -9% -   6%)     0.247
MedTerm                     1703.74  (4.0%)    1680.75  (7.7%)             -1.3% ( -12% -  10%)     0.487
OrHighNotHigh                679.24  (7.0%)     670.77  (7.0%)             -1.2% ( -14% -  13%)     0.572
HighTermMonthSort            111.16 (13.4%)     109.81 (13.5%)             -1.2% ( -24% -  29%)     0.775
MedSpanNear                  144.50  (2.0%)     142.97  (3.8%)             -1.1% (  -6% -   4%)     0.268
OrHighNotLow                 738.44  (6.6%)     730.70  (5.5%)             -1.0% ( -12% -  11%)     0.586
LowPhrase                    255.32  (1.7%)     252.66  (3.0%)             -1.0% (  -5% -   3%)     0.169
TermDTSort                   255.93 (15.2%)     253.43 (13.3%)             -1.0% ( -25% -  32%)     0.829
AndHighMed                   404.90  (2.4%)     401.82  (3.5%)             -0.8% (  -6% -   5%)     0.423
BrowseMonthTaxoFacets         12.90  (2.9%)      12.80  (3.4%)             -0.7% (  -6% -   5%)     0.469
LowSpanNear                   85.22  (1.5%)      84.71  (2.0%)             -0.6% (  -4% -   2%)     0.285
BrowseDateTaxoFacets          10.72  (2.7%)      10.66  (2.8%)             -0.6% (  -5% -   5%)     0.523
HighSpanNear                  15.64  (1.9%)      15.55  (2.4%)             -0.5% (  -4% -   3%)     0.423
Wildcard                     135.32  (2.8%)     134.61  (4.0%)             -0.5% (  -7% -   6%)     0.631
MedPhrase                     86.95  (1.6%)      86.51  (2.7%)             -0.5% (  -4% -   3%)     0.470
AndHighLow                   772.76  (3.4%)     769.05  (3.9%)             -0.5% (  -7% -   7%)     0.680
LowTerm                     1572.57  (4.3%)    1565.62  (4.7%)             -0.4% (  -9% -   9%)     0.758
BrowseDayOfYearTaxoFacets     10.74  (2.8%)      10.71  (2.9%)             -0.3% (  -5% -   5%)     0.713
OrNotHighLow                1212.21  (3.4%)    1208.45  (4.1%)             -0.3% (  -7% -   7%)     0.797
Fuzzy2                        52.79 (15.7%)      52.64 (11.6%)             -0.3% ( -23% -  32%)     0.949
HighIntervalsOrdered          44.22  (1.3%)      44.12  (1.3%)             -0.2% (  -2% -   2%)     0.571
HighPhrase                   385.28  (2.7%)     384.37  (3.4%)             -0.2% (  -6% -   6%)     0.809
Respell                       94.82  (2.7%)      94.64  (3.3%)             -0.2% (  -5% -   5%)     0.839
OrNotHighHigh                676.82  (5.0%)     675.66  (6.2%)             -0.2% ( -10% -  11%)     0.923
MedSloppyPhrase               27.08  (2.9%)      27.07  (3.3%)             -0.0% (  -6% -   6%)     0.980
AndHighHigh                   44.82  (3.8%)      44.90  (4.7%)              0.2% (  -8% -   9%)     0.888
BrowseDayOfYearSSDVFacets     27.27  (2.8%)      27.33  (1.9%)              0.2% (  -4% -   4%)     0.757
HighSloppyPhrase              87.76  (2.5%)      88.26  (3.4%)              0.6% (  -5% -   6%)     0.542
Prefix3                      123.34  (4.1%)     124.08  (3.7%)              0.6% (  -6% -   8%)     0.628
OrHighNotMed                 746.91  (5.7%)     752.11  (7.0%)              0.7% ( -11% -  14%)     0.730
LowSloppyPhrase              329.41  (3.2%)     332.29  (2.3%)              0.9% (  -4% -   6%)     0.324
IntNRQ                       304.68  (2.6%)     307.51  (2.1%)              0.9% (  -3% -   5%)     0.219
BrowseMonthSSDVFacets         30.50  (5.9%)      30.87  (2.6%)              1.2% (  -6% -  10%)     0.404
OrHighMed                    128.85  (2.5%)     134.50  (8.8%)              4.4% (  -6% -  16%)     0.033
OrHighHigh                    57.01  (2.5%)     237.95 (22.2%)            317.4% ( 285% - 351%)     0.000

WARNING: cat=OrHighHigh: hit counts differ: 15200+ vs 28513+
WARNING: cat=OrHighMed: hit counts differ: 3700+ vs 6567+
errors occurred: ([], ['query=body:de body:use filter=None sort=None groupField=None hitCount=15200+: wrong hitCount: 15200+ vs 28513+', 'query=body:27 body:throughout filter=None sort=None groupField=None hitCount=3700+: wrong hitCount: 3700+ vs 6567+'], 1.0)
{code}

I have a couple of observations / questions from the results:
# I'm not sure this benchmark result is valid, given that the hit counts differ. Does the difference in hit counts indicate a bug? I would imagine that the difference between BMM and BMW should only impact QPS and result ordering, not which documents match.
# This seems to suggest that when the costs of the scorers are close, BMM (DisjunctionMaxScorer) might be faster than BMW (WANDScorer).
# I used DisjunctionMaxScorer instead of DisjunctionSumScorer in the changes above, since the latter doesn't seem to have block-max logic (e.g. DisjunctionSumScorer#advanceShallow uses its base class's implementation, Scorer#advanceShallow). Should DisjunctionSumScorer also implement block-max logic similar to DisjunctionMaxScorer? A rough sketch of what that could look like follows this list.
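To make that last question concrete, here is a rough sketch (my own, not actual Lucene code) of what block-max support for a sum-based disjunction could look like, mirroring DisjunctionMaxScorer's approach. It assumes the clauses are available as a List<Scorer>, whereas the real class keeps them in a DisiPriorityQueue:

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;

// Sketch only: block-max plumbing for a sum-based disjunction, mirroring
// what DisjunctionMaxScorer does. Not actual Lucene code.
final class BlockMaxSumSketch {
  private final List<Scorer> subScorers;

  BlockMaxSumSketch(List<Scorer> subScorers) {
    this.subScorers = subScorers;
  }

  // The disjunction's next block boundary is the nearest boundary across
  // all sub-scorers that are not already positioned past the target.
  int advanceShallow(int target) throws IOException {
    int min = DocIdSetIterator.NO_MORE_DOCS;
    for (Scorer scorer : subScorers) {
      if (scorer.docID() <= target) {
        min = Math.min(min, scorer.advanceShallow(target));
      }
    }
    return min;
  }

  // Upper bound for a summed score up to `upTo`: in the worst case a doc
  // matches every clause at that clause's block-max score, so the bounds
  // add up. A real implementation would also need to round the float sum
  // upwards so the bound stays safe, as WANDScorer does for its sums.
  float getMaxScore(int upTo) throws IOException {
    double maxScoreSum = 0;
    for (Scorer scorer : subScorers) {
      if (scorer.docID() <= upTo) {
        maxScoreSum += scorer.getMaxScore(upTo);
      }
    }
    return (float) maxScoreSum;
  }
}
{code}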
> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and PISA at [https://tantivy-search.github.io/bench/] or against research prototypes in Table 1 of [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for benchmarking, it would be nice to optimize this case a bit more. I suspect that we could make fewer per-document decisions by implementing a BulkScorer instead of a Scorer.
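For context on the quoted suggestion: a Scorer is pulled one document at a time by the collection loop, while a BulkScorer is handed a whole [min, max) window of doc IDs, which is where per-document decisions can be amortized. Below is a minimal sketch of the BulkScorer shape, loosely modeled on Weight.DefaultBulkScorer; the class name is hypothetical, and this is the naive exhaustive loop, not the pruning bulk scorer this issue asks for:

{code:java}
import java.io.IOException;
import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.Bits;

// Hypothetical sketch of the BulkScorer API shape. A dedicated disjunction
// BulkScorer would make pruning decisions once per block of documents
// inside this window loop instead of once per document.
final class SimpleBulkScorer extends BulkScorer {
  private final Scorer scorer;

  SimpleBulkScorer(Scorer scorer) {
    this.scorer = scorer;
  }

  @Override
  public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
      throws IOException {
    collector.setScorer(scorer);
    DocIdSetIterator it = scorer.iterator();
    int doc = it.docID();
    if (doc < min) {
      doc = it.advance(min);
    }
    // Score the whole [min, max) window in one call.
    for (; doc < max; doc = it.nextDoc()) {
      if (acceptDocs == null || acceptDocs.get(doc)) {
        collector.collect(doc);
      }
    }
    return doc; // first doc beyond the window, or NO_MORE_DOCS
  }

  @Override
  public long cost() {
    return scorer.iterator().cost();
  }
}
{code}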