[
https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318412#comment-17318412
]
Zach Chen commented on LUCENE-9335:
-----------------------------------
Hi [~jpountz], thanks for the additional paper! I gave it a read and found it
to be a great summary of and refresher on the BMM & BMW algorithms, along with
some other interesting ones! I also saw that the paper mentions BMM was found
to be faster than BMW when there are 5+ clauses in the disjunction, which
echoes your observation above.
I explored this a bit in the code and found one heuristics-based approach
(using iterator cost similarity as a proxy / guesstimate for block max score
similarity across scorers) that seems to give interesting results:
Code changes:
{code:java}
diff --git a/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java b/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
index bdf085d4669..f71e2c2fdd8 100644
--- a/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
+++ b/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
@@ -238,11 +238,33 @@ final class Boolean2ScorerSupplier extends ScorerSupplier {
       //
       // However, as WANDScorer uses more complex algorithm and data structure, we would like to
       // still use DisjunctionSumScorer to handle exhaustive pure disjunctions, which may be faster
-      if (scoreMode == ScoreMode.TOP_SCORES || minShouldMatch > 1) {
+      boolean isPureDisjunction =
+          subs.get(Occur.FILTER).isEmpty()
+              && subs.get(Occur.MUST).isEmpty()
+              && subs.get(Occur.MUST_NOT).isEmpty();
+      // top-level boolean term query
+      boolean allTermScorers =
+          optionalScorers.stream().allMatch(scorer -> scorer instanceof TermScorer);
+
+      if (isPureDisjunction && allTermScorers && isSimilarCost(optionalScorers) && minShouldMatch <= 1) {
+        return new DisjunctionMaxScorer(weight, 0.01f, optionalScorers, scoreMode);
+      } else if (scoreMode == ScoreMode.TOP_SCORES || minShouldMatch > 1) {
         return new WANDScorer(weight, optionalScorers, minShouldMatch, scoreMode);
       } else {
         return new DisjunctionSumScorer(weight, optionalScorers, scoreMode);
       }
     }
   }
+
+  private boolean isSimilarCost(List<Scorer> optionalScorers) {
+    long minCost = Long.MAX_VALUE;
+    long maxCost = Long.MIN_VALUE;
+    for (Scorer scorer : optionalScorers) {
+      long cost = scorer.iterator().cost();
+      minCost = Math.min(minCost, cost);
+      maxCost = Math.max(maxCost, cost);
+    }
+
+    return maxCost / minCost < 2;
+  }
 }
{code}
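For clarity, here is the cost-ratio check pulled out into a self-contained snippet with made-up costs (not part of the patch; the class name and values are hypothetical). It also illustrates that maxCost / minCost is integer division, so two scorers whose costs differ by just under 2x still count as "similar":
{code:java}
import java.util.List;

// Hypothetical standalone demo of the isSimilarCost heuristic above,
// operating on raw iterator costs instead of Scorer instances.
public class CostSimilarityDemo {

  static boolean isSimilarCost(List<Long> costs) {
    long minCost = Long.MAX_VALUE;
    long maxCost = Long.MIN_VALUE;
    for (long cost : costs) {
      minCost = Math.min(minCost, cost);
      maxCost = Math.max(maxCost, cost);
    }
    // Integer division: ratios in [1, 2) pass the check, ratios >= 2 fail.
    return maxCost / minCost < 2;
  }

  public static void main(String[] args) {
    System.out.println(isSimilarCost(List.of(100L, 150L, 199L))); // true  (199 / 100 == 1)
    System.out.println(isSimilarCost(List.of(100L, 250L)));       // false (250 / 100 == 2)
  }
}
{code}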
From this change, I got the following luceneutil benchmark results against the
wikimedium5m source, with the script *searchBench.py* also modified to not fail
when hit counts differ between baseline and candidate:
{code:java}
                     TaskQPS baseline      StdDevQPS my_modified_version      StdDev  Pct diff               p-value
    HighTermDayOfYearSort      264.97 (16.9%)      248.25  (9.4%)    -6.3% ( -27% - 23%) 0.143
                OrHighLow      477.81  (5.7%)      456.76  (6.3%)    -4.4% ( -15% - 8%) 0.021
                   Fuzzy1       82.43 (12.9%)       78.99 (11.0%)    -4.2% ( -24% - 22%) 0.271
     HighTermTitleBDVSort      237.06 (10.7%)      227.86  (8.9%)    -3.9% ( -21% - 17%) 0.214
             OrNotHighMed      646.77  (4.1%)      631.45  (4.7%)    -2.4% ( -10% - 6%) 0.091
                 HighTerm     1717.41  (5.4%)     1681.25  (5.4%)    -2.1% ( -12% - 9%) 0.219
                 PKLookup      213.65  (4.0%)      210.55  (3.9%)    -1.4% ( -9% - 6%) 0.247
                  MedTerm     1703.74  (4.0%)     1680.75  (7.7%)    -1.3% ( -12% - 10%) 0.487
            OrHighNotHigh      679.24  (7.0%)      670.77  (7.0%)    -1.2% ( -14% - 13%) 0.572
        HighTermMonthSort      111.16 (13.4%)      109.81 (13.5%)    -1.2% ( -24% - 29%) 0.775
              MedSpanNear      144.50  (2.0%)      142.97  (3.8%)    -1.1% ( -6% - 4%) 0.268
             OrHighNotLow      738.44  (6.6%)      730.70  (5.5%)    -1.0% ( -12% - 11%) 0.586
                LowPhrase      255.32  (1.7%)      252.66  (3.0%)    -1.0% ( -5% - 3%) 0.169
               TermDTSort      255.93 (15.2%)      253.43 (13.3%)    -1.0% ( -25% - 32%) 0.829
               AndHighMed      404.90  (2.4%)      401.82  (3.5%)    -0.8% ( -6% - 5%) 0.423
    BrowseMonthTaxoFacets       12.90  (2.9%)       12.80  (3.4%)    -0.7% ( -6% - 5%) 0.469
              LowSpanNear       85.22  (1.5%)       84.71  (2.0%)    -0.6% ( -4% - 2%) 0.285
     BrowseDateTaxoFacets       10.72  (2.7%)       10.66  (2.8%)    -0.6% ( -5% - 5%) 0.523
             HighSpanNear       15.64  (1.9%)       15.55  (2.4%)    -0.5% ( -4% - 3%) 0.423
                 Wildcard      135.32  (2.8%)      134.61  (4.0%)    -0.5% ( -7% - 6%) 0.631
                MedPhrase       86.95  (1.6%)       86.51  (2.7%)    -0.5% ( -4% - 3%) 0.470
               AndHighLow      772.76  (3.4%)      769.05  (3.9%)    -0.5% ( -7% - 7%) 0.680
                  LowTerm     1572.57  (4.3%)     1565.62  (4.7%)    -0.4% ( -9% - 9%) 0.758
BrowseDayOfYearTaxoFacets       10.74  (2.8%)       10.71  (2.9%)    -0.3% ( -5% - 5%) 0.713
             OrNotHighLow     1212.21  (3.4%)     1208.45  (4.1%)    -0.3% ( -7% - 7%) 0.797
                   Fuzzy2       52.79 (15.7%)       52.64 (11.6%)    -0.3% ( -23% - 32%) 0.949
     HighIntervalsOrdered       44.22  (1.3%)       44.12  (1.3%)    -0.2% ( -2% - 2%) 0.571
               HighPhrase      385.28  (2.7%)      384.37  (3.4%)    -0.2% ( -6% - 6%) 0.809
                  Respell       94.82  (2.7%)       94.64  (3.3%)    -0.2% ( -5% - 5%) 0.839
            OrNotHighHigh      676.82  (5.0%)      675.66  (6.2%)    -0.2% ( -10% - 11%) 0.923
          MedSloppyPhrase       27.08  (2.9%)       27.07  (3.3%)    -0.0% ( -6% - 6%) 0.980
              AndHighHigh       44.82  (3.8%)       44.90  (4.7%)     0.2% ( -8% - 9%) 0.888
BrowseDayOfYearSSDVFacets       27.27  (2.8%)       27.33  (1.9%)     0.2% ( -4% - 4%) 0.757
         HighSloppyPhrase       87.76  (2.5%)       88.26  (3.4%)     0.6% ( -5% - 6%) 0.542
                  Prefix3      123.34  (4.1%)      124.08  (3.7%)     0.6% ( -6% - 8%) 0.628
             OrHighNotMed      746.91  (5.7%)      752.11  (7.0%)     0.7% ( -11% - 14%) 0.730
          LowSloppyPhrase      329.41  (3.2%)      332.29  (2.3%)     0.9% ( -4% - 6%) 0.324
                   IntNRQ      304.68  (2.6%)      307.51  (2.1%)     0.9% ( -3% - 5%) 0.219
    BrowseMonthSSDVFacets       30.50  (5.9%)       30.87  (2.6%)     1.2% ( -6% - 10%) 0.404
                OrHighMed      128.85  (2.5%)      134.50  (8.8%)     4.4% ( -6% - 16%) 0.033
               OrHighHigh       57.01  (2.5%)      237.95 (22.2%)   317.4% ( 285% - 351%) 0.000
WARNING: cat=OrHighHigh: hit counts differ: 15200+ vs 28513+
WARNING: cat=OrHighMed: hit counts differ: 3700+ vs 6567+
errors occurred: ([], ['query=body:de body:use filter=None sort=None groupField=None hitCount=15200+: wrong hitCount: 15200+ vs 28513+', 'query=body:27 body:throughout filter=None sort=None groupField=None hitCount=3700+: wrong hitCount: 3700+ vs 6567+'], 1.0)
{code}
I have a couple of observations / questions from the results:
# I'm not sure this benchmark result is valid, given that the hit counts
differ. Does the difference in hit counts indicate a bug? I would imagine the
difference between BMM and BMW should only impact QPS and result ordering, not
which documents match (one way to verify the counts is sketched below, after
this list).
# This seems to suggest that when the costs of the scorers are close, BMM
(DisjunctionMaxScorer) might be faster than BMW (WANDScorer).
# I used DisjunctionMaxScorer instead of DisjunctionSumScorer in the changes
above, since the latter doesn't seem to have block max logic (e.g.
DisjunctionSumScorer#advanceShallow falls back to its base implementation,
Scorer#advanceShallow). Should DisjunctionSumScorer implement block max logic
similar to DisjunctionMaxScorer as well? A rough sketch of what that might
look like follows the first sketch below.
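To check observation 1 independently of any top-k pruning, one could count hits exhaustively outside of luceneutil. This is just a sketch using Lucene's TotalHitCountCollector; the ExactHitCount helper name is made up:
{code:java}
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TotalHitCountCollector;

// Hypothetical helper: exhaustively count every match of a query, so the
// result does not depend on which disjunction scorer is used or on any
// block-max skipping it performs.
final class ExactHitCount {
  static int count(IndexSearcher searcher, Query query) throws IOException {
    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(query, collector); // visits all matches, no early termination
    return collector.getTotalHits();
  }
}
{code}
If both scorer paths agree on the exhaustive count, the differing numbers above are presumably just the lower bounds ("15200+") that top-k collection reports once it stops counting exactly, rather than a correctness bug.

For observation 3, here is a minimal sketch of what block max logic for a sum-based disjunction might look like, assuming the block max score can simply be bounded by the sum of the clauses' per-window maxima (the SumBlockMax class is hypothetical, not existing Lucene code):
{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;

// Hypothetical block-max helpers for a disjunction whose score is the SUM of
// its sub-scorers' scores (cf. DisjunctionMaxScorer, which takes the max plus
// a tie-breaker multiple instead).
final class SumBlockMax {

  // The window that is valid for all clauses ends at the earliest boundary.
  static int advanceShallow(List<Scorer> subScorers, int target) throws IOException {
    int upTo = DocIdSetIterator.NO_MORE_DOCS;
    for (Scorer scorer : subScorers) {
      upTo = Math.min(upTo, scorer.advanceShallow(target));
    }
    return upTo;
  }

  // No document up to upTo can score higher than the sum of per-clause maxima.
  static float getMaxScore(List<Scorer> subScorers, int upTo) throws IOException {
    float sum = 0;
    for (Scorer scorer : subScorers) {
      sum += scorer.getMaxScore(upTo);
    }
    return sum;
  }
}
{code}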
> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
> Key: LUCENE-9335
> URL: https://issues.apache.org/jira/browse/LUCENE-9335
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and
> PISA at [https://tantivy-search.github.io/bench/] or against research
> prototypes in Table 1 of
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for
> benchmarking, it would be nice to optimize this case a bit more; I suspect
> that we could make fewer per-document decisions by implementing a BulkScorer
> instead of a Scorer.