[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318412#comment-17318412 ]

Zach Chen commented on LUCENE-9335:
-----------------------------------

Hi [~jpountz], thanks for the additional paper! I gave it a read and found it 
to be a great summary and refresher for the BMM & BMW algorithms, in addition 
to some other interesting ones! I also saw that the paper mentioned BMM was 
found to be faster than BMW when there are 5+ clauses in the disjunction, 
which echoes your observation above as well.

I explored this a bit in the code, and found one heuristics-based approach 
(using iterator-cost similarity as a proxy / guesstimate for block-max score 
similarity across scorers) that seems to give interesting results:

Code changes:

 
{code:java}
diff --git a/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java b/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
index bdf085d4669..f71e2c2fdd8 100644
--- a/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
+++ b/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
@@ -238,11 +238,33 @@ final class Boolean2ScorerSupplier extends ScorerSupplier {
       //
       // However, as WANDScorer uses more complex algorithm and data structure, we would like to
       // still use DisjunctionSumScorer to handle exhaustive pure disjunctions, which may be faster
-      if (scoreMode == ScoreMode.TOP_SCORES || minShouldMatch > 1) {
+      boolean isPureDisjunction =
+              subs.get(Occur.FILTER).isEmpty()
+                      && subs.get(Occur.MUST).isEmpty()
+                      && subs.get(Occur.MUST_NOT).isEmpty();
+      // top-level boolean term query
+      boolean allTermScorers =
+              optionalScorers.stream().allMatch(scorer -> scorer instanceof TermScorer);
+
+      if (isPureDisjunction && allTermScorers && isSimilarCost(optionalScorers) && minShouldMatch <= 1) {
+        return new DisjunctionMaxScorer(weight, 0.01f, optionalScorers, scoreMode);
+      } else if (scoreMode == ScoreMode.TOP_SCORES || minShouldMatch > 1) {
         return new WANDScorer(weight, optionalScorers, minShouldMatch, scoreMode);
       } else {
         return new DisjunctionSumScorer(weight, optionalScorers, scoreMode);
       }
     }
   }
+
+  private boolean isSimilarCost(List<Scorer> optionalScorers) {
+    long minCost = Long.MAX_VALUE;
+    long maxCost = Long.MIN_VALUE;
+    for (Scorer scorer : optionalScorers) {
+      long cost = scorer.iterator().cost();
+      minCost = Math.min(minCost, cost);
+      maxCost = Math.max(maxCost, cost);
+    }
+
+    return maxCost / minCost < 2;
+  }
 }

{code}
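As a side note on the isSimilarCost check above, here is a tiny standalone sketch of the same heuristic for illustration (the class name and inputs are made up; plain longs stand in for Scorer.iterator().cost() values). One subtlety worth noting: long division truncates, so the check accepts any ratio strictly below 2x, and a clause with cost 0 would need guarding in real code.

```java
import java.util.List;

// Standalone illustration of the isSimilarCost heuristic from the patch above.
// Plain longs stand in for Scorer.iterator().cost(); the class name is made up.
public class CostSimilarity {

  // True when the largest and smallest costs are within ~2x of each other.
  // Note: long division truncates, so e.g. 199 / 100 == 1 and the check passes,
  // while 200 / 100 == 2 fails; a cost of 0 would divide by zero in real code.
  static boolean isSimilarCost(List<Long> costs) {
    long minCost = Long.MAX_VALUE;
    long maxCost = Long.MIN_VALUE;
    for (long cost : costs) {
      minCost = Math.min(minCost, cost);
      maxCost = Math.max(maxCost, cost);
    }
    return maxCost / minCost < 2;
  }

  public static void main(String[] args) {
    System.out.println(isSimilarCost(List.of(100L, 150L, 199L))); // within 2x
    System.out.println(isSimilarCost(List.of(100L, 950L)));       // ~9x apart
  }
}
```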
 

From this change, I got the following luceneutil benchmark results with source 
wikimedium5m, with the script *searchBench.py* also modified to not fail when 
hit counts differ between baseline and candidate:
{code:java}
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
   HighTermDayOfYearSort      264.97     (16.9%)      248.25      (9.4%)   -6.3% ( -27% -   23%) 0.143
               OrHighLow      477.81      (5.7%)      456.76      (6.3%)   -4.4% ( -15% -    8%) 0.021
                  Fuzzy1       82.43     (12.9%)       78.99     (11.0%)   -4.2% ( -24% -   22%) 0.271
    HighTermTitleBDVSort      237.06     (10.7%)      227.86      (8.9%)   -3.9% ( -21% -   17%) 0.214
            OrNotHighMed      646.77      (4.1%)      631.45      (4.7%)   -2.4% ( -10% -    6%) 0.091
                HighTerm     1717.41      (5.4%)     1681.25      (5.4%)   -2.1% ( -12% -    9%) 0.219
                PKLookup      213.65      (4.0%)      210.55      (3.9%)   -1.4% (  -9% -    6%) 0.247
                 MedTerm     1703.74      (4.0%)     1680.75      (7.7%)   -1.3% ( -12% -   10%) 0.487
           OrHighNotHigh      679.24      (7.0%)      670.77      (7.0%)   -1.2% ( -14% -   13%) 0.572
       HighTermMonthSort      111.16     (13.4%)      109.81     (13.5%)   -1.2% ( -24% -   29%) 0.775
             MedSpanNear      144.50      (2.0%)      142.97      (3.8%)   -1.1% (  -6% -    4%) 0.268
            OrHighNotLow      738.44      (6.6%)      730.70      (5.5%)   -1.0% ( -12% -   11%) 0.586
               LowPhrase      255.32      (1.7%)      252.66      (3.0%)   -1.0% (  -5% -    3%) 0.169
              TermDTSort      255.93     (15.2%)      253.43     (13.3%)   -1.0% ( -25% -   32%) 0.829
              AndHighMed      404.90      (2.4%)      401.82      (3.5%)   -0.8% (  -6% -    5%) 0.423
   BrowseMonthTaxoFacets       12.90      (2.9%)       12.80      (3.4%)   -0.7% (  -6% -    5%) 0.469
             LowSpanNear       85.22      (1.5%)       84.71      (2.0%)   -0.6% (  -4% -    2%) 0.285
    BrowseDateTaxoFacets       10.72      (2.7%)       10.66      (2.8%)   -0.6% (  -5% -    5%) 0.523
            HighSpanNear       15.64      (1.9%)       15.55      (2.4%)   -0.5% (  -4% -    3%) 0.423
                Wildcard      135.32      (2.8%)      134.61      (4.0%)   -0.5% (  -7% -    6%) 0.631
               MedPhrase       86.95      (1.6%)       86.51      (2.7%)   -0.5% (  -4% -    3%) 0.470
              AndHighLow      772.76      (3.4%)      769.05      (3.9%)   -0.5% (  -7% -    7%) 0.680
                 LowTerm     1572.57      (4.3%)     1565.62      (4.7%)   -0.4% (  -9% -    9%) 0.758
BrowseDayOfYearTaxoFacets       10.74      (2.8%)       10.71      (2.9%)   -0.3% (  -5% -    5%) 0.713
            OrNotHighLow     1212.21      (3.4%)     1208.45      (4.1%)   -0.3% (  -7% -    7%) 0.797
                  Fuzzy2       52.79     (15.7%)       52.64     (11.6%)   -0.3% ( -23% -   32%) 0.949
    HighIntervalsOrdered       44.22      (1.3%)       44.12      (1.3%)   -0.2% (  -2% -    2%) 0.571
              HighPhrase      385.28      (2.7%)      384.37      (3.4%)   -0.2% (  -6% -    6%) 0.809
                 Respell       94.82      (2.7%)       94.64      (3.3%)   -0.2% (  -5% -    5%) 0.839
           OrNotHighHigh      676.82      (5.0%)      675.66      (6.2%)   -0.2% ( -10% -   11%) 0.923
         MedSloppyPhrase       27.08      (2.9%)       27.07      (3.3%)   -0.0% (  -6% -    6%) 0.980
             AndHighHigh       44.82      (3.8%)       44.90      (4.7%)    0.2% (  -8% -    9%) 0.888
BrowseDayOfYearSSDVFacets       27.27      (2.8%)       27.33      (1.9%)    0.2% (  -4% -    4%) 0.757
        HighSloppyPhrase       87.76      (2.5%)       88.26      (3.4%)    0.6% (  -5% -    6%) 0.542
                 Prefix3      123.34      (4.1%)      124.08      (3.7%)    0.6% (  -6% -    8%) 0.628
            OrHighNotMed      746.91      (5.7%)      752.11      (7.0%)    0.7% ( -11% -   14%) 0.730
         LowSloppyPhrase      329.41      (3.2%)      332.29      (2.3%)    0.9% (  -4% -    6%) 0.324
                  IntNRQ      304.68      (2.6%)      307.51      (2.1%)    0.9% (  -3% -    5%) 0.219
   BrowseMonthSSDVFacets       30.50      (5.9%)       30.87      (2.6%)    1.2% (  -6% -   10%) 0.404
               OrHighMed      128.85      (2.5%)      134.50      (8.8%)    4.4% (  -6% -   16%) 0.033
              OrHighHigh       57.01      (2.5%)      237.95     (22.2%)  317.4% ( 285% -  351%) 0.000
WARNING: cat=OrHighHigh: hit counts differ: 15200+ vs 28513+
WARNING: cat=OrHighMed: hit counts differ: 3700+ vs 6567+
errors occurred: ([], ['query=body:de body:use filter=None sort=None groupField=None hitCount=15200+: wrong hitCount: 15200+ vs 28513+', 'query=body:27 body:throughout filter=None sort=None groupField=None hitCount=3700+: wrong hitCount: 3700+ vs 6567+'], 1.0)


{code}
I have a couple of observations / questions from the results:
 # Not sure if this benchmark result is valid, given that the hit counts 
differ? Does the difference in hit count indicate a bug, as I would imagine 
the difference between BMM and BMW should only impact QPS and ordering?
 # This seems to suggest that when the costs of the scorers are close, BMM 
(DisjunctionMaxScorer) might be faster than BMW (WANDScorer).
 # I used DisjunctionMaxScorer instead of DisjunctionSumScorer in the changes 
above, since the latter doesn't seem to have block-max logic (e.g. 
DisjunctionSumScorer#advanceShallow uses its base class's implementation, 
Scorer#advanceShallow). Should DisjunctionSumScorer implement block-max logic 
similar to DisjunctionMaxScorer as well?
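Regarding point 3, here is a rough sketch of what I have in mind (not actual Lucene code; SubScorer is a hypothetical stand-in for the relevant slice of the Scorer API, and the real methods also throw IOException): a sum disjunction could propagate block boundaries and block-max scores by delegating to its sub-scorers, similar to what DisjunctionMaxScorer does.

```java
import java.util.List;

// Hypothetical stand-in for the slice of Lucene's Scorer API involved in
// block-max pruning (the real methods also throw IOException).
interface SubScorer {
  int advanceShallow(int target); // next block boundary at or after target
  float getMaxScore(int upTo);    // max possible score up to that doc
}

// Sketch of the block-max methods a DisjunctionSumScorer could expose,
// mirroring how DisjunctionMaxScorer delegates to its sub-scorers.
public class BlockMaxSumSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // The disjunction's next block boundary is the nearest boundary of any
  // clause: past that doc, some clause's block-max score may change.
  static int advanceShallow(List<SubScorer> subs, int target) {
    int min = NO_MORE_DOCS;
    for (SubScorer s : subs) {
      min = Math.min(min, s.advanceShallow(target));
    }
    return min;
  }

  // For a sum disjunction, summing the per-clause block maxes gives a valid
  // (if loose) upper bound on any document's score within the block.
  static float getMaxScore(List<SubScorer> subs, int upTo) {
    float sum = 0;
    for (SubScorer s : subs) {
      sum += s.getMaxScore(upTo);
    }
    return sum;
  }
}
```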

 

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and 
> PISA at [https://tantivy-search.github.io/bench/] or against research 
> prototypes in Table 1 of 
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
>  Given that top-level disjunctions of term queries are commonly used for 
> benchmarking, it would be nice to optimize this case a bit more, I suspect 
> that we could make fewer per-document decisions by implementing a BulkScorer 
> instead of a Scorer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
