[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319923#comment-17319923 ]
Zach Chen commented on LUCENE-9335:
-----------------------------------
I made some further changes to move some block-max related logic from
DisjunctionMaxScorer to DisjunctionScorer, so that DisjunctionSumScorer can
inherit it. I've published a WIP PR [https://github.com/apache/lucene/pull/81]
with those changes for ease of review.
When I run luceneutil, I see further errors from the verifyScores section of
the code, which may indicate bugs in my changes:
{code:java}
WARNING: cat=OrHighHigh: hit counts differ: 9870+ vs 2616+
Traceback (most recent call last):
  File "src/python/localrun.py", line 53, in <module>
    comp.benchmark("baseline_vs_patch")
  File "/Users/xichen/IdeaProjects/benchmarks/util/src/python/competition.py", line 455, in benchmark
    randomSeed = self.randomSeed)
  File "/Users/xichen/IdeaProjects/benchmarks/util/src/python/searchBench.py", line 196, in run
    raise RuntimeError('errors occurred: %s' % str(cmpDiffs))
RuntimeError: errors occurred: ([], ["query=body:second body:short filter=None sort=None groupField=None hitCount=9870+: hit 0 has wrong field/score value ([1444649], '5.0718417') vs ([5125], '4.224689')"], 1.0){code}
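For readers unfamiliar with this kind of check, here is a minimal sketch of how a rank-by-rank comparison of two result lists could surface a "hit 0 has wrong field/score value" style error. It is not luceneutil's actual compareHits implementation: the function name first_hit_mismatch, the (doc_id, score) tuple shape, and the tolerance parameter are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical hit shape: (doc_id, score). luceneutil's real structures differ.
Hit = Tuple[int, float]


def first_hit_mismatch(base_hits: List[Hit],
                       cmp_hits: List[Hit],
                       tolerance: float = 1e-5) -> Optional[str]:
    """Describe the first rank where two ranked hit lists disagree, else None."""
    for rank, (base, cmp_) in enumerate(zip(base_hits, cmp_hits)):
        base_doc, base_score = base
        cmp_doc, cmp_score = cmp_
        # A hit differs if it is a different document or its score drifted.
        if base_doc != cmp_doc or abs(base_score - cmp_score) > tolerance:
            return "hit %d differs: (%d, %s) vs (%d, %s)" % (
                rank, base_doc, base_score, cmp_doc, cmp_score)
    # All shared ranks agree; the lists may still have different lengths.
    if len(base_hits) != len(cmp_hits):
        return "hit list lengths differ: %d vs %d" % (
            len(base_hits), len(cmp_hits))
    return None


# Doc ids and scores taken from the error message above.
print(first_hit_mismatch([(1444649, 5.0718417)], [(5125, 4.224689)]))
```

A mismatch at rank 0, as in the error above, means the two sides disagree on the single best-scoring document, which is hard to explain by tie-breaking alone.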
I then changed benchUtil.py to skip the score and hit-count verification
(verifyScores and verifyCounts), so that I could see what benchmark results it
would generate:
{code:java}
diff --git a/src/python/benchUtil.py b/src/python/benchUtil.py
index fb50033..c2faffc 100644
--- a/src/python/benchUtil.py
+++ b/src/python/benchUtil.py
@@ -1203,7 +1203,7 @@ class RunAlgs:
cmpRawResults, heapCmp = parseResults(cmpLogFiles)
# make sure they got identical results
- cmpDiffs = compareHits(baseRawResults, cmpRawResults, self.verifyScores, self.verifyCounts)
+ cmpDiffs = compareHits(baseRawResults, cmpRawResults, False, False)
baseResults = collateResults(baseRawResults)
cmpResults = collateResults(cmpRawResults){code}
I then got the following benchmark results from multiple runs:
{code:java}
Task                      QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff                  p-value
OrHighMed                       186.44    (2.8%)                   160.50    (4.5%)    -13.9% ( -20% -   -6%)  0.000
OrHighLow                       735.70    (7.5%)                   696.89    (4.3%)     -5.3% ( -15% -    6%)  0.006
Fuzzy1                           75.85   (11.5%)                    72.81   (14.0%)     -4.0% ( -26% -   24%)  0.323
TermDTSort                      237.49   (10.4%)                   228.02   (10.6%)     -4.0% ( -22% -   18%)  0.230
HighTermMonthSort               280.82    (9.8%)                   274.90   (10.8%)     -2.1% ( -20% -   20%)  0.518
Fuzzy2                           54.08   (12.5%)                    53.04   (14.2%)     -1.9% ( -25% -   28%)  0.648
OrNotHighMed                    672.83    (2.7%)                   661.16    (4.7%)     -1.7% (  -8% -    5%)  0.153
HighTermTitleBDVSort            438.56   (14.4%)                   431.81   (16.6%)     -1.5% ( -28% -   34%)  0.754
AndHighLow                      969.43    (5.2%)                   957.49    (4.7%)     -1.2% ( -10% -    9%)  0.432
OrNotHighHigh                   704.98    (3.4%)                   700.72    (3.9%)     -0.6% (  -7% -    7%)  0.605
AndHighHigh                     109.77    (4.2%)                   109.31    (4.7%)     -0.4% (  -9% -    8%)  0.767
BrowseMonthSSDVFacets            32.52    (2.1%)                    32.40    (4.6%)     -0.4% (  -6% -    6%)  0.755
PKLookup                        219.90    (3.1%)                   219.16    (3.2%)     -0.3% (  -6% -    6%)  0.734
Wildcard                        284.84    (1.9%)                   284.18    (1.8%)     -0.2% (  -3% -    3%)  0.690
Prefix3                         361.00    (2.1%)                   360.24    (2.0%)     -0.2% (  -4% -    4%)  0.750
HighIntervalsOrdered             28.68    (2.2%)                    28.64    (1.7%)     -0.1% (  -3% -    3%)  0.819
BrowseMonthTaxoFacets            13.60    (2.9%)                    13.59    (2.7%)     -0.1% (  -5% -    5%)  0.947
BrowseDayOfYearSSDVFacets        28.67    (4.8%)                    28.66    (4.8%)     -0.0% (  -9% -   10%)  0.979
HighSpanNear                     79.29    (2.4%)                    79.29    (2.2%)      0.0% (  -4% -    4%)  0.997
OrHighNotHigh                   695.37    (5.5%)                   696.65    (3.8%)      0.2% (  -8% -   10%)  0.903
MedTerm                        1478.47    (3.6%)                  1481.54    (3.0%)      0.2% (  -6% -    7%)  0.843
HighTermDayOfYearSort           372.12   (14.1%)                   373.08   (14.8%)      0.3% ( -25% -   33%)  0.955
IntNRQ                          125.36    (1.3%)                   125.72    (0.7%)      0.3% (  -1% -    2%)  0.391
LowSpanNear                      52.82    (1.7%)                    52.98    (2.0%)      0.3% (  -3% -    4%)  0.611
BrowseDayOfYearTaxoFacets        11.28    (3.1%)                    11.31    (3.1%)      0.3% (  -5% -    6%)  0.756
LowSloppyPhrase                 154.42    (2.9%)                   154.91    (2.9%)      0.3% (  -5% -    6%)  0.731
MedPhrase                       143.27    (2.9%)                   143.88    (2.5%)      0.4% (  -4% -    6%)  0.625
OrHighNotMed                    760.65    (6.8%)                   763.93    (5.4%)      0.4% ( -10% -   13%)  0.824
Respell                          86.71    (1.5%)                    87.11    (2.1%)      0.5% (  -3% -    4%)  0.425
MedSpanNear                     210.43    (2.2%)                   211.43    (1.4%)      0.5% (  -3% -    4%)  0.414
MedSloppyPhrase                 220.29    (2.6%)                   221.35    (2.2%)      0.5% (  -4% -    5%)  0.528
BrowseDateTaxoFacets             11.24    (3.0%)                    11.30    (3.1%)      0.6% (  -5% -    6%)  0.529
LowPhrase                       174.98    (2.4%)                   176.05    (2.0%)      0.6% (  -3% -    5%)  0.385
HighSloppyPhrase                100.25    (3.2%)                   100.88    (3.0%)      0.6% (  -5% -    7%)  0.524
OrHighNotLow                   1016.27    (7.7%)                  1025.44    (6.9%)      0.9% ( -12% -   16%)  0.696
LowTerm                        1634.20    (3.8%)                  1649.20    (2.6%)      0.9% (  -5% -    7%)  0.376
HighPhrase                      415.55    (3.0%)                   419.76    (2.9%)      1.0% (  -4% -    7%)  0.282
OrNotHighLow                    940.18    (5.4%)                   952.61    (2.7%)      1.3% (  -6% -    9%)  0.328
HighTerm                       1163.01    (3.8%)                  1178.47    (4.8%)      1.3% (  -6% -   10%)  0.329
AndHighMed                      365.15    (4.4%)                   370.53    (3.1%)      1.5% (  -5% -    9%)  0.225
OrHighHigh                       80.20    (2.2%)                   718.16  (158.9%)    795.4% ( 620% -  978%)  0.000
WARNING: cat=OrHighHigh: hit counts differ: 19022+ vs 1002+
WARNING: cat=OrHighMed: hit counts differ: 4321+ vs 4289+
{code}
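As a reading aid for the "Pct diff" column, the following minimal sketch recomputes the relative QPS change from the rounded QPS values printed in the table. The helper name pct_diff is an assumption for illustration, not a luceneutil function, and because it works from rounded inputs its results may differ from the reported figures in the last digit.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of the candidate over the baseline, in percent."""
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


# Rounded QPS values copied from the first run above.
rows = {
    "OrHighMed": (186.44, 160.50),
    "OrHighHigh": (80.20, 718.16),
}
for task, (baseline, candidate) in rows.items():
    print("%-12s %+.1f%%" % (task, pct_diff(baseline, candidate)))
```

Note that the large OrHighHigh gain in this run coincides with the "hit counts differ: 19022+ vs 1002+" warning, so it may reflect the modified scorer returning different hits rather than a genuine speedup.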
{code:java}
Task                      QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff                  p-value
OrHighLow                       667.55    (4.3%)                   649.15    (5.9%)     -2.8% ( -12% -    7%)  0.092
OrHighNotHigh                   866.11    (5.2%)                   851.54    (5.2%)     -1.7% ( -11% -    9%)  0.305
HighTermDayOfYearSort           293.23   (16.4%)                   288.53   (15.0%)     -1.6% ( -28% -   35%)  0.747
Fuzzy1                           58.36   (10.3%)                    57.55   (10.2%)     -1.4% ( -19% -   21%)  0.666
OrHighNotLow                    709.78    (3.0%)                   702.08    (3.9%)     -1.1% (  -7% -    5%)  0.322
LowTerm                        1816.04    (4.4%)                  1797.54    (4.8%)     -1.0% (  -9% -    8%)  0.488
Fuzzy2                           50.30   (11.0%)                    49.85   (11.7%)     -0.9% ( -21% -   24%)  0.802
OrHighHigh                       55.81    (2.4%)                    55.35    (2.4%)     -0.8% (  -5% -    4%)  0.288
HighSpanNear                     15.16    (2.9%)                    15.07    (3.0%)     -0.6% (  -6% -    5%)  0.547
MedSpanNear                      67.82    (3.1%)                    67.47    (3.5%)     -0.5% (  -6% -    6%)  0.613
OrHighMed                       195.58    (2.7%)                   194.60    (2.6%)     -0.5% (  -5% -    4%)  0.548
LowSpanNear                      36.88    (2.7%)                    36.75    (2.9%)     -0.3% (  -5% -    5%)  0.690
BrowseMonthTaxoFacets            13.05    (3.3%)                    13.01    (3.5%)     -0.3% (  -6% -    6%)  0.749
HighIntervalsOrdered             44.33    (1.3%)                    44.18    (1.4%)     -0.3% (  -3% -    2%)  0.439
HighSloppyPhrase                 17.92    (4.4%)                    17.87    (4.3%)     -0.3% (  -8% -    8%)  0.821
MedPhrase                        78.25    (2.5%)                    78.12    (2.1%)     -0.2% (  -4% -    4%)  0.823
BrowseDayOfYearSSDVFacets        27.73    (2.4%)                    27.68    (2.0%)     -0.2% (  -4% -    4%)  0.817
PKLookup                        213.40    (2.9%)                   213.09    (2.4%)     -0.1% (  -5% -    5%)  0.862
AndHighHigh                     100.38    (2.9%)                   100.25    (2.9%)     -0.1% (  -5% -    5%)  0.891
BrowseDayOfYearTaxoFacets        10.79    (3.4%)                    10.78    (3.7%)     -0.1% (  -6% -    7%)  0.912
AndHighLow                      778.66    (3.4%)                   778.10    (3.1%)     -0.1% (  -6% -    6%)  0.945
Wildcard                        141.04    (2.1%)                   141.00    (2.4%)     -0.0% (  -4% -    4%)  0.970
BrowseDateTaxoFacets             10.77    (3.3%)                    10.77    (3.6%)     -0.0% (  -6% -    7%)  0.993
HighTermTitleBDVSort            222.22   (12.5%)                   222.20   (11.7%)     -0.0% ( -21% -   27%)  0.998
LowSloppyPhrase                  34.64    (3.6%)                    34.64    (3.4%)     -0.0% (  -6% -    7%)  0.994
IntNRQ                          143.05    (0.6%)                   143.25    (0.8%)      0.1% (  -1% -    1%)  0.546
BrowseMonthSSDVFacets            30.95    (5.2%)                    31.00    (5.0%)      0.2% (  -9% -   10%)  0.922
Respell                          66.20    (2.2%)                    66.36    (1.8%)      0.2% (  -3% -    4%)  0.719
AndHighMed                      300.10    (2.5%)                   300.83    (2.9%)      0.2% (  -5% -    5%)  0.775
LowPhrase                        54.92    (2.6%)                    55.07    (2.1%)      0.3% (  -4% -    5%)  0.701
MedTerm                        1522.58    (3.9%)                  1529.48    (3.8%)      0.5% (  -7% -    8%)  0.711
OrNotHighLow                    965.50    (5.2%)                   972.89    (2.8%)      0.8% (  -6% -    9%)  0.564
HighPhrase                      163.18    (2.5%)                   164.66    (2.6%)      0.9% (  -4% -    6%)  0.259
HighTerm                       1400.74    (3.8%)                  1417.05    (3.9%)      1.2% (  -6% -    9%)  0.335
MedSloppyPhrase                 146.41    (2.6%)                   148.13    (2.8%)      1.2% (  -4% -    6%)  0.171
Prefix3                         355.38    (3.1%)                   359.62    (3.4%)      1.2% (  -5% -    7%)  0.242
OrNotHighHigh                   704.32    (4.2%)                   713.26    (4.6%)      1.3% (  -7% -   10%)  0.363
TermDTSort                      227.11   (13.3%)                   230.00   (13.5%)      1.3% ( -22% -   32%)  0.764
OrHighNotMed                    724.29    (4.2%)                   736.47    (3.8%)      1.7% (  -6% -   10%)  0.183
HighTermMonthSort               134.94   (10.6%)                   137.37   (10.2%)      1.8% ( -17% -   25%)  0.583
OrNotHighMed                    764.10    (3.7%)                   778.89    (3.4%)      1.9% (  -4% -    9%)  0.082
{code}
> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
> Key: LUCENE-9335
> URL: https://issues.apache.org/jira/browse/LUCENE-9335
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and
> PISA at [https://tantivy-search.github.io/bench/] or against research
> prototypes in Table 1 of
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for
> benchmarking, it would be nice to optimize this case a bit more. I suspect
> that we could make fewer per-document decisions by implementing a BulkScorer
> instead of a Scorer.