[
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565261#comment-17565261
]
Zach Chen commented on LUCENE-10480:
------------------------------------
{quote}Another thing that changes performance sometimes is the doc ID order,
were you using multiple indexing threads maybe?
{quote}
OK, this is actually the case for me. I was previously indexing with 10 threads
(INDEX_NUM_THREADS = 10). After I commented that out and reindexed with the
default setting, I was able to reproduce the slowdown:
{code:java}
Task                        QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
AndHighOrMedMed                    91.27   (4.3%)                    85.52   (4.3%)     -6.3%  ( -14% - 2%)    0.000
PKLookup                          333.25   (4.3%)                   329.48   (3.8%)     -1.1%  ( -8% - 7%)     0.380
AndHighHigh                       104.25   (2.9%)                   103.11   (3.0%)     -1.1%  ( -6% - 5%)     0.247
SpanNear                           16.52   (3.8%)                    16.36   (3.1%)     -0.9%  ( -7% - 6%)     0.396
TermGroup10K                       23.99   (3.3%)                    23.78   (3.0%)     -0.9%  ( -6% - 5%)     0.384
Phrase                            234.74   (2.7%)                   232.71   (1.8%)     -0.9%  ( -5% - 3%)     0.235
AndHighMed                        163.80   (3.5%)                   162.42   (4.3%)     -0.8%  ( -8% - 7%)     0.496
TermBGroup1M                       48.02   (3.5%)                    47.65   (3.7%)     -0.8%  ( -7% - 6%)     0.496
SloppyPhrase                        4.82   (3.4%)                     4.78   (2.7%)     -0.7%  ( -6% - 5%)     0.460
TermGroup100                       41.90   (3.9%)                    41.63   (3.3%)     -0.7%  ( -7% - 6%)     0.569
Term                             2680.42   (4.7%)                  2664.05   (3.3%)     -0.6%  ( -8% - 7%)     0.632
TermGroup1M                        39.95   (2.9%)                    39.71   (3.2%)     -0.6%  ( -6% - 5%)     0.531
TermBGroup1M1P                     84.21   (6.1%)                    83.82   (5.7%)     -0.5%  ( -11% - 12%)   0.801
Respell                           113.78   (1.9%)                   113.44   (1.7%)     -0.3%  ( -3% - 3%)     0.603
BrowseRandomLabelSSDVFacets        20.75   (8.2%)                    20.74  (10.3%)     -0.0%  ( -17% - 20%)   0.989
Fuzzy2                             83.12   (1.8%)                    83.11   (1.1%)     -0.0%  ( -2% - 2%)     0.976
BrowseDayOfYearSSDVFacets          26.69  (12.0%)                    26.70  (11.6%)      0.0%  ( -21% - 26%)   0.995
Wildcard                          115.84   (5.1%)                   115.96   (5.8%)      0.1%  ( -10% - 11%)   0.951
TermDayOfYearSort                 260.70   (5.4%)                   260.99   (2.8%)      0.1%  ( -7% - 8%)     0.937
AndHighMedDayTaxoFacets           136.32   (2.6%)                   136.63   (2.3%)      0.2%  ( -4% - 5%)     0.773
IntervalsOrdered                  128.13   (7.5%)                   128.45   (7.7%)      0.3%  ( -13% - 16%)   0.916
AndHighHighDayTaxoFacets           13.82   (2.8%)                    13.87   (2.6%)      0.4%  ( -4% - 5%)     0.657
Fuzzy1                             79.16   (2.7%)                    79.60   (1.8%)      0.6%  ( -3% - 5%)     0.433
TermMonthSort                     360.17   (6.4%)                   362.83   (7.1%)      0.7%  ( -11% - 15%)   0.728
TermTitleSort                     191.21   (6.8%)                   192.70   (7.1%)      0.8%  ( -12% - 15%)   0.723
TermDTSort                        208.40   (2.9%)                   210.39   (2.9%)      1.0%  ( -4% - 7%)     0.301
MedTermDayTaxoFacets               78.66   (5.2%)                    79.59   (4.4%)      1.2%  ( -7% - 11%)    0.436
TermDateFacets                     41.04   (5.4%)                    41.61   (4.7%)      1.4%  ( -8% - 12%)    0.385
IntNRQ                            122.00   (8.1%)                   124.08   (8.3%)      1.7%  ( -13% - 19%)   0.513
OrHighMedDayTaxoFacets             23.16   (8.4%)                    23.71   (4.9%)      2.4%  ( -10% - 17%)   0.272
BrowseMonthSSDVFacets              28.68  (13.8%)                    29.55  (16.8%)      3.0%  ( -24% - 39%)   0.531
BrowseDayOfYearTaxoFacets          30.40  (32.2%)                    31.67  (34.2%)      4.2%  ( -47% - 103%)  0.690
BrowseDateTaxoFacets               30.26  (32.2%)                    31.57  (34.4%)      4.3%  ( -47% - 104%)  0.680
Prefix3                           402.14   (8.6%)                   419.96   (8.9%)      4.4%  ( -12% - 23%)   0.109
AndMedOrHighHigh                   94.79   (4.0%)                    99.03   (4.5%)      4.5%  ( -3% - 13%)    0.001
BrowseRandomLabelTaxoFacets        32.45  (49.2%)                    35.05  (53.4%)      8.0%  ( -63% - 217%)  0.622
BrowseMonthTaxoFacets              28.68  (35.3%)                    31.37  (39.1%)      9.4%  ( -48% - 129%)  0.425
BrowseDateSSDVFacets                3.96  (28.1%)                     4.54  (26.3%)     14.7%  ( -31% - 96%)   0.089
OrHighHigh                        116.10   (3.5%)                   156.34   (7.4%)     34.7%  ( 22% - 47%)    0.000
OrHighMed                         120.07   (3.8%)                   238.81   (5.3%)     98.9%  ( 86% - 112%)   0.000
{code}
{code:java}
Task                        QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
BrowseMonthTaxoFacets              28.92  (36.4%)                    27.02  (32.8%)     -6.6%  ( -55% - 98%)   0.548
OrHighMedDayTaxoFacets              4.46   (4.4%)                     4.30   (7.7%)     -3.6%  ( -14% - 8%)    0.072
AndHighOrMedMed                   113.85   (5.3%)                   110.94   (4.6%)     -2.5%  ( -11% - 7%)    0.102
AndHighMed                        126.02   (3.4%)                   123.47   (3.7%)     -2.0%  ( -8% - 5%)     0.072
TermBGroup1M1P                     62.98   (6.2%)                    61.72   (5.8%)     -2.0%  ( -13% - 10%)   0.293
BrowseRandomLabelSSDVFacets        20.94   (5.6%)                    20.60   (6.7%)     -1.6%  ( -13% - 11%)   0.402
TermGroup100                       41.54   (3.8%)                    41.00   (3.1%)     -1.3%  ( -7% - 5%)     0.237
MedTermDayTaxoFacets               78.99   (3.3%)                    78.06   (4.4%)     -1.2%  ( -8% - 6%)     0.342
AndHighHighDayTaxoFacets            7.08   (3.4%)                     7.00   (3.3%)     -1.1%  ( -7% - 5%)     0.295
TermDateFacets                     57.17   (3.6%)                    56.57   (4.6%)     -1.0%  ( -8% - 7%)     0.426
TermDTSort                        340.16   (4.3%)                   336.88   (2.7%)     -1.0%  ( -7% - 6%)     0.396
Phrase                            116.48   (4.5%)                   115.36   (4.4%)     -1.0%  ( -9% - 8%)     0.497
BrowseMonthSSDVFacets              29.80  (10.9%)                    29.51  (11.8%)     -0.9%  ( -21% - 24%)   0.792
TermBGroup1M                       30.20   (3.9%)                    29.94   (4.2%)     -0.9%  ( -8% - 7%)     0.490
AndHighHigh                       132.26   (3.2%)                   131.10   (3.3%)     -0.9%  ( -7% - 5%)     0.394
TermGroup1M                        39.70   (2.9%)                    39.38   (3.9%)     -0.8%  ( -7% - 6%)     0.445
SpanNear                          168.65   (3.2%)                   167.49   (2.3%)     -0.7%  ( -6% - 5%)     0.438
TermGroup10K                       43.11   (3.5%)                    43.01   (4.3%)     -0.2%  ( -7% - 7%)     0.853
Term                             3172.83   (2.7%)                  3168.67   (3.1%)     -0.1%  ( -5% - 5%)     0.887
TermTitleSort                     218.63   (3.1%)                   218.36   (2.7%)     -0.1%  ( -5% - 5%)     0.892
TermMonthSort                     353.25   (3.0%)                   353.58   (2.6%)      0.1%  ( -5% - 5%)     0.917
IntNRQ                           1208.96   (2.0%)                  1210.20   (2.5%)      0.1%  ( -4% - 4%)     0.887
BrowseDateTaxoFacets               27.09  (26.8%)                    27.15  (29.3%)      0.2%  ( -44% - 76%)   0.981
AndHighMedDayTaxoFacets            95.98   (3.0%)                    96.25   (2.9%)      0.3%  ( -5% - 6%)     0.771
BrowseDayOfYearTaxoFacets          27.16  (26.8%)                    27.26  (29.4%)      0.3%  ( -44% - 77%)   0.969
BrowseDayOfYearSSDVFacets          26.55   (5.3%)                    26.70   (9.2%)      0.6%  ( -13% - 15%)   0.811
PKLookup                          326.57   (5.1%)                   328.96   (4.4%)      0.7%  ( -8% - 10%)    0.627
IntervalsOrdered                   10.66   (3.3%)                    10.75   (3.9%)      0.9%  ( -6% - 8%)     0.457
Fuzzy2                            145.01   (2.0%)                   146.28   (2.6%)      0.9%  ( -3% - 5%)     0.225
Respell                           112.65   (2.1%)                   113.64   (3.1%)      0.9%  ( -4% - 6%)     0.299
Fuzzy1                            134.04   (1.8%)                   135.48   (3.0%)      1.1%  ( -3% - 5%)     0.171
SloppyPhrase                       13.24   (3.9%)                    13.43   (4.0%)      1.4%  ( -6% - 9%)     0.263
Wildcard                          235.26   (5.1%)                   239.03   (4.7%)      1.6%  ( -7% - 11%)    0.299
TermDayOfYearSort                 142.00   (3.3%)                   145.14   (6.9%)      2.2%  ( -7% - 12%)    0.198
Prefix3                            86.50   (7.2%)                    88.51   (6.4%)      2.3%  ( -10% - 17%)   0.281
AndMedOrHighHigh                   96.75   (3.9%)                    99.91   (4.2%)      3.3%  ( -4% - 11%)    0.011
BrowseRandomLabelTaxoFacets        27.01  (42.1%)                    27.91  (49.0%)      3.3%  ( -61% - 163%)  0.819
OrHighHigh                         21.33   (6.6%)                    23.52   (3.1%)     10.3%  ( 0% - 21%)     0.000
BrowseDateSSDVFacets                3.74  (27.2%)                     4.43  (29.9%)     18.4%  ( -30% - 103%)  0.042
OrHighMed                         105.91   (4.8%)                   178.83   (9.8%)     68.9%  ( 51% - 87%)    0.000
{code}
{code:java}
Task                        QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
BrowseDateSSDVFacets                4.14  (28.1%)                     3.60  (28.4%)    -13.1%  ( -54% - 60%)   0.143
BrowseRandomLabelSSDVFacets        21.06   (9.7%)                    20.33  (11.1%)     -3.4%  ( -22% - 19%)   0.295
TermBGroup1M1P                     55.35   (6.7%)                    53.62   (6.2%)     -3.1%  ( -15% - 10%)   0.124
TermDTSort                        212.36   (6.4%)                   207.44   (3.0%)     -2.3%  ( -11% - 7%)    0.145
AndHighOrMedMed                   124.19   (5.3%)                   121.49   (4.5%)     -2.2%  ( -11% - 8%)    0.161
TermDateFacets                     73.81   (4.0%)                    72.37   (4.0%)     -2.0%  ( -9% - 6%)     0.124
MedTermDayTaxoFacets               82.87   (4.0%)                    81.30   (4.1%)     -1.9%  ( -9% - 6%)     0.140
TermMonthSort                     225.69   (9.2%)                   222.07   (7.1%)     -1.6%  ( -16% - 16%)   0.538
TermTitleSort                     225.68   (9.0%)                   222.16   (7.3%)     -1.6%  ( -16% - 16%)   0.548
TermGroup100                       41.75   (3.0%)                    41.16   (2.8%)     -1.4%  ( -7% - 4%)     0.130
IntNRQ                             89.84   (7.1%)                    88.65   (9.7%)     -1.3%  ( -16% - 16%)   0.621
TermBGroup1M                       39.21   (3.7%)                    38.75   (3.3%)     -1.2%  ( -7% - 6%)     0.289
Phrase                            115.01   (5.9%)                   113.72   (6.2%)     -1.1%  ( -12% - 11%)   0.558
BrowseRandomLabelTaxoFacets        31.99  (48.4%)                    31.68  (46.7%)     -1.0%  ( -64% - 182%)  0.950
TermGroup10K                       23.54   (3.2%)                    23.33   (2.7%)     -0.9%  ( -6% - 5%)     0.347
Term                             2742.88   (3.5%)                  2723.92   (3.3%)     -0.7%  ( -7% - 6%)     0.521
SloppyPhrase                       13.33   (1.9%)                    13.25   (2.6%)     -0.6%  ( -5% - 4%)     0.415
AndHighHighDayTaxoFacets           38.27   (2.4%)                    38.05   (1.6%)     -0.6%  ( -4% - 3%)     0.373
BrowseDayOfYearTaxoFacets          30.28  (33.7%)                    30.12  (33.7%)     -0.5%  ( -50% - 100%)  0.961
BrowseDateTaxoFacets               30.19  (33.7%)                    30.06  (33.8%)     -0.4%  ( -50% - 101%)  0.968
TermGroup1M                        40.47   (3.8%)                    40.34   (3.4%)     -0.3%  ( -7% - 7%)     0.774
AndHighMedDayTaxoFacets            49.03   (2.5%)                    48.88   (2.3%)     -0.3%  ( -5% - 4%)     0.699
AndHighMed                        166.12   (5.2%)                   165.86   (5.6%)     -0.2%  ( -10% - 11%)   0.928
BrowseMonthSSDVFacets              28.25  (10.1%)                    28.21  (12.9%)     -0.1%  ( -21% - 25%)   0.968
Prefix3                           465.74   (6.7%)                   466.51   (5.2%)      0.2%  ( -11% - 12%)   0.930
IntervalsOrdered                   23.37   (4.5%)                    23.43   (4.3%)      0.3%  ( -8% - 9%)     0.853
AndHighHigh                       130.93   (3.8%)                   131.44   (4.2%)      0.4%  ( -7% - 8%)     0.755
Wildcard                          165.26   (6.3%)                   165.93   (4.9%)      0.4%  ( -10% - 12%)   0.819
SpanNear                           28.93   (3.6%)                    29.22   (3.1%)      1.0%  ( -5% - 7%)     0.336
Fuzzy1                            162.85   (2.8%)                   165.51   (4.2%)      1.6%  ( -5% - 8%)     0.147
OrHighMedDayTaxoFacets             15.23   (8.5%)                    15.49   (9.1%)      1.7%  ( -14% - 21%)   0.538
Fuzzy2                            144.23   (3.2%)                   146.75   (3.9%)      1.7%  ( -5% - 9%)     0.119
BrowseDayOfYearSSDVFacets          26.63   (9.7%)                    27.13  (13.8%)      1.9%  ( -19% - 28%)   0.616
PKLookup                          324.80   (3.5%)                   331.01   (3.9%)      1.9%  ( -5% - 9%)     0.103
TermDayOfYearSort                 143.15   (5.8%)                   145.89   (7.1%)      1.9%  ( -10% - 15%)   0.351
BrowseMonthTaxoFacets              30.39  (35.7%)                    30.99  (36.5%)      2.0%  ( -51% - 115%)  0.863
Respell                           111.15   (3.7%)                   114.29   (5.1%)      2.8%  ( -5% - 12%)    0.045
AndMedOrHighHigh                   95.45   (4.3%)                   100.22   (5.2%)      5.0%  ( -4% - 15%)    0.001
OrHighHigh                         25.86   (6.1%)                    38.74   (5.6%)     49.8%  ( 35% - 65%)    0.000
OrHighMed                         124.45   (6.6%)                   240.13   (6.5%)     93.0%  ( 74% - 113%)   0.000
{code}
{code:java}
Task                        QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
BrowseDateSSDVFacets                4.34  (34.8%)                     3.94  (32.7%)     -9.4%  ( -57% - 89%)   0.378
BrowseMonthTaxoFacets              33.54  (30.9%)                    31.18  (31.2%)     -7.0%  ( -52% - 79%)   0.475
AndHighOrMedMed                    92.17   (4.5%)                    86.48   (3.8%)     -6.2%  ( -13% - 2%)    0.000
IntNRQ                            124.38   (7.3%)                   122.08   (8.8%)     -1.9%  ( -16% - 15%)   0.471
TermDTSort                        264.88   (3.9%)                   260.90   (2.7%)     -1.5%  ( -7% - 5%)     0.153
TermTitleSort                     276.74   (4.0%)                   272.63   (2.5%)     -1.5%  ( -7% - 5%)     0.159
BrowseMonthSSDVFacets              29.01  (13.6%)                    28.60  (12.0%)     -1.4%  ( -23% - 27%)   0.725
TermMonthSort                     222.92   (3.9%)                   220.25   (2.6%)     -1.2%  ( -7% - 5%)     0.252
MedTermDayTaxoFacets               79.12   (3.4%)                    78.33   (4.1%)     -1.0%  ( -8% - 6%)     0.401
AndHighHigh                       103.30   (2.7%)                   102.29   (2.8%)     -1.0%  ( -6% - 4%)     0.258
Fuzzy2                            124.60   (2.9%)                   123.47   (2.2%)     -0.9%  ( -5% - 4%)     0.260
TermDateFacets                     34.41   (4.0%)                    34.11   (5.0%)     -0.9%  ( -9% - 8%)     0.538
Fuzzy1                            135.75   (2.3%)                   134.66   (2.0%)     -0.8%  ( -5% - 3%)     0.240
SloppyPhrase                        3.11   (5.0%)                     3.08   (4.3%)     -0.8%  ( -9% - 8%)     0.594
TermGroup100                       36.45   (3.4%)                    36.19   (4.1%)     -0.7%  ( -7% - 7%)     0.547
BrowseRandomLabelTaxoFacets        33.28  (46.8%)                    33.06  (46.6%)     -0.7%  ( -64% - 174%)  0.964
Phrase                            113.36   (4.2%)                   112.65   (3.9%)     -0.6%  ( -8% - 7%)     0.623
BrowseDayOfYearTaxoFacets          31.53  (32.2%)                    31.36  (32.3%)     -0.5%  ( -49% - 94%)   0.958
OrHighMedDayTaxoFacets             13.78   (4.6%)                    13.72   (3.8%)     -0.5%  ( -8% - 8%)     0.705
Respell                            97.91   (2.4%)                    97.42   (2.2%)     -0.5%  ( -5% - 4%)     0.496
Prefix3                           458.74   (7.4%)                   456.64   (8.0%)     -0.5%  ( -14% - 16%)   0.851
BrowseDateTaxoFacets               31.40  (32.4%)                    31.26  (32.3%)     -0.5%  ( -49% - 94%)   0.964
AndHighMed                        123.44   (3.6%)                   122.93   (3.0%)     -0.4%  ( -6% - 6%)     0.695
BrowseRandomLabelSSDVFacets        20.80   (8.7%)                    20.73   (9.0%)     -0.3%  ( -16% - 19%)   0.914
TermDayOfYearSort                 147.07   (5.5%)                   146.66   (7.0%)     -0.3%  ( -12% - 12%)   0.889
IntervalsOrdered                   10.67   (4.2%)                    10.64   (3.4%)     -0.3%  ( -7% - 7%)     0.820
AndHighMedDayTaxoFacets           217.34   (1.9%)                   216.78   (1.8%)     -0.3%  ( -3% - 3%)     0.661
AndHighHighDayTaxoFacets           13.89   (2.4%)                    13.87   (2.7%)     -0.1%  ( -5% - 5%)     0.861
TermGroup1M                        23.81   (2.5%)                    23.79   (4.0%)     -0.1%  ( -6% - 6%)     0.920
Term                             2926.73   (3.7%)                  2924.25   (3.8%)     -0.1%  ( -7% - 7%)     0.942
TermBGroup1M                       53.67   (2.3%)                    53.63   (3.9%)     -0.1%  ( -6% - 6%)     0.945
TermGroup10K                       29.55   (2.4%)                    29.54   (4.1%)     -0.0%  ( -6% - 6%)     0.977
TermBGroup1M1P                     45.34   (6.3%)                    45.33   (8.2%)     -0.0%  ( -13% - 15%)   0.992
Wildcard                          114.51   (4.9%)                   115.73   (5.5%)      1.1%  ( -8% - 12%)    0.519
SpanNear                           29.14   (3.0%)                    29.46   (2.4%)      1.1%  ( -4% - 6%)     0.184
PKLookup                          333.07   (4.7%)                   336.88   (2.7%)      1.1%  ( -5% - 8%)     0.342
BrowseDayOfYearSSDVFacets          27.13  (13.1%)                    27.48  (11.9%)      1.3%  ( -20% - 30%)   0.746
AndMedOrHighHigh                   89.60   (3.7%)                    95.84   (3.4%)      7.0%  ( 0% - 14%)     0.000
OrHighHigh                         21.21   (5.1%)                    23.62   (5.0%)     11.3%  ( 1% - 22%)     0.000
OrHighMed                         122.68   (4.1%)                   242.95   (7.1%)     98.0%  ( 83% - 113%)   0.000
{code}
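For reading these tables: the Pct diff column appears to be the relative change in mean QPS from baseline to my_modified_version. A minimal sketch under that assumption (the class and the {{pctDiff}} helper are hypothetical names for illustration, not luceneutil code):

```java
class PctDiffSketch {

  /** Relative QPS change of the candidate over the baseline, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS values copied from the first table in this comment.
    System.out.printf("OrHighMed:       %.1f%%%n", pctDiff(120.07, 238.81));
    System.out.printf("AndHighOrMedMed: %.1f%%%n", pctDiff(91.27, 85.52));
  }
}
```

With the first table's QPS values this gives roughly 98.9% for OrHighMed and -6.3% for AndHighOrMedMed, matching the reported column.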
{quote}if I try to think of the main differences between WANDScorer and
BlockMaxMaxscoreScorer for AndHighOrMedMed, I think the main one is the way
that {{advanceShallow}} is computed. Conjunctions use block boundaries of the
clause that has the lowest cost, so this could explain why we are seeing a
slowdown with AndHighOrMedMed (since the conjunction uses block boundaries of
OrMedMed) and not AndMedOrHighHigh (since the conjunction uses block boundaries
of Med). Maybe we could explore other approaches for {{advanceShallow}} such as
taking the minimum block boundary across essential clauses only instead of all
clauses.
{quote}
Ah, this is interesting to know! I guess I can open another ticket to explore
this improvement further? Do you think this slowdown in AndHighOrMedMed should
be considered a blocker for the 9.3 release?
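
To make the quoted {{advanceShallow}} suggestion concrete, here is a rough, self-contained sketch of the two strategies as I read them. It is not Lucene code: {{Clause}}, {{docsPerBlock}}, and the {{essential}} flag are hypothetical stand-ins, and real postings blocks do not cover fixed-width doc ID ranges. It only shows how restricting the minimum to essential clauses can yield a later block boundary:

```java
import java.util.Arrays;
import java.util.List;

class AdvanceShallowSketch {

  /** Hypothetical stand-in for one disjunction clause; not a Lucene class. */
  static final class Clause {
    final int docsPerBlock;  // simplification: every block spans a fixed doc ID range
    final boolean essential; // assumed to be set by the scorer's partitioning

    Clause(int docsPerBlock, boolean essential) {
      this.docsPerBlock = docsPerBlock;
      this.essential = essential;
    }

    /** Last doc ID of the block that contains target. */
    int advanceShallow(int target) {
      return (target / docsPerBlock + 1) * docsPerBlock - 1;
    }
  }

  /** Boundary taken as the minimum over all clauses. */
  static int boundaryOverAllClauses(List<Clause> clauses, int target) {
    int upTo = Integer.MAX_VALUE;
    for (Clause clause : clauses) {
      upTo = Math.min(upTo, clause.advanceShallow(target));
    }
    return upTo;
  }

  /**
   * Suggested variant: minimum over essential clauses only.
   * Returns Integer.MAX_VALUE when no clause is essential.
   */
  static int boundaryOverEssentialClauses(List<Clause> clauses, int target) {
    int upTo = Integer.MAX_VALUE;
    for (Clause clause : clauses) {
      if (clause.essential) {
        upTo = Math.min(upTo, clause.advanceShallow(target));
      }
    }
    return upTo;
  }

  public static void main(String[] args) {
    List<Clause> clauses = Arrays.asList(new Clause(512, true), new Clause(64, false));
    System.out.println(boundaryOverAllClauses(clauses, 100));       // 127
    System.out.println(boundaryOverEssentialClauses(clauses, 100)); // 511
  }
}
```

In this toy setup the non-essential clause has the smaller blocks, so ignoring it moves the boundary from 127 to 511, which would mean fewer block transitions for an enclosing conjunction. Whether that holds on real indexes is exactly what the follow-up ticket would need to measure.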
> Specialize 2-clauses disjunctions
> ---------------------------------
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its
> invariants: one linked list for the current candidates, one priority queue of
> scorers that are behind, another one for scorers that are ahead. All this
> could be simplified in the 2-clauses case, which feels worth specializing for
> as it's very common that end users enter queries that only have two terms?