[
https://issues.apache.org/jira/browse/LUCENE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497616#comment-17497616
]
Greg Miller commented on LUCENE-10438:
--------------------------------------
I experimented with this a bit for taxo- and ssdv-faceting but didn't get
particularly far. I think the optimization opportunity is in the
{{Facets#getSpecificValue}} code path, but I quickly discovered that
{{luceneutil}} doesn't seem to exercise it. To optimize that path, I also had
to defer counting to an "on demand" approach instead of counting during
initialization. The good news is that this change doesn't seem to have
regressed the existing benchmark tasks (see below).

I think the next step here is to augment {{luceneutil}} to exercise
{{getSpecificValue}} so we can measure the impact. I'll see if I can find some
time to poke into that, but if anyone else is interested in getting involved,
feel free to jump in!
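
For readers unfamiliar with the idea, the "on demand" counting described above could be sketched roughly as follows. This is only an illustration and not the actual change: {{LazyFacetCounts}}, {{fastCount}}, and {{countingPasses}} are hypothetical names, and the list of labels stands in for the matching documents. The fast path is assumed to return -1 when no cheap count is available, mirroring the documented contract of {{Weight#count}}.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical sketch: facet counts are computed lazily, not in the constructor. */
final class LazyFacetCounts {
  private final List<String> matchedLabels;     // stand-in for the labels of matching docs
  private final ToIntFunction<String> fastCount; // assumed fast path; returns -1 if unavailable
  private Map<String, Integer> counts;           // stays null until a full count is needed
  private int countingPasses = 0;                // how many full counting passes have run

  LazyFacetCounts(List<String> matchedLabels, ToIntFunction<String> fastCount) {
    // No counting work happens here; that is the point of deferring it.
    this.matchedLabels = matchedLabels;
    this.fastCount = fastCount;
  }

  /** Returns the count for one label, trying the fast path before counting everything. */
  int getSpecificValue(String label) {
    int fast = fastCount.applyAsInt(label);
    if (fast >= 0) {
      return fast; // answered without touching the matching docs
    }
    ensureCounted();
    return counts.getOrDefault(label, 0);
  }

  int countingPasses() {
    return countingPasses;
  }

  /** Runs the full counting pass at most once. */
  private void ensureCounted() {
    if (counts != null) {
      return;
    }
    counts = new HashMap<>();
    for (String label : matchedLabels) {
      counts.merge(label, 1, Integer::sum);
    }
    countingPasses++;
  }
}
```

With eager counting in the constructor, the fast path could never save any work, which is presumably why deferring the count is a prerequisite for optimizing a single-value lookup.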
{code:java}
Task                        QPS baseline   StdDev QPS candidate   StdDev              Pct diff  p-value
BrowseMonthSSDVFacets              16.42  (27.7%)         15.16  (24.5%)   -7.7% (-46% -  61%)    0.354
OrHighMedDayTaxoFacets              6.38   (6.7%)          6.28   (6.4%)   -1.5% (-13% -  12%)    0.463
TermDTSort                         93.64  (12.5%)         92.45  (11.9%)   -1.3% (-22% -  26%)    0.742
HighTermTitleBDVSort              142.12  (14.2%)        140.36  (13.0%)   -1.2% (-24% -  30%)    0.773
MedTermDayTaxoFacets               38.39   (4.2%)         37.92   (4.1%)   -1.2% ( -9% -   7%)    0.356
OrHighHigh                         42.40   (4.6%)         42.04   (3.5%)   -0.9% ( -8% -   7%)    0.510
HighTermMonthSort                 104.42  (18.0%)        103.57  (17.0%)   -0.8% (-30% -  41%)    0.882
Prefix3                           270.23   (7.9%)        268.54  (11.0%)   -0.6% (-18% -  19%)    0.837
OrHighMed                          79.38   (4.5%)         79.00   (3.6%)   -0.5% ( -8% -   7%)    0.709
HighSpanNear                       18.50   (2.4%)         18.43   (2.4%)   -0.4% ( -5% -   4%)    0.586
IntNRQ                            135.21   (0.5%)        134.77   (1.6%)   -0.3% ( -2% -   1%)    0.371
OrNotHighLow                     1056.43   (2.7%)       1055.39   (3.2%)   -0.1% ( -5% -   5%)    0.916
PKLookup                          169.34   (3.5%)        169.19   (3.6%)   -0.1% ( -6% -   7%)    0.937
AndHighMedDayTaxoFacets            34.87   (1.8%)         34.85   (1.9%)   -0.0% ( -3% -   3%)    0.939
OrNotHighMed                      930.52   (3.9%)        930.70   (4.0%)    0.0% ( -7% -   8%)    0.988
Wildcard                           93.02   (4.9%)         93.05   (6.7%)    0.0% (-10% -  12%)    0.984
LowTerm                          1992.53   (5.1%)       1993.41   (4.3%)    0.0% ( -8% -   9%)    0.976
AndHighHigh                        52.14   (4.9%)         52.17   (4.0%)    0.1% ( -8% -   9%)    0.969
HighSloppyPhrase                   27.70   (4.0%)         27.72   (3.8%)    0.1% ( -7% -   8%)    0.933
HighTermDayOfYearSort              82.23  (13.3%)         82.35  (14.7%)    0.2% (-24% -  32%)    0.973
OrNotHighHigh                     923.35   (3.6%)        925.08   (4.8%)    0.2% ( -7% -   8%)    0.889
AndHighHighDayTaxoFacets           19.09   (2.3%)         19.16   (1.9%)    0.3% ( -3% -   4%)    0.622
LowSloppyPhrase                    28.20   (2.4%)         28.31   (2.6%)    0.4% ( -4% -   5%)    0.624
LowSpanNear                        11.96   (3.9%)         12.01   (2.5%)    0.4% ( -5% -   7%)    0.666
LowPhrase                         241.84   (4.3%)        242.98   (4.0%)    0.5% ( -7% -   9%)    0.721
MedSpanNear                        22.00   (3.3%)         22.11   (2.0%)    0.5% ( -4% -   6%)    0.568
BrowseDayOfYearSSDVFacets          12.00  (15.6%)         12.06  (14.4%)    0.5% (-25% -  36%)    0.909
MedPhrase                          20.64   (4.9%)         20.75   (4.4%)    0.6% ( -8% -  10%)    0.709
Fuzzy2                             60.95   (1.7%)         61.29   (1.8%)    0.6% ( -2% -   4%)    0.304
HighPhrase                         19.65   (4.8%)         19.77   (4.3%)    0.6% ( -8% -  10%)    0.678
MedSloppyPhrase                    30.43   (2.3%)         30.63   (2.3%)    0.7% ( -3% -   5%)    0.354
Fuzzy1                             67.61   (1.6%)         68.07   (2.0%)    0.7% ( -2% -   4%)    0.246
OrHighNotMed                     1150.70   (3.7%)       1159.51   (3.7%)    0.8% ( -6% -   8%)    0.516
OrHighLow                         745.90   (2.9%)        751.76   (1.7%)    0.8% ( -3% -   5%)    0.292
OrHighNotHigh                     898.58   (4.1%)        906.01   (4.7%)    0.8% ( -7% -   9%)    0.551
OrHighNotLow                     1349.46   (3.4%)       1361.12   (4.0%)    0.9% ( -6% -   8%)    0.463
Respell                            46.64   (1.9%)         47.06   (2.0%)    0.9% ( -2% -   4%)    0.152
AndHighMed                        164.73   (5.7%)        166.39   (4.4%)    1.0% ( -8% -  11%)    0.531
BrowseDateSSDVFacets                2.40   (7.2%)          2.43   (7.7%)    1.1% (-12% -  17%)    0.643
BrowseRandomLabelSSDVFacets         9.15   (2.4%)          9.25   (2.6%)    1.1% ( -3% -   6%)    0.150
AndHighLow                        873.95   (4.0%)        885.61   (2.2%)    1.3% ( -4% -   7%)    0.192
BrowseMonthTaxoFacets              28.32  (24.4%)         28.70  (24.5%)    1.4% (-38% -  66%)    0.860
HighIntervalsOrdered                7.13   (4.9%)          7.24   (3.1%)    1.6% ( -6% -  10%)    0.219
LowIntervalsOrdered               116.22   (4.2%)        118.09   (2.7%)    1.6% ( -5% -   8%)    0.148
HighTerm                         2742.56   (6.1%)       2794.10   (4.1%)    1.9% ( -7% -  12%)    0.251
MedIntervalsOrdered                55.87   (5.3%)         56.93   (3.4%)    1.9% ( -6% -  11%)    0.173
MedTerm                          1679.85   (6.3%)       1718.62   (5.2%)    2.3% ( -8% -  14%)    0.206
BrowseRandomLabelTaxoFacets        18.04  (17.3%)         18.59  (17.9%)    3.0% (-27% -  46%)    0.585
BrowseDateTaxoFacets               21.38  (21.2%)         22.94  (22.7%)    7.3% (-30% -  65%)    0.295
BrowseDayOfYearTaxoFacets          21.40  (21.5%)         23.01  (23.2%)    7.5% (-30% -  66%)    0.286
{code}
> Leverage Weight#count in lucene/facets
> --------------------------------------
>
> Key: LUCENE-10438
> URL: https://issues.apache.org/jira/browse/LUCENE-10438
> Project: Lucene - Core
> Issue Type: Task
> Components: modules/facet
> Reporter: Adrien Grand
> Assignee: Greg Miller
> Priority: Minor
>
> The facet module could leverage Weight#count in order to give fast counts for
> the browsing use-case?