[ https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564611#comment-17564611 ]
Zach Chen commented on LUCENE-10480:
------------------------------------

{quote}[AndMedOrHighHigh|https://home.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html] recovered fully but [AndHighOrMedMed|https://home.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html] only a bit. I'm unsure what explains there is still a slowdown compared to BMW.
{quote}
Hmm, this is quite strange. It looks like [AndHighOrMedMed|https://home.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html] still shows about a -13% (5 / 38) impact. I just ran the full suite of wikinightly tasks a few times (by copying *wikinightly.tasks* into *wikimedium.10M.nostopwords.tasks*, running *localrun.py* with source *wikimedium10m*, and removing the *VectorSearch* queries, as they were failing with an NPE for me) but couldn't reproduce the slowdown (the baseline is head before all the BMM changes):
{code:java}
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
BrowseRandomLabelSSDVFacets    20.83   (3.8%)    20.09   (6.5%)   -3.6% ( -13% -    6%) 0.034
BrowseMonthSSDVFacets          30.36  (10.6%)    29.56  (12.7%)   -2.7% ( -23% -   23%) 0.473
Prefix3                       402.70   (9.3%)   397.59   (9.9%)   -1.3% ( -18% -   19%) 0.674
TermDayOfYearSort             183.55   (6.5%)   181.61   (6.9%)   -1.1% ( -13% -   13%) 0.617
TermTitleSort                 195.99   (7.2%)   194.25   (8.1%)   -0.9% ( -15% -   15%) 0.713
PKLookup                      293.80   (3.7%)   291.47   (4.8%)   -0.8% (  -8% -    7%) 0.555
TermMonthSort                 283.86   (7.1%)   281.74   (8.0%)   -0.7% ( -14% -   15%) 0.755
Wildcard                      227.26   (6.2%)   225.87   (6.4%)   -0.6% ( -12% -   12%) 0.759
Term                         2227.50   (3.7%)  2219.57   (3.3%)   -0.4% (  -7% -    6%) 0.748
Fuzzy1                        134.77   (2.8%)   134.37   (2.3%)   -0.3% (  -5% -    4%) 0.712
TermGroup100                   53.61   (3.7%)    53.47   (4.6%)   -0.3% (  -8% -    8%) 0.846
TermDTSort                    143.16   (3.2%)   142.89   (3.3%)   -0.2% (  -6% -    6%) 0.857
TermBGroup1M1P                 79.44   (5.5%)    79.29   (5.5%)   -0.2% ( -10% -   11%) 0.917
AndHighHighDayTaxoFacets       45.01   (2.3%)    44.94   (2.1%)   -0.1% (  -4% -    4%) 0.833
BrowseRandomLabelTaxoFacets    30.94  (50.0%)    30.92  (46.8%)   -0.0% ( -64% -  193%) 0.998
AndHighMedDayTaxoFacets        78.11   (3.2%)    78.11   (3.0%)   -0.0% (  -6% -    6%) 0.998
Phrase                        202.17   (2.7%)   202.18   (2.0%)    0.0% (  -4% -    4%) 0.996
Fuzzy2                         76.10   (2.6%)    76.15   (2.0%)    0.1% (  -4% -    4%) 0.933
TermGroup1M                    22.65   (3.8%)    22.67   (3.2%)    0.1% (  -6% -    7%) 0.919
TermDateFacets                 32.50   (5.3%)    32.60   (5.5%)    0.3% (  -9% -   11%) 0.861
BrowseDayOfYearSSDVFacets      26.31   (5.9%)    26.39   (8.5%)    0.3% ( -13% -   15%) 0.897
Respell                        88.21   (2.2%)    88.49   (2.1%)    0.3% (  -3% -    4%) 0.642
SpanNear                       16.14   (4.0%)    16.22   (4.2%)    0.5% (  -7% -    9%) 0.706
MedTermDayTaxoFacets           73.42   (4.8%)    73.85   (4.9%)    0.6% (  -8% -   10%) 0.708
TermBGroup1M                   48.92   (4.2%)    49.23   (2.8%)    0.6% (  -6% -    8%) 0.581
IntervalsOrdered               22.42   (5.8%)    22.59   (4.2%)    0.7% (  -8% -   11%) 0.651
OrHighMedDayTaxoFacets         25.27   (6.1%)    25.46   (6.6%)    0.7% ( -11% -   14%) 0.711
TermGroup10K                   30.26   (4.2%)    30.50   (2.9%)    0.8% (  -6% -    8%) 0.494
SloppyPhrase                   91.40   (5.6%)    92.16   (6.3%)    0.8% ( -10% -   13%) 0.662
IntNRQ                        152.74  (20.3%)   154.86  (17.1%)    1.4% ( -29% -   48%) 0.815
AndHighMed                     88.55   (2.6%)    89.98   (3.1%)    1.6% (  -3% -    7%) 0.073
AndHighHigh                    29.10   (2.7%)    29.68   (3.1%)    2.0% (  -3% -    8%) 0.032
BrowseDayOfYearTaxoFacets      31.29  (40.0%)    31.93  (38.0%)    2.0% ( -54% -  133%) 0.869
BrowseDateTaxoFacets           31.18  (40.3%)    31.87  (38.5%)    2.2% ( -54% -  135%) 0.859
BrowseDateSSDVFacets            3.79  (28.4%)     3.92  (27.9%)    3.4% ( -41% -   83%) 0.700
AndHighOrMedMed                63.04   (6.1%)    65.68   (5.5%)    4.2% (  -7% -   16%) 0.023
AndMedOrHighHigh               92.29   (4.6%)    99.20   (5.5%)    7.5% (  -2% -   18%) 0.000
BrowseMonthTaxoFacets          30.93  (39.4%)    34.36  (43.4%)   11.1% ( -51% -  154%) 0.397
OrHighHigh                     20.09   (6.5%)    33.58   (8.7%)   67.2% (  48% -   88%) 0.000
OrHighMed                      78.61   (5.4%)   186.58  (10.7%)  137.4% ( 115% -  162%) 0.000
{code}
{code:java}
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
TermBGroup1M1P                 81.36   (7.8%)    78.61   (5.5%)   -3.4% ( -15% -   10%) 0.114
BrowseMonthSSDVFacets          29.39  (13.1%)    28.50  (13.3%)   -3.0% ( -26% -   26%) 0.469
BrowseDateSSDVFacets            3.91  (27.0%)     3.81  (27.5%)   -2.5% ( -44% -   71%) 0.768
AndHighOrMedMed               108.53   (6.6%)   106.50   (5.7%)   -1.9% ( -13% -   11%) 0.336
OrHighMedDayTaxoFacets         23.14   (4.4%)    22.93   (6.5%)   -0.9% ( -11% -   10%) 0.596
TermGroup100                   64.69   (4.4%)    64.13   (3.4%)   -0.9% (  -8% -    7%) 0.492
TermDayOfYearSort             142.82   (5.3%)   141.72   (2.7%)   -0.8% (  -8% -    7%) 0.562
SloppyPhrase                    3.10   (4.2%)     3.08   (4.6%)   -0.7% (  -9% -    8%) 0.629
Phrase                         35.56   (2.5%)    35.36   (2.4%)   -0.6% (  -5% -    4%) 0.467
SpanNear                       13.52   (3.7%)    13.45   (3.3%)   -0.5% (  -7% -    6%) 0.667
Prefix3                       395.12   (9.1%)   393.74  (10.6%)   -0.3% ( -18% -   21%) 0.911
TermMonthSort                 192.42   (9.5%)   191.95   (7.1%)   -0.2% ( -15% -   18%) 0.926
Term                         3216.34   (3.6%)  3208.51   (3.8%)   -0.2% (  -7% -    7%) 0.833
TermTitleSort                 278.44   (9.5%)   277.85   (7.1%)   -0.2% ( -15% -   18%) 0.936
Respell                        89.07   (2.1%)    88.98   (2.6%)   -0.1% (  -4% -    4%) 0.885
Fuzzy1                        127.07   (1.9%)   127.23   (2.8%)    0.1% (  -4% -    4%) 0.874
BrowseRandomLabelSSDVFacets    20.41   (9.5%)    20.44   (8.6%)    0.2% ( -16% -   20%) 0.954
Wildcard                      366.66   (6.1%)   367.33   (6.1%)    0.2% ( -11% -   13%) 0.925
PKLookup                      291.94   (4.4%)   292.59   (2.9%)    0.2% (  -6% -    7%) 0.849
IntNRQ                        351.10   (1.2%)   351.89   (1.1%)    0.2% (  -2% -    2%) 0.540
TermGroup10K                   22.73   (3.5%)    22.81   (3.5%)    0.4% (  -6% -    7%) 0.731
AndHighHigh                    49.25   (4.1%)    49.45   (4.6%)    0.4% (  -7% -    9%) 0.770
Fuzzy2                        136.67   (2.0%)   137.33   (2.5%)    0.5% (  -3% -    5%) 0.497
MedTermDayTaxoFacets           75.39   (3.4%)    75.79   (2.8%)    0.5% (  -5% -    7%) 0.591
AndHighMedDayTaxoFacets       135.26   (2.6%)   136.01   (2.1%)    0.6% (  -3% -    5%) 0.449
AndHighHighDayTaxoFacets       11.44   (2.4%)    11.50   (1.9%)    0.6% (  -3% -    4%) 0.386
IntervalsOrdered               13.19   (2.6%)    13.27   (2.8%)    0.6% (  -4% -    6%) 0.456
TermDateFacets                 32.59   (3.8%)    32.81   (3.1%)    0.7% (  -5% -    7%) 0.526
AndHighMed                    109.27   (4.6%)   110.09   (5.6%)    0.7% (  -9% -   11%) 0.648
AndMedOrHighHigh               67.43   (6.2%)    68.02   (6.2%)    0.9% ( -10% -   14%) 0.654
TermGroup1M                    26.00   (2.9%)    26.26   (3.4%)    1.0% (  -5% -    7%) 0.310
BrowseDayOfYearSSDVFacets      27.08  (10.0%)    27.36  (14.4%)    1.0% ( -21% -   28%) 0.792
TermBGroup1M                   37.57   (3.2%)    38.00   (4.0%)    1.1% (  -5% -    8%) 0.318
TermDTSort                    141.09   (2.6%)   143.58   (6.5%)    1.8% (  -7% -   11%) 0.259
BrowseMonthTaxoFacets          28.19  (37.9%)    29.70  (40.8%)    5.3% ( -53% -  135%) 0.669
BrowseDayOfYearTaxoFacets      29.32  (37.6%)    31.00  (43.4%)    5.7% ( -54% -  138%) 0.656
BrowseDateTaxoFacets           29.18  (37.9%)    30.94  (43.8%)    6.0% ( -54% -  141%) 0.641
BrowseRandomLabelTaxoFacets    28.43  (47.1%)    30.67  (55.1%)    7.9% ( -64% -  207%) 0.627
OrHighHigh                     19.75   (5.8%)    28.41   (6.0%)   43.9% (  30% -   59%) 0.000
OrHighMed                      78.52   (6.7%)   181.93  (10.9%)  131.7% ( 106% -  159%) 0.000
{code}
{code:java}
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
BrowseMonthSSDVFacets          28.38  (11.8%)    27.82  (10.5%)   -2.0% ( -21% -   22%) 0.573
PKLookup                      296.46   (2.0%)   290.88   (2.9%)   -1.9% (  -6% -    3%) 0.016
TermDayOfYearSort             214.70   (7.4%)   210.70   (4.1%)   -1.9% ( -12% -   10%) 0.323
TermBGroup1M                   29.42   (4.0%)    28.91   (5.3%)   -1.7% ( -10% -    7%) 0.236
TermGroup1M                    23.08   (3.4%)    22.70   (4.3%)   -1.7% (  -9% -    6%) 0.170
TermDTSort                    345.43   (6.4%)   339.68   (4.9%)   -1.7% ( -12% -   10%) 0.354
TermGroup10K                   30.91   (3.3%)    30.44   (4.5%)   -1.5% (  -9% -    6%) 0.220
TermDateFacets                 47.40   (4.8%)    46.81   (3.8%)   -1.3% (  -9% -    7%) 0.362
TermBGroup1M1P                 81.44   (6.7%)    80.48   (7.5%)   -1.2% ( -14% -   13%) 0.601
BrowseRandomLabelSSDVFacets    20.26   (7.9%)    20.02   (7.8%)   -1.2% ( -15% -   15%) 0.637
MedTermDayTaxoFacets           75.68   (4.2%)    74.84   (3.4%)   -1.1% (  -8% -    6%) 0.357
BrowseRandomLabelTaxoFacets    38.85  (44.7%)    38.42  (46.1%)   -1.1% ( -63% -  162%) 0.940
TermGroup100                   41.49   (3.7%)    41.05   (4.5%)   -1.0% (  -8% -    7%) 0.419
BrowseDateTaxoFacets           37.84  (36.3%)    37.51  (38.4%)   -0.9% ( -55% -  115%) 0.941
BrowseDayOfYearTaxoFacets      37.88  (36.2%)    37.60  (38.1%)   -0.7% ( -55% -  115%) 0.950
AndHighHighDayTaxoFacets        7.05   (3.3%)     7.00   (3.8%)   -0.7% (  -7% -    6%) 0.533
SloppyPhrase                   93.42   (7.8%)    93.27   (6.9%)   -0.2% ( -13% -   15%) 0.942
BrowseDateSSDVFacets            3.81  (28.9%)     3.80  (28.5%)   -0.1% ( -44% -   80%) 0.993
Phrase                         44.60   (2.9%)    44.68   (2.8%)    0.2% (  -5% -    6%) 0.840
SpanNear                       27.76   (3.1%)    27.81   (2.8%)    0.2% (  -5% -    6%) 0.830
TermTitleSort                 224.37   (7.2%)   225.08   (7.8%)    0.3% ( -13% -   16%) 0.895
TermMonthSort                 277.86   (7.2%)   279.21   (7.9%)    0.5% ( -13% -   16%) 0.838
IntNRQ                       1286.28   (3.0%)  1292.89   (2.0%)    0.5% (  -4% -    5%) 0.525
Term                         2602.76   (3.0%)  2616.13   (3.7%)    0.5% (  -6% -    7%) 0.630
AndHighMedDayTaxoFacets        78.64   (3.2%)    79.12   (3.0%)    0.6% (  -5% -    7%) 0.540
Wildcard                      375.54   (5.9%)   378.24   (3.9%)    0.7% (  -8% -   11%) 0.649
OrHighMedDayTaxoFacets         25.37   (7.9%)    25.56   (5.4%)    0.7% ( -11% -   15%) 0.728
AndHighOrMedMed               107.73   (5.2%)   108.60   (3.9%)    0.8% (  -7% -   10%) 0.572
Respell                       108.71   (1.1%)   109.74   (2.2%)    0.9% (  -2% -    4%) 0.087
BrowseDayOfYearSSDVFacets      27.55  (10.9%)    27.82  (13.0%)    1.0% ( -20% -   27%) 0.797
AndHighMed                    110.51   (4.3%)   111.60   (3.7%)    1.0% (  -6% -    9%) 0.441
Fuzzy1                        133.81   (1.2%)   135.34   (1.9%)    1.1% (  -1% -    4%) 0.025
AndHighHigh                   119.20   (3.7%)   120.59   (3.4%)    1.2% (  -5% -    8%) 0.302
Fuzzy2                         78.92   (1.4%)    80.08   (2.0%)    1.5% (  -1% -    4%) 0.008
IntervalsOrdered               22.54   (4.5%)    22.90   (3.8%)    1.6% (  -6% -   10%) 0.226
BrowseMonthTaxoFacets          33.99  (38.5%)    35.09  (37.9%)    3.2% ( -52% -  129%) 0.788
Prefix3                       410.98   (8.6%)   425.40   (5.9%)    3.5% ( -10% -   19%) 0.131
AndMedOrHighHigh               67.29   (3.7%)    69.77   (4.3%)    3.7% (  -4% -   12%) 0.003
OrHighHigh                     19.57   (5.3%)    28.41   (5.6%)   45.2% (  32% -   59%) 0.000
OrHighMed                      95.08   (4.9%)   271.09  (10.1%)  185.1% ( 162% -  210%) 0.000
{code}
Here are my *localconstants* file and Java version for reference:
{code:java}
BASE_DIR = '/Users/xichen/IdeaProjects/benchmarks'
BENCH_BASE_DIR = '/Users/xichen/IdeaProjects/benchmarks/util'
WIKI_BIG_DOCS_LINE_FILE = '%s/data/enwiki-20130102-lines.txt' % BASE_DIR
WIKI_BIG_DOCS_COUNT = 6647577
INDEX_NUM_THREADS = 10
# SEARCH_NUM_THREADS = 6
topN=100
{code}
{code:java}
xichen@MacBook-Pro util % java --version
java 17.0.2 2022-01-18 LTS
Java(TM) SE Runtime Environment (build 17.0.2+8-LTS-86)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.2+8-LTS-86, mixed mode, sharing)
{code}
Maybe the nightly benchmark is using another suite of tests, or the JVM settings matter? I'll see if I can run the original nightly benchmark code / tests on my machine to see if there's any difference.

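For anyone cross-checking the numbers, here is a minimal sketch of how the "Pct diff" column relates to the two QPS columns, and of the "-13% (5 / 38)" estimate for the nightly chart. The helper name `pct_diff` is hypothetical, and reading "5 / 38" as a drop of about 5 QPS from about 38 QPS is an assumption. Because the benchmark output is presumably computed from unrounded QPS values, recomputing from the rounded table entries may differ in the last digit.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative QPS change of the candidate versus the baseline, in percent."""
    return (candidate_qps - baseline_qps) / baseline_qps * 100.0


# Rows copied from the first run above: (task, baseline QPS, modified QPS).
rows = [
    ("AndHighOrMedMed", 63.04, 65.68),
    ("AndMedOrHighHigh", 92.29, 99.20),
    ("OrHighHigh", 20.09, 33.58),
    ("OrHighMed", 78.61, 186.58),
]

for task, base, cand in rows:
    print(f"{task:<20} {pct_diff(base, cand):+.1f}%")

# Assumed reading of "5 / 38": roughly 38 QPS before and 33 QPS after on the nightly chart.
nightly_drop = pct_diff(38.0, 33.0)
print(f"nightly AndHighOrMedMed (approx.) {nightly_drop:+.1f}%")
```
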
> Specialize 2-clauses disjunctions
> ---------------------------------
>
>                 Key: LUCENE-10480
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10480
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its
> invariants: one linked list for the current candidates, one priority queue of
> scorers that are behind, another one for scorers that are ahead. All this
> could be simplified in the 2-clauses case, which feels worth specializing for
> as it's very common that end users enter queries that only have two terms?
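
To illustrate why the two-clause case in the issue description needs so little bookkeeping, here is a minimal sketch under stated assumptions: it is not Lucene's implementation, the name `two_clause_disjunction` and the in-memory posting lists are hypothetical, and it leaves out the max-score based skipping that WAND-style scorers perform. It only shows that with exactly two clauses, a pair of cursors can stand in for the linked list and priority queues.

```python
from typing import Iterator, List, Tuple

# Hypothetical posting format: (doc ID, score contribution), sorted by doc ID.
Posting = Tuple[int, float]


def two_clause_disjunction(a: List[Posting], b: List[Posting]) -> Iterator[Posting]:
    """Yield (doc ID, summed score) for every doc matching either clause."""
    i, j = 0, 0
    while i < len(a) and j < len(b):
        doc_a, score_a = a[i]
        doc_b, score_b = b[j]
        if doc_a == doc_b:
            # Both clauses match: add the scores and advance both cursors.
            yield doc_a, score_a + score_b
            i += 1
            j += 1
        elif doc_a < doc_b:
            # Only the first clause matches this doc.
            yield doc_a, score_a
            i += 1
        else:
            # Only the second clause matches this doc.
            yield doc_b, score_b
            j += 1
    # One list is exhausted; the remainder of the other matches on its own.
    yield from a[i:]
    yield from b[j:]


hits = list(two_clause_disjunction([(1, 1.0), (3, 2.0)], [(3, 0.5), (4, 1.0)]))
print(hits)
```
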