[ https://issues.apache.org/jira/browse/LUCENE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562786#comment-17562786 ]
Greg Miller commented on LUCENE-10639:
--------------------------------------

As a quick update, I ran benchmarks with just [livedoc checking broken out|https://github.com/gsmiller/lucene/commit/f4e9614a299523b57c854a3bd3371253f0a7fb17] in {{DefaultBulkScorer}}. Surprisingly, I didn't see any difference, so maybe something else is going on here? Note that I ran this with {{wikimedium10m}} instead of {{all}} to get a data point a bit quicker:
{code:java}
Task                          QPS baseline   StdDev  QPS candidate   StdDev  Pct diff                 p-value
Prefix3                             118.98  (10.2%)         114.60   (9.9%)     -3.7%  (-21% -  18%)  0.247
Wildcard                             40.69   (6.9%)          39.62   (7.2%)     -2.6%  (-15% -  12%)  0.236
TermDTSort                           17.76  (20.4%)          17.33  (14.2%)     -2.4%  (-30% -  40%)  0.663
OrNotHighHigh                       881.01   (4.4%)         861.34   (3.9%)     -2.2%  (-10% -   6%)  0.089
AndHighHigh                           8.87   (5.0%)           8.70   (6.2%)     -1.8%  (-12% -   9%)  0.296
MedTerm                            1771.40   (4.2%)        1740.50   (4.4%)     -1.7%  ( -9% -   7%)  0.198
AndHighMed                           30.59   (4.0%)          30.06   (5.6%)     -1.7%  (-10% -   8%)  0.267
OrHighNotLow                        782.90   (4.8%)         769.92   (5.1%)     -1.7%  (-11% -   8%)  0.291
HighPhrase                          392.18   (2.7%)         386.50   (2.7%)     -1.4%  ( -6% -   4%)  0.087
OrHighNotHigh                       830.76   (4.3%)         818.83   (4.3%)     -1.4%  ( -9% -   7%)  0.295
OrNotHighMed                        585.86   (2.6%)         578.07   (3.1%)     -1.3%  ( -6% -   4%)  0.146
OrHighNotMed                        966.75   (3.6%)         956.07   (3.9%)     -1.1%  ( -8% -   6%)  0.352
LowPhrase                           546.02   (2.1%)         540.42   (2.4%)     -1.0%  ( -5% -   3%)  0.148
MedPhrase                            24.65   (2.3%)          24.40   (3.0%)     -1.0%  ( -6% -   4%)  0.225
AndHighLow                          508.37   (3.7%)         503.84   (4.7%)     -0.9%  ( -8% -   7%)  0.506
OrNotHighLow                        672.15   (2.7%)         666.29   (2.8%)     -0.9%  ( -6% -   4%)  0.313
BrowseMonthTaxoFacets                 8.92  (14.5%)           8.84  (13.9%)     -0.9%  (-25% -  32%)  0.846
AndHighMedDayTaxoFacets              39.14   (2.2%)          38.82   (2.2%)     -0.8%  ( -5% -   3%)  0.241
AndHighHighDayTaxoFacets              8.01   (2.8%)           7.96   (2.8%)     -0.7%  ( -6% -   4%)  0.416
LowSloppyPhrase                       5.83   (3.8%)           5.79   (3.8%)     -0.7%  ( -8% -   7%)  0.556
OrHighLow                           128.01   (3.7%)         127.11   (3.8%)     -0.7%  ( -7% -   7%)  0.554
HighTerm                           1190.03   (4.4%)        1183.10   (4.1%)     -0.6%  ( -8% -   8%)  0.663
MedSloppyPhrase                      11.67   (2.1%)          11.61   (2.6%)     -0.5%  ( -5% -   4%)  0.480
MedTermDayTaxoFacets                 14.09   (3.1%)          14.03   (4.1%)     -0.5%  ( -7% -   6%)  0.686
IntNRQ                              110.15   (2.3%)         109.69   (2.1%)     -0.4%  ( -4% -   4%)  0.546
HighSloppyPhrase                      9.56   (4.5%)           9.53   (4.5%)     -0.4%  ( -8% -   9%)  0.794
BrowseDateSSDVFacets                  0.85  (10.4%)           0.85  (10.8%)     -0.3%  (-19% -  23%)  0.939
Respell                              33.65   (1.7%)          33.58   (1.7%)     -0.2%  ( -3% -   3%)  0.684
Fuzzy2                               74.16   (1.9%)          74.02   (1.7%)     -0.2%  ( -3% -   3%)  0.740
LowTerm                            1522.48   (2.9%)        1520.76   (3.3%)     -0.1%  ( -6% -   6%)  0.909
LowIntervalsOrdered                  12.75   (3.3%)          12.74   (3.3%)     -0.1%  ( -6% -   6%)  0.915
HighIntervalsOrdered                  6.30   (4.2%)           6.31   (4.0%)      0.1%  ( -7% -   8%)  0.923
BrowseRandomLabelSSDVFacets           2.57   (4.9%)           2.57   (4.9%)      0.1%  ( -9% -  10%)  0.927
Fuzzy1                               57.11   (1.9%)          57.26   (1.7%)      0.2%  ( -3% -   3%)  0.666
BrowseRandomLabelTaxoFacets           6.32   (9.3%)           6.34  (10.3%)      0.3%  (-17% -  21%)  0.911
LowSpanNear                          15.95   (2.9%)          16.01   (2.7%)      0.4%  ( -5% -   6%)  0.680
MedIntervalsOrdered                   1.61   (5.8%)           1.62   (5.8%)      0.4%  (-10% -  12%)  0.834
HighSpanNear                          2.27   (4.2%)           2.28   (4.0%)      0.6%  ( -7% -   9%)  0.636
MedSpanNear                           8.99   (3.4%)           9.05   (3.3%)      0.7%  ( -5% -   7%)  0.502
OrHighMed                            60.81   (3.8%)          61.29   (3.3%)      0.8%  ( -6% -   8%)  0.479
OrHighHigh                           15.25   (4.7%)          15.38   (3.8%)      0.8%  ( -7% -   9%)  0.548
HighTermTitleBDVSort                 59.77  (18.2%)          60.25  (14.7%)      0.8%  (-27% -  41%)  0.876
OrHighMedDayTaxoFacets                2.42   (3.1%)           2.44   (3.6%)      0.9%  ( -5% -   7%)  0.420
BrowseMonthSSDVFacets                 4.05   (7.7%)           4.09   (9.4%)      1.0%  (-14% -  19%)  0.717
BrowseDayOfYearSSDVFacets             3.43   (5.5%)           3.46   (4.9%)      1.0%  ( -8% -  12%)  0.523
HighTermMonthSort                    58.75  (20.4%)          59.43  (14.8%)      1.2%  (-28% -  45%)  0.836
PKLookup                            147.01   (2.9%)         148.75   (3.8%)      1.2%  ( -5% -   8%)  0.272
HighTermDayOfYearSort                17.59  (17.0%)          17.96  (15.2%)      2.1%  (-25% -  41%)  0.681
BrowseDateTaxoFacets                  6.68  (10.4%)           6.84  (13.0%)      2.5%  (-18% -  28%)  0.503
BrowseDayOfYearTaxoFacets             6.68  (10.4%)           6.86  (13.2%)      2.6%  (-18% -  29%)  0.485
{code}

> WANDScorer performs better without two-phase
> --------------------------------------------
>
>                 Key: LUCENE-10639
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10639
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Greg Miller
>            Priority: Major
>
> After looking at the recent improvement [~jpountz] made to WAND scoring in
> LUCENE-10634, which does additional work during match confirmation to not
> confirm a match whose score wouldn't be competitive, I wanted to see how
> performance would shift if we squashed the two-phase iteration completely and
> only returned true matches (that were also known to be competitive by score)
> in the "approximation" phase. I was a bit surprised to find that luceneutil
> benchmarks (run with {{wikimediumall}}) improve significantly on some
> disjunction tasks and don't show significant regressions anywhere else.
> Note that I used LUCENE-10634 as a baseline, and built my candidate change on
> top of that. The diff can be seen here:
> [DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]
> A simple conclusion here might be that we shouldn't do two-phase iteration in
> WANDScorer, but I'm pretty sure that's not right. I wonder if what's really
> going on is that we're underestimating the cost of confirming a match? Right
> now we just return the tail size as the cost. While the cost of confirming a
> match is proportional to the tail size, the actual work involved can be quite
> significant (having to advance tail iterators to new blocks and decompress
> them). I wonder if the WAND second phase is being run too early on
> approximate candidates, and if less expensive (and possibly even more
> restrictive?) second phases could/should be running first?
> I'm raising this here as more of a curiosity to see if it sparks ideas on how
> to move forward. Again, I'm not proposing we do away with two-phase
> iteration, but it seems we might be able to improve things. Maybe I'll
> explore changing the cost heuristic next.
> Also, maybe there's some different benchmarking that would be useful here
> that I may not be familiar with?
> Benchmark results on wikimediumall:
> {code:java}
> Task                          QPS baseline   StdDev  QPS candidate   StdDev  Pct diff                 p-value
> HighTermTitleBDVSort                 22.52  (18.9%)          21.66  (15.6%)     -3.8%  (-32% -  37%)  0.485
> Prefix3                               9.38   (9.2%)           9.09  (10.6%)     -3.1%  (-20% -  18%)  0.326
> HighTermMonthSort                    25.37  (16.0%)          24.87  (17.1%)     -2.0%  (-30% -  37%)  0.710
> MedTermDayTaxoFacets                  9.62   (4.2%)           9.51   (4.1%)     -1.2%  ( -9% -   7%)  0.368
> TermDTSort                           74.69  (18.0%)          74.13  (18.2%)     -0.7%  (-31% -  43%)  0.897
> HighTermDayOfYearSort                52.64  (16.1%)          52.32  (15.4%)     -0.6%  (-27% -  36%)  0.903
> BrowseMonthTaxoFacets                 8.64  (19.1%)           8.59  (19.8%)     -0.6%  (-33% -  47%)  0.926
> BrowseDateSSDVFacets                  0.86   (9.5%)           0.86  (13.1%)     -0.4%  (-20% -  24%)  0.914
> PKLookup                            147.18   (3.9%)         146.66   (3.3%)     -0.3%  ( -7% -   7%)  0.759
> BrowseDayOfYearSSDVFacets             3.47   (4.5%)           3.45   (4.8%)     -0.3%  ( -9% -   9%)  0.822
> Wildcard                             36.36   (4.4%)          36.26   (5.2%)     -0.3%  ( -9% -   9%)  0.866
> BrowseMonthSSDVFacets                 4.15  (12.7%)           4.13  (12.8%)     -0.3%  (-22% -  28%)  0.950
> AndHighMedDayTaxoFacets              15.21   (2.7%)          15.18   (2.9%)     -0.2%  ( -5% -   5%)  0.819
> Fuzzy1                               68.33   (1.8%)          68.22   (2.0%)     -0.2%  ( -3% -   3%)  0.783
> OrHighMedDayTaxoFacets                2.90   (4.1%)           2.89   (4.0%)     -0.1%  ( -7% -   8%)  0.930
> MedPhrase                            52.81   (2.3%)          52.76   (1.8%)     -0.1%  ( -4% -   4%)  0.878
> Respell                              36.80   (1.9%)          36.78   (1.9%)     -0.1%  ( -3% -   3%)  0.933
> Fuzzy2                               63.06   (1.9%)          63.05   (2.1%)     -0.0%  ( -3% -   4%)  0.971
> LowPhrase                            74.60   (1.9%)          74.61   (1.8%)      0.0%  ( -3% -   3%)  0.987
> AndHighHighDayTaxoFacets              4.54   (2.3%)           4.55   (2.0%)      0.0%  ( -4% -   4%)  0.960
> HighPhrase                          353.13   (2.6%)         353.28   (2.5%)      0.0%  ( -4% -   5%)  0.958
> OrNotHighHigh                       761.72   (4.0%)         762.48   (3.6%)      0.1%  ( -7% -   8%)  0.935
> OrHighNotLow                       1129.94   (4.1%)        1131.56   (3.6%)      0.1%  ( -7% -   8%)  0.906
> LowTerm                            1315.90   (2.9%)        1318.61   (2.5%)      0.2%  ( -5% -   5%)  0.810
> IntNRQ                              192.33   (2.8%)         192.93   (2.3%)      0.3%  ( -4% -   5%)  0.701
> LowSpanNear                          23.60   (2.2%)          23.68   (1.6%)      0.3%  ( -3% -   4%)  0.592
> OrNotHighMed                        867.21   (2.3%)         870.27   (2.8%)      0.4%  ( -4% -   5%)  0.664
> BrowseRandomLabelSSDVFacets           2.53   (1.6%)           2.54   (1.9%)      0.4%  ( -3% -   3%)  0.494
> AndHighMed                          105.33   (4.5%)         105.83   (4.6%)      0.5%  ( -8% -   9%)  0.739
> HighTerm                           1030.35   (5.7%)        1035.54   (5.9%)      0.5%  (-10% -  12%)  0.783
> MedSloppyPhrase                      41.07   (3.0%)          41.28   (2.9%)      0.5%  ( -5% -   6%)  0.581
> AndHighLow                          287.51   (3.2%)         289.03   (4.3%)      0.5%  ( -6% -   8%)  0.657
> OrHighNotMed                        910.71   (3.9%)         915.93   (4.1%)      0.6%  ( -7% -   8%)  0.651
> AndHighHigh                          28.96   (5.0%)          29.15   (5.3%)      0.6%  ( -9% -  11%)  0.695
> OrNotHighLow                        679.21   (2.7%)         683.68   (4.1%)      0.7%  ( -6% -   7%)  0.551
> MedTerm                            1425.49   (4.8%)        1435.41   (5.1%)      0.7%  ( -8% -  11%)  0.657
> MedSpanNear                           8.74   (3.0%)           8.80   (2.8%)      0.7%  ( -4% -   6%)  0.448
> BrowseRandomLabelTaxoFacets           6.11  (14.4%)           6.16  (15.2%)      0.7%  (-25% -  35%)  0.875
> OrHighNotHigh                       674.18   (4.1%)         679.40   (4.5%)      0.8%  ( -7% -   9%)  0.569
> LowSloppyPhrase                       5.08   (3.3%)           5.12   (3.5%)      0.8%  ( -5% -   7%)  0.445
> HighSpanNear                          2.22   (5.4%)           2.25   (4.2%)      1.3%  ( -7% -  11%)  0.398
> HighSloppyPhrase                      5.27   (7.8%)           5.34   (9.0%)      1.3%  (-14% -  19%)  0.622
> LowIntervalsOrdered                  17.88   (4.8%)          18.21   (3.1%)      1.9%  ( -5% -  10%)  0.144
> BrowseDateTaxoFacets                  6.51  (14.4%)           6.65  (17.4%)      2.3%  (-25% -  39%)  0.652
> BrowseDayOfYearTaxoFacets             6.52  (14.4%)           6.68  (17.7%)      2.5%  (-25% -  40%)  0.624
> MedIntervalsOrdered                  14.43   (7.8%)          14.80   (4.5%)      2.6%  ( -9% -  16%)  0.205
> OrHighLow                           158.48   (3.2%)         162.94   (4.2%)      2.8%  ( -4% -  10%)  0.017
> HighIntervalsOrdered                  1.56   (9.4%)           1.60   (5.2%)      3.0%  (-10% -  19%)  0.215
> OrHighMed                            65.32   (4.2%)          71.62   (4.1%)      9.6%  (  1% -  18%)  0.000
> OrHighHigh                           14.04   (4.5%)          15.68   (3.9%)     11.7%  (  3% -  21%)  0.000
> {code}
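
To make the cost-heuristic question in the description concrete: as far as I understand, when several two-phase confirmations apply to the same candidate, Lucene runs the ones reporting a lower {{matchCost()}} first, so a check that under-reports its cost can end up running before cheaper, more selective ones. The sketch below is a toy model of that ordering effect and not Lucene code. The {{Check}} class, the cost values (2, 5, 50), and the match predicates are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class MatchCostSketch {

    /** Hypothetical stand-in for one second-phase check (not a Lucene class). */
    static final class Check {
        final float estimatedCost;   // the cost this check reports for itself
        final IntPredicate matches;  // the actual confirmation logic
        int calls = 0;               // how many times the confirmation ran

        Check(float estimatedCost, IntPredicate matches) {
            this.estimatedCost = estimatedCost;
            this.matches = matches;
        }

        boolean confirm(int doc) {
            calls++;
            return matches.test(doc);
        }
    }

    /** Runs checks in ascending reported cost and stops at the first failure. */
    static boolean confirmAll(List<Check> checks, int doc) {
        List<Check> ordered = new ArrayList<>(checks);
        ordered.sort(Comparator.comparingDouble((Check c) -> c.estimatedCost));
        for (Check check : ordered) {
            if (!check.confirm(doc)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Counts how often the WAND-like check runs over 100 candidate docs,
     * given the cost it reports. All numbers here are made up.
     */
    public static int wandCalls(float wandEstimatedCost) {
        // Assumed to be expensive in practice; confirms every second doc.
        Check wandLike = new Check(wandEstimatedCost, doc -> doc % 2 == 0);
        // Assumed to be cheap and more restrictive; confirms every tenth doc.
        Check cheapFilter = new Check(5f, doc -> doc % 10 == 0);
        List<Check> checks = List.of(wandLike, cheapFilter);
        for (int doc = 0; doc < 100; doc++) {
            confirmAll(checks, doc);
        }
        return wandLike.calls;
    }

    public static void main(String[] args) {
        System.out.println("reported cost 2:  " + wandCalls(2f));
        System.out.println("reported cost 50: " + wandCalls(50f));
    }
}
```

With these made-up numbers, a reported cost of 2 (think "tail size") puts the WAND-like check first, so it runs on all 100 candidates. A reported cost of 50 lets the cheap filter reject 90 candidates first, so the WAND-like check runs only 10 times. Whether a similar effect explains the benchmark results above is an open question.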