[ https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567268#comment-17567268 ]
Adrien Grand commented on LUCENE-10633:
---------------------------------------

I played with a prototype that starts dynamically pruning matches as soon as 128 or fewer competitive ordinals are left, by pulling postings to iterate over the remaining documents that have competitive values. I still need to think about simplifying the logic and improving the tests, but the initial benchmarks on wikimedium10m are very encouraging (assuming I didn't get anything wrong):

{noformat}
Task                         QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff                  p-value
Prefix3                            248.74    (6.1%)                   242.61    (5.8%)     -2.5% ( -13% -   10%)    0.191
BrowseMonthTaxoFacets               27.71   (10.1%)                    27.34   (10.6%)     -1.3% ( -20% -   21%)    0.682
BrowseDateSSDVFacets                 4.99   (10.3%)                     4.94    (8.4%)     -1.1% ( -17% -   19%)    0.707
BrowseDateTaxoFacets                44.26   (12.2%)                    43.97   (13.1%)     -0.7% ( -23% -   28%)    0.870
Wildcard                           137.61    (3.0%)                   136.97    (2.6%)     -0.5% (  -5% -    5%)    0.592
BrowseDayOfYearTaxoFacets           45.53   (12.4%)                    45.44   (13.4%)     -0.2% ( -23% -   29%)    0.963
IntNRQ                             198.27    (8.1%)                   197.94    (7.4%)     -0.2% ( -14% -   16%)    0.946
BrowseRandomLabelSSDVFacets         14.51    (2.2%)                    14.49    (2.4%)     -0.2% (  -4% -    4%)    0.835
AndHighHighDayTaxoFacets             8.32    (5.1%)                     8.31    (5.7%)     -0.1% ( -10% -   11%)    0.956
LowSpanNear                         46.83    (1.6%)                    46.82    (2.0%)     -0.0% (  -3% -    3%)    0.990
BrowseRandomLabelTaxoFacets         36.18   (10.5%)                    36.18   (12.6%)      0.0% ( -20% -   25%)    0.998
MedTermDayTaxoFacets                73.59    (4.8%)                    73.66    (5.7%)      0.1% (  -9% -   11%)    0.954
OrNotHighHigh                     1476.08    (5.3%)                  1477.58    (3.9%)      0.1% (  -8% -    9%)    0.945
TermDTSort                         746.55    (2.4%)                   747.70    (1.7%)      0.2% (  -3% -    4%)    0.817
Fuzzy2                              96.18    (1.3%)                    96.39    (1.4%)      0.2% (  -2% -    2%)    0.617
AndHighMedDayTaxoFacets            154.89    (1.8%)                   155.29    (1.6%)      0.3% (  -3% -    3%)    0.629
AndHighMed                         378.38    (3.7%)                   379.50    (4.4%)      0.3% (  -7% -    8%)    0.817
PKLookup                           243.14    (1.9%)                   243.99    (1.9%)      0.4% (  -3% -    4%)    0.552
HighPhrase                         279.13    (2.1%)                   280.21    (1.5%)      0.4% (  -3% -    4%)    0.510
Respell                             71.59    (1.5%)                    71.87    (1.5%)      0.4% (  -2% -    3%)    0.406
OrHighHigh                          66.95    (6.5%)                    67.21    (5.7%)      0.4% ( -11% -   13%)    0.837
Fuzzy1                             101.53    (1.5%)                   101.95    (1.5%)      0.4% (  -2% -    3%)    0.382
LowPhrase                          101.76    (2.3%)                   102.22    (2.6%)      0.5% (  -4% -    5%)    0.558
LowSloppyPhrase                     21.14    (3.1%)                    21.25    (4.1%)      0.5% (  -6% -    7%)    0.661
MedPhrase                          173.45    (2.7%)                   174.55    (2.6%)      0.6% (  -4% -    6%)    0.443
MedSpanNear                         17.77    (4.5%)                    17.88    (4.8%)      0.6% (  -8% -   10%)    0.661
OrHighNotLow                      1396.26    (5.6%)                  1406.85    (6.4%)      0.8% ( -10% -   13%)    0.692
OrHighMed                          162.41    (5.3%)                   163.69    (4.8%)      0.8% (  -8% -   11%)    0.625
HighTermDayOfYearSort             1476.11    (2.7%)                  1488.26    (2.4%)      0.8% (  -4% -    6%)    0.312
MedIntervalsOrdered                113.65    (4.2%)                   114.59    (7.0%)      0.8% (  -9% -   12%)    0.652
OrHighLow                          828.13    (5.2%)                   835.45    (4.7%)      0.9% (  -8% -   11%)    0.574
MedTerm                           2356.21    (4.7%)                  2377.47    (5.0%)      0.9% (  -8% -   11%)    0.554
MedSloppyPhrase                     62.13    (3.4%)                    62.72    (3.9%)      0.9% (  -6% -    8%)    0.420
HighIntervalsOrdered                18.19    (5.7%)                    18.37    (8.6%)      1.0% ( -12% -   16%)    0.673
AndHighHigh                         54.46    (6.2%)                    55.01    (6.3%)      1.0% ( -10% -   14%)    0.615
LowTerm                           2247.13    (4.7%)                  2270.19    (3.7%)      1.0% (  -7% -    9%)    0.446
OrNotHighLow                      1728.71    (4.3%)                  1748.19    (4.7%)      1.1% (  -7% -   10%)    0.427
HighTermTitleBDVSort                14.31    (3.3%)                    14.47    (5.7%)      1.2% (  -7% -   10%)    0.429
OrHighNotHigh                     1328.26    (5.6%)                  1345.40    (5.6%)      1.3% (  -9% -   13%)    0.467
OrHighMedDayTaxoFacets              21.05    (3.4%)                    21.32    (6.2%)      1.3% (  -8% -   11%)    0.412
HighSloppyPhrase                    13.58    (4.6%)                    13.76    (5.2%)      1.3% (  -8% -   11%)    0.396
BrowseDayOfYearSSDVFacets           20.03    (7.4%)                    20.30   (10.3%)      1.3% ( -15% -   20%)    0.640
HighTerm                          1696.02    (7.0%)                  1720.12    (6.3%)      1.4% ( -11% -   15%)    0.500
LowIntervalsOrdered                  5.49    (4.9%)                     5.57    (5.3%)      1.5% (  -8% -   12%)    0.359
OrHighNotMed                      2042.56    (5.5%)                  2075.38    (6.0%)      1.6% (  -9% -   13%)    0.378
AndHighLow                        1604.98    (3.7%)                  1632.93    (3.2%)      1.7% (  -5% -    9%)    0.115
BrowseMonthSSDVFacets               22.20   (10.1%)                    22.61   (12.0%)      1.9% ( -18% -   26%)    0.596
OrNotHighMed                      1440.64    (4.3%)                  1467.73    (2.6%)      1.9% (  -4% -    9%)    0.093
HighSpanNear                        23.27    (6.2%)                    24.09    (6.2%)      3.5% (  -8% -   16%)    0.071
HighTermMonthSort                  173.72   (15.7%)                  3968.96   (90.7%)   2184.7% (1795% - 2719%)    0.000
HighTermTitleSort                   17.70   (14.4%)                  1383.03  (288.2%)   7712.7% (6474% - 9368%)    0.000
{noformat}

> Dynamic pruning for queries sorted by SORTED(_SET) field
> --------------------------------------------------------
>
>                 Key: LUCENE-10633
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10633
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> LUCENE-9280 introduced the ability to dynamically prune non-competitive hits
> when sorting by a numeric field, by leveraging the points index to skip
> documents that do not compare better than the top of the priority queue
> maintained by the field comparator.
> However queries sorted by a SORTED(_SET) field still look at all hits, which
> is disappointing. Could we leverage the terms index to skip hits?
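
For readers who want a concrete picture of the pruning idea in the comment (once 128 or fewer ordinals remain competitive, pull their postings and iterate only over the documents that carry those values), here is a minimal, self-contained sketch. It is not the prototype's code and does not use any Lucene API: the class `OrdinalPruningSketch`, the method `competitiveDocs`, and the `int[][]` postings layout are all hypothetical names chosen for illustration. It also assumes an ascending sort in which only ordinals strictly below the worst ordinal in the priority queue are still competitive.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical illustration only; names and data layout are assumptions, not Lucene APIs. */
final class OrdinalPruningSketch {

  /** Threshold mentioned in the comment: start pruning at 128 or fewer competitive ordinals. */
  static final int MAX_COMPETITIVE_ORDS = 128;

  /**
   * @param postingsByOrd postingsByOrd[ord] is the sorted list of doc IDs whose value has that ordinal
   * @param worstOrdInQueue ordinal of the current worst entry in a full top-N queue (ascending sort)
   * @param maxCompetitiveOrds pruning threshold
   * @return sorted, de-duplicated doc IDs that can still be competitive, or null when too many
   *     ordinals remain competitive and the caller should keep visiting every hit
   */
  static int[] competitiveDocs(int[][] postingsByOrd, int worstOrdInQueue, int maxCompetitiveOrds) {
    // Assumption: ordinals in [0, worstOrdInQueue) compare strictly better than the queue bottom.
    int competitive = Math.max(0, Math.min(worstOrdInQueue, postingsByOrd.length));
    if (competitive > maxCompetitiveOrds) {
      return null;
    }
    // Heap entries are {ord, position}; they are ordered by the doc ID they currently point at.
    PriorityQueue<int[]> heap = new PriorityQueue<>(
        (a, b) -> Integer.compare(postingsByOrd[a[0]][a[1]], postingsByOrd[b[0]][b[1]]));
    int total = 0;
    for (int ord = 0; ord < competitive; ord++) {
      if (postingsByOrd[ord].length > 0) {
        heap.add(new int[] {ord, 0});
        total += postingsByOrd[ord].length;
      }
    }
    int[] docs = new int[total];
    int n = 0;
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      int doc = postingsByOrd[top[0]][top[1]];
      // A multi-valued (SORTED_SET-like) doc may appear under several ordinals; emit it once.
      if (n == 0 || docs[n - 1] != doc) {
        docs[n++] = doc;
      }
      if (++top[1] < postingsByOrd[top[0]].length) {
        heap.add(top); // advance this postings list and re-insert it
      }
    }
    return Arrays.copyOf(docs, n);
  }
}
```

A collector built along these lines would call `competitiveDocs(postings, worstOrd, OrdinalPruningSketch.MAX_COMPETITIVE_ORDS)` each time the queue bottom improves and, once the result is non-null, intersect it with the query's matches instead of scoring every hit. The heap-based merge is just one simple way to take the union of a small number of postings lists; the comment does not say how the prototype does it.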