[ https://issues.apache.org/jira/browse/LUCENE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419349#comment-17419349 ]
Greg Miller commented on LUCENE-10062:
--------------------------------------

I re-ran the {{luceneutil}} benchmarks ({{wikimedium10m}}) now that [~mikemccand] has added new faceting tasks (thanks Mike!). This change shows a nice improvement on the new faceting tasks as well, with no regressions anywhere else that I can see. I was waiting to iterate on my PR until I could run these new benchmarking tasks, and it looks like there's enough benefit to this change to pick it back up.

{noformat}
Task                      QPS baseline   StdDev QPS candidate   StdDev Pct diff                  p-value
HighTermDayOfYearSort            70.02  (13.7%)         68.45   (9.7%)    -2.2%  ( -22% -  24%)    0.551
MedTerm                        1300.90   (5.5%)       1275.97   (6.7%)    -1.9%  ( -13% -  10%)    0.324
HighTerm                       1953.46   (5.8%)       1925.79   (7.9%)    -1.4%  ( -14% -  13%)    0.518
HighTermTitleBDVSort            122.35  (15.6%)        120.86  (14.9%)    -1.2%  ( -27% -  34%)    0.801
TermDTSort                      133.47   (8.7%)        131.86   (7.4%)    -1.2%  ( -15% -  16%)    0.637
LowTerm                        1636.13   (5.5%)       1622.34   (7.4%)    -0.8%  ( -12% -  12%)    0.682
Prefix3                          25.69   (6.0%)         25.48   (6.3%)    -0.8%  ( -12% -  12%)    0.676
LowSpanNear                     118.02   (2.1%)        117.31   (1.8%)    -0.6%  (  -4% -   3%)    0.326
HighTermMonthSort               140.17   (9.8%)        139.47   (9.9%)    -0.5%  ( -18% -  21%)    0.872
AndHighHigh                      49.17   (3.1%)         48.92   (2.7%)    -0.5%  (  -6% -   5%)    0.584
HighSpanNear                     25.54   (2.7%)         25.41   (2.2%)    -0.5%  (  -5% -   4%)    0.529
AndHighLow                      556.68   (5.8%)        554.80   (5.4%)    -0.3%  ( -10% -  11%)    0.848
BrowseDayOfYearSSDVFacets        16.53   (2.5%)         16.47   (2.4%)    -0.3%  (  -5% -   4%)    0.674
IntNRQ                           87.76   (2.0%)         87.49   (2.1%)    -0.3%  (  -4% -   3%)    0.634
MedSpanNear                      31.11   (2.2%)         31.04   (1.6%)    -0.2%  (  -3% -   3%)    0.714
OrNotHighLow                    765.10   (4.5%)        763.60   (5.4%)    -0.2%  (  -9% -  10%)    0.901
MedPhrase                       160.05   (3.1%)        159.83   (2.9%)    -0.1%  (  -5% -   6%)    0.885
HighSloppyPhrase                 27.67   (3.1%)         27.64   (3.0%)    -0.1%  (  -6% -   6%)    0.915
LowPhrase                        61.12   (3.2%)         61.05   (3.2%)    -0.1%  (  -6% -   6%)    0.921
OrHighMed                        71.85   (2.9%)         71.82   (2.1%)    -0.0%  (  -4% -   5%)    0.963
HighPhrase                       29.40   (2.3%)         29.39   (2.8%)    -0.0%  (  -5% -   5%)    0.971
Fuzzy2                           32.58   (4.3%)         32.57   (6.1%)    -0.0%  (  -9% -  10%)    0.992
LowIntervalsOrdered             150.30   (1.9%)        150.28   (1.9%)    -0.0%  (  -3% -   3%)    0.986
AndHighMed                      151.32   (3.9%)        151.31   (4.1%)    -0.0%  (  -7% -   8%)    0.993
OrHighHigh                       23.90   (2.3%)         23.91   (1.9%)     0.0%  (  -4% -   4%)    0.970
OrHighNotLow                    579.17   (5.1%)        579.35   (6.4%)     0.0%  ( -10% -  12%)    0.986
MedIntervalsOrdered              86.93   (1.7%)         86.98   (1.9%)     0.1%  (  -3% -   3%)    0.913
OrHighNotHigh                   536.17   (5.6%)        536.57   (6.6%)     0.1%  ( -11% -  12%)    0.969
OrNotHighHigh                   787.07   (6.5%)        787.96   (8.1%)     0.1%  ( -13% -  15%)    0.961
OrNotHighMed                    687.97   (4.7%)        688.77   (6.9%)     0.1%  ( -10% -  12%)    0.950
MedSloppyPhrase                  68.62   (2.8%)         68.74   (2.7%)     0.2%  (  -5% -   5%)    0.838
LowSloppyPhrase                 130.37   (2.6%)        130.62   (2.2%)     0.2%  (  -4% -   5%)    0.797
OrHighLow                       440.44   (4.1%)        441.33   (4.1%)     0.2%  (  -7% -   8%)    0.877
Wildcard                        122.01   (5.2%)        122.35   (5.3%)     0.3%  (  -9% -  11%)    0.867
HighIntervalsOrdered             14.24   (2.2%)         14.34   (2.1%)     0.6%  (  -3% -   5%)    0.350
Respell                          52.04   (2.2%)         52.48   (2.0%)     0.8%  (  -3% -   5%)    0.209
OrHighNotMed                    674.76   (4.8%)        680.97   (8.0%)     0.9%  ( -11% -  14%)    0.659
PKLookup                        153.45   (4.3%)        155.13   (3.8%)     1.1%  (  -6% -   9%)    0.394
Fuzzy1                           56.57   (9.1%)         57.76   (6.7%)     2.1%  ( -12% -  19%)    0.406
BrowseMonthSSDVFacets            19.59  (10.4%)         20.03   (6.7%)     2.3%  ( -13% -  21%)    0.413
AndHighHighDayTaxoFacets         19.22   (1.6%)         22.13   (2.2%)    15.1%  (  11% -  19%)    0.000
AndHighMedDayTaxoFacets          25.62   (1.5%)         29.93   (2.2%)    16.8%  (  12% -  20%)    0.000
MedTermDayTaxoFacets             12.96   (2.2%)         18.99   (3.4%)    46.5%  (  39% -  53%)    0.000
OrHighMedDayTaxoFacets            3.97   (2.0%)          5.81   (4.3%)    46.5%  (  39% -  53%)    0.000
BrowseMonthTaxoFacets             2.59  (10.9%)         11.16  (35.8%)   330.4%  ( 255% - 423%)    0.000
BrowseDateTaxoFacets              2.44   (9.7%)         13.12  (51.8%)   438.1%  ( 343% - 553%)    0.000
BrowseDayOfYearTaxoFacets         2.44   (9.7%)         13.13  (51.7%)   438.2%  ( 343% - 552%)    0.000
{noformat}

> Explore using SORTED_NUMERIC doc values to encode taxonomy ordinals for faceting
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10062
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10062
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Greg Miller
>            Assignee: Greg Miller
>            Priority: Minor
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We currently encode taxonomy ordinals using varint-style packing in a binary
> doc values field. I suspect there have been a number of improvements to
> SortedNumericDocValues since taxonomy faceting was first introduced, and I
> plan to explore replacing the custom binary format we have today with a
> SORTED_NUMERIC type dv field instead.
> I'll report benchmark results and index size impact here.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
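
For readers unfamiliar with the "varint style packing" mentioned in the issue summary, here is a minimal, self-contained sketch of the general technique: sorted ordinals are delta-coded and each delta is written as a variable-length integer, 7 bits per byte. The class and method names ({{VarintOrdinals}}, {{encode}}, {{decode}}) are hypothetical, and the delta coding and low-order-first byte order are assumptions made for illustration. This is not Lucene's actual on-disk layout.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only: delta + varint packing of sorted ordinals.
// It does not reproduce Lucene's real binary doc values format.
public class VarintOrdinals {

    // Encodes ascending, non-negative ordinals as varint-packed deltas.
    static byte[] encode(int[] sortedOrdinals) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int ordinal : sortedOrdinals) {
            int delta = ordinal - previous;
            // Emit 7 bits at a time, low-order group first; the high bit
            // of each byte signals that another byte follows.
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            previous = ordinal;
        }
        return out.toByteArray();
    }

    // Reverses encode(): reads varints and accumulates the deltas.
    static int[] decode(byte[] bytes) {
        // Each ordinal occupies at least one byte, so this is an upper bound.
        int[] ordinals = new int[bytes.length];
        int count = 0;
        int previous = 0;
        int pos = 0;
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            ordinals[count++] = previous;
        }
        return Arrays.copyOf(ordinals, count);
    }

    public static void main(String[] args) {
        int[] ordinals = {3, 7, 130, 20000};
        byte[] packed = encode(ordinals);
        System.out.println(packed.length + " bytes for " + ordinals.length + " ordinals");
        System.out.println(Arrays.toString(decode(packed)));
    }
}
```

With a packing like this, every reader has to decode the byte stream per document before it can count ordinals. The proposal in the issue is to store the ordinals in a SORTED_NUMERIC doc values field instead, so the codec handles the encoding rather than the custom format.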