[ https://issues.apache.org/jira/browse/LUCENE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404419#comment-17404419 ]
Adrien Grand commented on LUCENE-9613:
--------------------------------------

I pushed some more specialization, which gave the following results on wikimedium10m.

{noformat}
Task                        QPS baseline  StdDev   QPS patch  StdDev   Pct diff                  p-value
HighSloppyPhrase                   24.30   (6.9%)      23.50   (9.3%)   -3.3% ( -18% -   13%)    0.204
MedSloppyPhrase                    70.81   (4.4%)      69.37   (5.4%)   -2.0% ( -11% -    8%)    0.191
LowSloppyPhrase                    30.96   (3.3%)      30.49   (3.6%)   -1.5% (  -8% -    5%)    0.163
HighTermTitleBDVSort               19.49   (2.8%)      19.38   (2.6%)   -0.6% (  -5% -    4%)    0.503
MedPhrase                         322.19   (3.8%)     321.87   (9.1%)   -0.1% ( -12% -   13%)    0.964
BrowseDayOfYearTaxoFacets           3.18   (3.7%)       3.18   (3.3%)   -0.0% (  -6% -    7%)    0.981
BrowseDateTaxoFacets                3.18   (3.7%)       3.18   (3.2%)    0.1% (  -6% -    7%)    0.952
BrowseMonthTaxoFacets               3.45   (4.7%)       3.46   (4.3%)    0.2% (  -8% -    9%)    0.895
IntNRQ                             91.30  (43.5%)      91.51  (43.4%)    0.2% ( -60% -  154%)    0.987
LowIntervalsOrdered                17.60   (6.6%)      17.64   (7.1%)    0.2% ( -12% -   14%)    0.915
AndHighLow                       1005.24   (4.1%)    1008.36   (4.0%)    0.3% (  -7% -    8%)    0.808
Prefix3                           378.76  (11.9%)     380.28  (10.5%)    0.4% ( -19% -   25%)    0.910
LowPhrase                         112.90   (2.8%)     113.37   (3.8%)    0.4% (  -6% -    7%)    0.694
HighSpanNear                       51.40   (3.0%)      51.64   (3.0%)    0.5% (  -5% -    6%)    0.621
OrHighNotLow                     1445.33   (4.9%)    1456.37   (4.6%)    0.8% (  -8% -   10%)    0.614
MedTerm                          2527.24   (6.3%)    2548.62   (4.6%)    0.8% (  -9% -   12%)    0.628
OrNotHighMed                     1157.13   (2.7%)    1167.00   (3.3%)    0.9% (  -5% -    7%)    0.370
LowSpanNear                        44.09   (2.0%)      44.48   (2.1%)    0.9% (  -3% -    5%)    0.184
MedIntervalsOrdered                10.95   (3.4%)      11.04   (3.5%)    0.9% (  -5% -    8%)    0.420
HighIntervalsOrdered               25.53   (3.6%)      25.77   (4.1%)    1.0% (  -6% -    8%)    0.435
MedSpanNear                       109.47   (2.0%)     110.57   (2.7%)    1.0% (  -3% -    5%)    0.183
OrHighNotHigh                    1095.98   (4.0%)    1107.45   (3.4%)    1.0% (  -6% -    8%)    0.373
Fuzzy1                            212.12   (6.8%)     214.37   (6.3%)    1.1% ( -11% -   15%)    0.609
OrHighHigh                         34.88   (4.7%)      35.26   (3.2%)    1.1% (  -6% -    9%)    0.392
OrHighMed                         124.51   (4.6%)     125.91   (2.5%)    1.1% (  -5% -    8%)    0.339
Respell                           271.84   (3.0%)     274.94   (2.8%)    1.1% (  -4% -    7%)    0.210
OrHighNotMed                     1397.92   (4.0%)    1414.46   (3.9%)    1.2% (  -6% -    9%)    0.344
HighPhrase                        674.43   (2.1%)     682.48   (4.1%)    1.2% (  -4% -    7%)    0.245
AndHighHigh                        53.28   (3.4%)      53.92   (4.0%)    1.2% (  -6% -    8%)    0.308
OrHighLow                         477.86   (4.1%)     483.78   (3.5%)    1.2% (  -6% -    9%)    0.308
OrNotHighLow                     1223.79   (3.8%)    1239.31   (4.3%)    1.3% (  -6% -    9%)    0.321
AndHighMed                        106.80   (3.5%)     108.17   (3.9%)    1.3% (  -5% -    8%)    0.271
LowTerm                          2514.56   (5.8%)    2549.48   (6.6%)    1.4% ( -10% -   14%)    0.478
Wildcard                          157.42   (3.9%)     159.71   (4.1%)    1.5% (  -6% -    9%)    0.246
OrNotHighHigh                    1013.51   (3.2%)    1028.66   (3.7%)    1.5% (  -5% -    8%)    0.176
Fuzzy2                            154.94   (8.8%)     157.37   (8.4%)    1.6% ( -14% -   20%)    0.565
HighTerm                         1590.75   (4.9%)    1624.88   (4.9%)    2.1% (  -7% -   12%)    0.168
HighTermMonthSort                  78.11   (7.6%)      81.58   (9.1%)    4.4% ( -11% -   22%)    0.093
TermDTSort                         84.05   (7.2%)      87.88   (7.1%)    4.5% (  -9% -   20%)    0.044
HighTermDayOfYearSort             116.77   (6.1%)     122.33   (6.8%)    4.8% (  -7% -   18%)    0.020
BrowseMonthSSDVFacets              12.98   (3.1%)      14.45   (5.1%)   11.3% (   2% -   20%)    0.000
BrowseDayOfYearSSDVFacets          12.38   (3.5%)      15.52  (12.7%)   25.3% (   8% -   43%)    0.000
{noformat}

> Create blocks for ords when it helps in Lucene80DocValuesFormat
> ---------------------------------------------------------------
>
>                 Key: LUCENE-9613
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9613
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: main (9.0)
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently for sorted(-set) values, we always write ords using
> log2(valueCount) bits per entry. However, in several cases, such as when the
> field is used in the index sort or when one value is _very_ common, splitting
> into blocks like we do for numerics would help.
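
To make the trade-off in the issue description concrete, here is a minimal sketch that compares the space used by a single global bit width against per-block widths. It is not the code from the patch: the class name `OrdBlockSketch`, its methods, and the block sizes in `main` are assumptions made for illustration, the per-block encoding shown (storing `ord - blockMin`) is only one plausible way to "split into blocks", and the per-block header cost is ignored.

```java
final class OrdBlockSketch {

  /** Number of bits needed to represent any value in [0, maxValue]. */
  static int bitsRequired(long maxValue) {
    return maxValue == 0 ? 0 : 64 - Long.numberOfLeadingZeros(maxValue);
  }

  /** Total bits when every ord is written with one width derived from valueCount. */
  static long globalBits(long[] ords, long valueCount) {
    return (long) ords.length * bitsRequired(valueCount - 1);
  }

  /**
   * Total bits when each block writes (ord - blockMin) with a width derived
   * from that block's own range. Assumption: per-block metadata (min, width)
   * is not counted here.
   */
  static long blockedBits(long[] ords, int blockSize) {
    long total = 0;
    for (int start = 0; start < ords.length; start += blockSize) {
      int end = Math.min(start + blockSize, ords.length);
      long min = Long.MAX_VALUE;
      long max = Long.MIN_VALUE;
      for (int i = start; i < end; i++) {
        min = Math.min(min, ords[i]);
        max = Math.max(max, ords[i]);
      }
      total += (long) (end - start) * bitsRequired(max - min);
    }
    return total;
  }

  public static void main(String[] args) {
    // Index sorted on the field: ords increase with the doc ID.
    long[] sorted = new long[1000];
    for (int i = 0; i < sorted.length; i++) {
      sorted[i] = i;
    }
    System.out.println("sorted, global:  " + globalBits(sorted, 1000));
    System.out.println("sorted, blocked: " + blockedBits(sorted, 10));

    // One value shared by every document, out of 100 distinct values.
    long[] constant = new long[1000];
    java.util.Arrays.fill(constant, 5L);
    System.out.println("constant, global:  " + globalBits(constant, 100));
    System.out.println("constant, blocked: " + blockedBits(constant, 10));
  }
}
```

In both inputs the per-block range is much smaller than `valueCount`, which is why the blocked total comes out lower. For ords spread uniformly at random, each block's range would be close to `valueCount` and blocking would save little, consistent with the issue title's "when it helps".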