[ https://issues.apache.org/jira/browse/LUCENE-10033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405670#comment-17405670 ]
weizijun commented on LUCENE-10033:
-----------------------------------

Hi [~gsmiller],

Here is the wikimedium10m result:
{noformat}
Task                       QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff               p-value
BrowseMonthSSDVFacets             12.05  (13.9%)                     5.32   (2.0%)  -55.9% ( -63% - -46%)  0.000
BrowseDayOfYearSSDVFacets         10.80  (13.1%)                     5.10   (2.5%)  -52.8% ( -60% - -42%)  0.000
TermDTSort                       111.11  (13.4%)                   109.05  (10.5%)   -1.9% ( -22% - 25%)   0.625
HighTerm                         927.28   (4.1%)                   913.16   (3.0%)   -1.5% ( -8% - 5%)     0.184
MedTerm                         1043.87   (5.7%)                  1029.65   (3.5%)   -1.4% ( -9% - 8%)     0.361
Wildcard                         248.11   (2.4%)                   244.81   (3.2%)   -1.3% ( -6% - 4%)     0.136
OrNotHighMed                     514.00   (2.7%)                   508.38   (2.6%)   -1.1% ( -6% - 4%)     0.188
LowTerm                         1230.06   (4.3%)                  1219.21   (3.4%)   -0.9% ( -8% - 7%)     0.475
AndHighHigh                       52.82   (4.6%)                    52.36   (3.8%)   -0.9% ( -8% - 7%)     0.515
HighPhrase                       117.84   (2.9%)                   117.33   (1.7%)   -0.4% ( -4% - 4%)     0.558
MedPhrase                         71.85   (2.7%)                    71.55   (1.9%)   -0.4% ( -4% - 4%)     0.568
OrHighNotMed                     504.15   (4.5%)                   502.33   (3.0%)   -0.4% ( -7% - 7%)     0.764
HighTermMonthSort                138.89   (9.3%)                   138.40  (11.8%)   -0.4% ( -19% - 22%)   0.916
Prefix3                          184.76   (3.5%)                   184.20   (2.7%)   -0.3% ( -6% - 6%)     0.757
IntNRQ                            87.44   (0.8%)                    87.25   (0.8%)   -0.2% ( -1% - 1%)     0.394
AndHighMed                       154.81   (3.1%)                   154.48   (2.5%)   -0.2% ( -5% - 5%)     0.816
BrowseDayOfYearTaxoFacets          2.35   (4.2%)                     2.35   (3.9%)   -0.1% ( -7% - 8%)     0.911
AndHighLow                       379.69   (3.7%)                   379.19   (3.7%)   -0.1% ( -7% - 7%)     0.911
BrowseMonthTaxoFacets              2.49   (4.6%)                     2.49   (4.3%)   -0.1% ( -8% - 9%)     0.928
BrowseDateTaxoFacets               2.35   (4.3%)                     2.35   (3.9%)   -0.1% ( -7% - 8%)     0.960
OrHighHigh                        18.57   (2.5%)                    18.56   (1.8%)   -0.1% ( -4% - 4%)     0.932
MedIntervalsOrdered               48.37   (4.0%)                    48.36   (4.0%)   -0.0% ( -7% - 8%)     0.987
HighTermTitleBDVSort              91.07  (10.3%)                    91.13  (11.8%)    0.1% ( -20% - 24%)   0.985
HighSloppyPhrase                  27.39   (4.5%)                    27.42   (3.2%)    0.1% ( -7% - 8%)     0.931
HighIntervalsOrdered              20.94   (3.6%)                    20.96   (2.8%)    0.1% ( -6% - 6%)     0.907
OrHighNotHigh                    431.17   (3.5%)                   431.76   (2.7%)    0.1% ( -5% - 6%)     0.889
MedSloppyPhrase                   16.30   (4.7%)                    16.33   (3.3%)    0.2% ( -7% - 8%)     0.876
LowIntervalsOrdered              179.07   (3.4%)                   179.65   (2.5%)    0.3% ( -5% - 6%)     0.734
LowPhrase                        278.39   (2.6%)                   279.34   (2.6%)    0.3% ( -4% - 5%)     0.674
OrNotHighHigh                    421.04   (4.1%)                   422.68   (4.1%)    0.4% ( -7% - 8%)     0.762
HighSpanNear                      10.97   (2.6%)                    11.01   (2.7%)    0.4% ( -4% - 5%)     0.621
LowSpanNear                       32.07   (1.9%)                    32.21   (2.0%)    0.4% ( -3% - 4%)     0.490
Fuzzy1                            51.86   (7.4%)                    52.12   (7.3%)    0.5% ( -13% - 16%)   0.834
OrHighMed                        103.63   (2.5%)                   104.13   (1.7%)    0.5% ( -3% - 4%)     0.473
LowSloppyPhrase                   93.59   (3.3%)                    94.13   (2.4%)    0.6% ( -4% - 6%)     0.518
OrNotHighLow                     413.02   (3.6%)                   415.65   (3.8%)    0.6% ( -6% - 8%)     0.585
OrHighNotLow                     514.45   (2.8%)                   517.93   (3.7%)    0.7% ( -5% - 7%)     0.516
Respell                           50.34   (2.4%)                    50.74   (2.3%)    0.8% ( -3% - 5%)     0.281
MedSpanNear                        9.20   (4.9%)                     9.29   (4.8%)    1.0% ( -8% - 11%)    0.535
OrHighLow                        257.35   (4.2%)                   260.38   (3.4%)    1.2% ( -6% - 9%)     0.325
Fuzzy2                            46.61  (10.2%)                    47.26   (8.8%)    1.4% ( -15% - 22%)   0.642
PKLookup                         140.43   (2.9%)                   142.41   (2.4%)    1.4% ( -3% - 6%)     0.096
HighTermDayOfYearSort            115.09  (12.8%)                   116.98  (13.2%)    1.6% ( -21% - 31%)   0.689
{noformat}
SSDV faceting is noticeably slower (BrowseMonthSSDVFacets -55.9%, BrowseDayOfYearSSDVFacets -52.8%); the other tasks show little effect. The full results are in the attachment: [^benchmark-10m]

> Encode doc values in smaller blocks of values, like postings
> ------------------------------------------------------------
>
>                 Key: LUCENE-10033
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10033
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: benchmark, benchmark-10m
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a follow-up to the discussion on this thread:
> https://lists.apache.org/thread.html/r7b757074d5f02874ce3a295b0007dff486bc10d08fb0b5e5a4ba72c5%40%3Cdev.lucene.apache.org%3E.
> Our current approach for doc values uses large blocks of 16k values where
> values can be decompressed independently, using DirectWriter/DirectReader.
> This is a bit inefficient in some cases, e.g. a single outlier can grow the
> number of bits per value for the entire block, we can't easily use run-length
> compression, etc. Plus, it encourages using a different sub-class for every
> compression technique, which puts pressure on the JVM.
> We'd like to move to an approach that would be more similar to postings with
> smaller blocks (e.g. 128 values) whose values get all decompressed at once
> (using SIMD instructions), with skip data within blocks in order to
> efficiently skip to arbitrary doc IDs (or maybe still use jump tables as
> today's doc values, and as discussed here for postings:
> https://lists.apache.org/thread.html/r7c3cb7ab143fd4ecbc05c04064d10ef9fb50c5b4d6479b0f35732677%40%3Cdev.lucene.apache.org%3E).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
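
The point in the issue description about a single outlier growing the number of bits per value for a whole block can be illustrated with a rough sketch. This is not Lucene code: the class `BlockBitsSketch` and its methods are hypothetical, values are assumed non-negative, and the sketch ignores per-block metadata and any rounding to the bit widths a real encoder supports. It only counts payload bits when every value in a block is stored at the width of that block's largest value.

```java
public class BlockBitsSketch {

    // Number of bits needed for the largest value in values[from, to).
    // Assumes all values are non-negative.
    public static int bitsRequired(long[] values, int from, int to) {
        long max = 0;
        for (int i = from; i < to; i++) {
            max = Math.max(max, values[i]);
        }
        return max == 0 ? 0 : 64 - Long.numberOfLeadingZeros(max);
    }

    // Total payload bits when each block of blockSize values is packed
    // at the width required by that block's largest value.
    public static long totalBits(long[] values, int blockSize) {
        long total = 0;
        for (int start = 0; start < values.length; start += blockSize) {
            int end = Math.min(start + blockSize, values.length);
            total += (long) bitsRequired(values, start, end) * (end - start);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] values = new long[16384];
        for (int i = 0; i < values.length; i++) {
            values[i] = i % 16;          // every value fits in 4 bits
        }
        values[5000] = 1L << 40;         // one outlier that needs 41 bits

        // One 16k block: all 16384 values are stored at 41 bits.
        System.out.println(totalBits(values, 16384)); // 671744
        // 128-value blocks: only the outlier's block pays 41 bits per value.
        System.out.println(totalBits(values, 128));   // 70272
    }
}
```

Under these assumptions the single outlier costs roughly 9.5x more payload bits with one 16k block than with 128-value blocks. Real numbers would differ once block headers and skip data are included.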