[ https://issues.apache.org/jira/browse/LUCENE-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309120#comment-17309120 ]
Greg Miller commented on LUCENE-9850:
-------------------------------------

Ok, not as bad with some more optimizations in place (thanks [~jpountz]!), but still a regression. Here's what I'm seeing (still with "-source wikimediumall" as before):
{code:java}
Task                      QPS baseline   StdDev  QPS pfor 7 exceptions   StdDev  Pct diff                 p-value
AndHighMed                       40.34   (2.6%)                  38.83   (2.1%)     -3.7%  ( -8% - 0%)    0.000
Prefix3                          13.95   (1.6%)                  13.55   (1.8%)     -2.9%  ( -6% - 0%)    0.000
OrHighMed                        42.52   (2.5%)                  41.33   (3.7%)     -2.8%  ( -8% - 3%)    0.004
OrHighLow                       249.72   (3.8%)                 242.87   (4.8%)     -2.7%  ( -10% - 6%)   0.046
AndHighLow                      320.47   (3.7%)                 311.87   (4.0%)     -2.7%  ( -10% - 5%)   0.028
LowPhrase                        15.24   (2.3%)                  14.88   (1.8%)     -2.3%  ( -6% - 1%)    0.000
OrNotHighLow                    459.84   (4.1%)                 449.82   (4.2%)     -2.2%  ( -10% - 6%)   0.094
MedTerm                         975.99   (4.2%)                 954.87   (4.1%)     -2.2%  ( -10% - 6%)   0.101
OrNotHighHigh                   380.66   (4.2%)                 372.45   (5.6%)     -2.2%  ( -11% - 7%)   0.167
OrHighNotLow                    494.46   (4.6%)                 484.75   (5.8%)     -2.0%  ( -11% - 8%)   0.234
Wildcard                         57.07   (1.9%)                  56.04   (1.4%)     -1.8%  ( -5% - 1%)    0.001
OrHighNotHigh                   422.27   (5.4%)                 414.76   (3.8%)     -1.8%  ( -10% - 7%)   0.227
OrHighHigh                       13.69   (1.8%)                  13.47   (3.5%)     -1.6%  ( -6% - 3%)    0.065
LowSloppyPhrase                  15.05   (3.5%)                  14.82   (4.0%)     -1.5%  ( -8% - 6%)    0.199
Fuzzy2                           22.48   (5.7%)                  22.15   (4.7%)     -1.5%  ( -11% - 9%)   0.376
OrNotHighMed                    454.42   (4.8%)                 447.81   (5.3%)     -1.5%  ( -11% - 9%)   0.362
TermDTSort                       43.90  (11.6%)                  43.27  (10.4%)     -1.4%  ( -21% - 23%)  0.678
LowSpanNear                       4.39   (2.6%)                   4.32   (1.9%)     -1.4%  ( -5% - 3%)    0.050
HighSloppyPhrase                  2.77   (3.2%)                   2.73   (3.2%)     -1.2%  ( -7% - 5%)    0.251
HighTermDayOfYearSort             6.33  (13.6%)                   6.26  (13.3%)     -1.0%  ( -24% - 29%)  0.806
HighIntervalsOrdered              1.08   (0.9%)                   1.07   (1.2%)     -1.0%  ( -3% - 1%)    0.003
AndHighHigh                      40.58   (3.2%)                  40.22   (3.1%)     -0.9%  ( -6% - 5%)    0.378
HighTerm                        792.80   (5.1%)                 789.86   (5.0%)     -0.4%  ( -9% - 10%)   0.816
OrHighNotMed                    509.78   (6.4%)                 508.18   (5.5%)     -0.3%  ( -11% - 12%)  0.868
MedSpanNear                       4.96   (2.1%)                   4.95   (1.5%)     -0.3%  ( -3% - 3%)    0.666
MedPhrase                        81.04   (1.8%)                  80.85   (3.0%)     -0.2%  ( -4% - 4%)    0.763
MedSloppyPhrase                   9.10   (3.8%)                   9.08   (3.6%)     -0.2%  ( -7% - 7%)    0.851
IntNRQ                           19.09   (0.6%)                  19.05   (0.8%)     -0.2%  ( -1% - 1%)    0.367
HighTermTitleBDVSort             34.87  (11.3%)                  34.86  (13.4%)     -0.0%  ( -22% - 27%)  0.995
BrowseMonthSSDVFacets             3.14   (1.0%)                   3.14   (1.1%)      0.0%  ( -2% - 2%)    0.976
HighTermMonthSort                18.38  (13.3%)                  18.42  (18.3%)      0.2%  ( -27% - 36%)  0.969
BrowseDayOfYearSSDVFacets         2.89   (0.9%)                   2.90   (1.1%)      0.2%  ( -1% - 2%)    0.492
LowTerm                         969.07   (4.7%)                 971.45   (4.1%)      0.2%  ( -8% - 9%)    0.860
HighSpanNear                      3.27   (2.1%)                   3.28   (1.8%)      0.3%  ( -3% - 4%)    0.608
Respell                          33.76   (1.2%)                  33.94   (1.3%)      0.5%  ( -1% - 3%)    0.185
PKLookup                        123.25   (2.7%)                 124.06   (3.2%)      0.7%  ( -5% - 6%)    0.485
HighPhrase                      218.15   (3.2%)                 219.85   (3.0%)      0.8%  ( -5% - 7%)    0.428
BrowseMonthTaxoFacets             1.39   (1.7%)                   1.41   (1.7%)      0.9%  ( -2% - 4%)    0.110
BrowseDateTaxoFacets              1.20   (2.2%)                   1.22   (2.2%)      1.1%  ( -3% - 5%)    0.114
BrowseDayOfYearTaxoFacets         1.20   (2.4%)                   1.21   (2.4%)      1.2%  ( -3% - 6%)    0.109
Fuzzy1                           46.25   (8.0%)                  47.43   (9.4%)      2.6%  ( -13% - 21%)  0.354
{code}
The modifications to PForUtil that this was run with are [here|https://github.com/apache/lucene/compare/main...gsmiller:LUCENE-9850/pfordocids#diff-9f4cb4a664b2a8f0594b221368085548a58ecb1cc1290f18160b613d400fcc29].

I'll think about whether there are further opportunities to optimize this. There's a lot of branching in there, but I'm not sure how much of it is avoidable. I'll put some fresh eyes on it tomorrow.

> Explore PFOR for Doc ID delta encoding (instead of FOR)
> -------------------------------------------------------
>
> Key: LUCENE-9850
> URL: https://issues.apache.org/jira/browse/LUCENE-9850
> Project: Lucene - Core
> Issue Type: Task
> Components: core/codecs
> Affects Versions: main (9.0)
> Reporter: Greg Miller
> Priority: Minor
>
> It'd be interesting to explore using PFOR instead of FOR for doc ID encoding.
> Right now PFOR is used for positions, frequencies and payloads, but FOR is
> used for doc ID deltas.
> From a recent
> [conversation|http://mail-archives.apache.org/mod_mbox/lucene-dev/202103.mbox/%3CCAPsWd%2BOp7d_GxNosB5r%3DQMPA-v0SteHWjXUmG3gwQot4gkubWw%40mail.gmail.com%3E]
> on the dev mailing list, it sounds like this decision was made based on the
> optimization possible when expanding the deltas.
> I'd be interested in measuring the index size reduction possible with
> switching to PFOR compared to the performance reduction we might see by no
> longer being able to apply the deltas in as optimal a way.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
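
To make the size side of the FOR vs. PFOR trade-off from the issue description concrete, here is a rough sketch that estimates how many bytes one block of doc ID deltas would take under each scheme. It is not Lucene's PForUtil or ForUtil: the class and method names are hypothetical, and the layout (128 values per block, one header byte, two bytes per exception, at most 7 exceptions as in the benchmark run above) is an assumption chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: estimates encoded block sizes only.
class PforSizeSketch {
    static final int BLOCK_SIZE = 128;    // assumed number of deltas per block
    static final int MAX_EXCEPTIONS = 7;  // matches the "pfor 7 exceptions" run

    // Number of bits needed to represent v (at least 1).
    static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    // FOR: one header byte, then every value packed at the width of the largest value.
    static int forBytes(long[] deltas) {
        long max = 0;
        for (long d : deltas) {
            max = Math.max(max, d);
        }
        return 1 + BLOCK_SIZE * bitsRequired(max) / 8;
    }

    // PFOR: pick a narrower width that covers all but the largest few values,
    // then patch the outliers. Assumes each exception costs two bytes
    // (its position in the block plus its high bits).
    static int pforBytes(long[] deltas) {
        long[] sorted = deltas.clone();
        Arrays.sort(sorted);
        int maxBits = bitsRequired(sorted[BLOCK_SIZE - 1]);
        // Width of the largest value that must NOT become an exception,
        // but never more than 8 bits below maxBits so high bits fit in one byte.
        int patchedBits = Math.max(
            bitsRequired(sorted[BLOCK_SIZE - 1 - MAX_EXCEPTIONS]), maxBits - 8);
        long maxUnpatched = (1L << patchedBits) - 1;
        int exceptions = 0;
        for (long d : deltas) {
            if (d > maxUnpatched) {
                exceptions++;
            }
        }
        return 1 + BLOCK_SIZE * patchedBits / 8 + 2 * exceptions;
    }

    public static void main(String[] args) {
        long[] deltas = new long[BLOCK_SIZE];
        Arrays.fill(deltas, 3);  // mostly small gaps between doc IDs
        deltas[5] = 1000;        // one large gap forces a wide FOR block
        System.out.println("FOR bytes:  " + forBytes(deltas));
        System.out.println("PFOR bytes: " + pforBytes(deltas));
    }
}
```

The sketch only covers index size. The query-time cost measured in the benchmark comes from the extra decoding work (applying exceptions before expanding deltas), which this example does not model.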