[ https://issues.apache.org/jira/browse/LUCENE-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313877#comment-17313877 ]
Greg Miller commented on LUCENE-9850: ------------------------------------- I haven't really been following along with what's going on in JDK17, but being able to more explicitly generate vectorized instructions will be nice! I optimized my branch a bit further. At this point I think it has all the optimizations of ForDeltaUtil, and the only extra work is applying the exceptions. I even pulled in the optimization to apply the prefix to two values at a time (packed in a long). That code is over [here|https://github.com/apache/lucene/compare/main...gsmiller:pfordocid-opto3]. (It's not polished at all... a bit hacky) The benchmark results are looking at lot better now, but maybe still some regressions. I've seen a little variability in these results, so I'm not sure how often they might present false-regression results on individual tasks? Here's what I've got at this point: {code:java} TaskQPS baseline StdDevQPS pfordocids StdDev Pct diff p-value LowSpanNear 98.61 (2.2%) 95.57 (1.7%) -3.1% ( -6% - 0%) 0.000 OrNotHighHigh 545.01 (3.8%) 531.11 (5.3%) -2.6% ( -11% - 6%) 0.078 Wildcard 40.83 (4.1%) 40.05 (3.9%) -1.9% ( -9% - 6%) 0.132 OrHighMed 102.39 (2.5%) 100.50 (2.5%) -1.8% ( -6% - 3%) 0.021 AndHighHigh 50.93 (3.3%) 50.03 (3.1%) -1.8% ( -7% - 4%) 0.079 TermDTSort 98.42 (11.6%) 96.72 (14.4%) -1.7% ( -24% - 27%) 0.676 AndHighMed 68.10 (2.9%) 66.94 (2.9%) -1.7% ( -7% - 4%) 0.063 HighTerm 1169.43 (4.4%) 1151.70 (5.1%) -1.5% ( -10% - 8%) 0.314 BrowseMonthSSDVFacets 12.50 (5.6%) 12.31 (7.7%) -1.5% ( -14% - 12%) 0.480 HighTermTitleBDVSort 157.42 (14.7%) 155.08 (15.6%) -1.5% ( -27% - 33%) 0.757 OrHighNotLow 545.83 (5.7%) 537.85 (7.0%) -1.5% ( -13% - 12%) 0.472 MedSpanNear 28.75 (2.4%) 28.34 (1.9%) -1.4% ( -5% - 2%) 0.038 OrHighNotHigh 533.41 (4.6%) 526.33 (5.3%) -1.3% ( -10% - 8%) 0.394 Fuzzy1 59.47 (6.0%) 58.72 (6.8%) -1.3% ( -13% - 12%) 0.533 HighSpanNear 21.27 (2.6%) 21.03 (2.2%) -1.1% ( -5% - 3%) 0.153 HighTermMonthSort 128.50 (12.2%) 127.11 (11.0%) -1.1% ( -21% - 25%) 0.769 OrNotHighLow 640.89 (4.0%) 634.43 (3.5%) -1.0% ( -8% - 6%) 0.395 OrHighHigh 21.11 (2.0%) 20.91 (1.8%) -1.0% ( -4% - 2%) 0.113 MedPhrase 103.90 (3.0%) 103.05 (2.9%) -0.8% ( -6% - 5%) 0.381 HighPhrase 172.59 (2.5%) 171.22 (2.5%) -0.8% ( -5% - 4%) 0.320 OrHighNotMed 535.67 (4.8%) 531.54 (4.7%) -0.8% ( -9% - 9%) 0.607 LowTerm 1094.41 (2.9%) 1087.97 (3.1%) -0.6% ( -6% - 5%) 0.535 MedSloppyPhrase 12.91 (2.4%) 12.85 (2.5%) -0.5% ( -5% - 4%) 0.542 IntNRQ 101.21 (0.5%) 100.81 (0.7%) -0.4% ( -1% - 0%) 0.040 PKLookup 144.62 (3.0%) 144.11 (3.1%) -0.4% ( -6% - 5%) 0.715 HighSloppyPhrase 3.75 (2.9%) 3.74 (3.0%) -0.3% ( -6% - 5%) 0.726 HighIntervalsOrdered 16.00 (2.1%) 15.95 (1.8%) -0.3% ( -4% - 3%) 0.597 HighTermDayOfYearSort 109.37 (11.0%) 109.03 (15.2%) -0.3% ( -23% - 29%) 0.941 LowSloppyPhrase 41.05 (1.9%) 40.93 (2.1%) -0.3% ( -4% - 3%) 0.635 MedTerm 1137.13 (4.1%) 1134.84 (4.2%) -0.2% ( -8% - 8%) 0.877 BrowseDayOfYearTaxoFacets 4.24 (3.4%) 4.23 (3.2%) -0.2% ( -6% - 6%) 0.885 Prefix3 263.31 (9.1%) 263.08 (9.2%) -0.1% ( -16% - 20%) 0.976 BrowseDateTaxoFacets 4.23 (3.4%) 4.23 (3.2%) -0.1% ( -6% - 6%) 0.941 Fuzzy2 44.14 (13.1%) 44.17 (14.2%) 0.1% ( -24% - 31%) 0.990 BrowseMonthTaxoFacets 5.00 (2.6%) 5.01 (2.2%) 0.1% ( -4% - 4%) 0.878 OrNotHighMed 508.22 (3.6%) 508.82 (3.9%) 0.1% ( -7% - 7%) 0.921 Respell 38.75 (2.3%) 38.82 (2.2%) 0.2% ( -4% - 4%) 0.791 BrowseDayOfYearSSDVFacets 11.38 (5.7%) 11.42 (5.8%) 0.3% ( -10% - 12%) 0.855 OrHighLow 220.12 (4.0%) 220.98 (3.7%) 0.4% ( -6% - 8%) 0.745 AndHighLow 620.94 (3.0%) 624.41 (3.3%) 0.6% ( -5% - 7%) 0.573 LowPhrase 132.77 (2.1%) 133.52 (2.4%) 0.6% ( -3% - 5%) 0.433 {code} > Explore PFOR for Doc ID delta encoding (instead of FOR) > ------------------------------------------------------- > > Key: LUCENE-9850 > URL: https://issues.apache.org/jira/browse/LUCENE-9850 > Project: Lucene - Core > Issue Type: Task > Components: core/codecs > Affects Versions: main (9.0) > Reporter: Greg Miller > Priority: Minor > > It'd be interesting to explore using PFOR instead of FOR for doc ID encoding. > Right now PFOR is used for positions, frequencies and payloads, but FOR is > used for doc ID deltas. From a recent > [conversation|http://mail-archives.apache.org/mod_mbox/lucene-dev/202103.mbox/%3CCAPsWd%2BOp7d_GxNosB5r%3DQMPA-v0SteHWjXUmG3gwQot4gkubWw%40mail.gmail.com%3E] > on the dev mailing list, it sounds like this decision was made based on the > optimization possible when expanding the deltas. > I'd be interesting in measuring the index size reduction possible with > switching to PFOR compared to the performance reduction we might see by no > longer being able to apply the deltas in as optimal a way. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org