[ https://issues.apache.org/jira/browse/LUCENE-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307210#comment-17307210 ]
Michael McCandless commented on LUCENE-9850:
--------------------------------------------

Here's the result of {{bpv-tool-only}} on the Lucene nightly benchmarks (EN Wikipedia) index:

{noformat}
DOC ID BPV
 0 ****   [7.93 pct] (1466075 of 18484892)
 1 *      [0.00 pct] (165 of 18484892)
 2 *      [0.58 pct] (106653 of 18484892)
 3 **     [2.17 pct] (400444 of 18484892)
 4 **     [3.16 pct] (584748 of 18484892)
 5 ***    [4.51 pct] (833082 of 18484892)
 6 ***    [5.86 pct] (1082974 of 18484892)
 7 *****  [8.45 pct] (1561144 of 18484892)
 8 *****  [9.93 pct] (1835188 of 18484892)
 9 ****** [10.66 pct] (1970466 of 18484892)
10 *****  [9.68 pct] (1788853 of 18484892)
11 *****  [8.62 pct] (1594306 of 18484892)
12 ****   [7.62 pct] (1409009 of 18484892)
13 ****   [6.23 pct] (1151456 of 18484892)
14 ***    [4.72 pct] (872013 of 18484892)
15 **     [3.46 pct] (640401 of 18484892)
16 **     [2.52 pct] (466228 of 18484892)
17 *      [1.73 pct] (320292 of 18484892)
18 *      [1.19 pct] (220389 of 18484892)
19 *      [0.62 pct] (114238 of 18484892)
20 *      [0.21 pct] (38229 of 18484892)
21 *      [0.09 pct] (16846 of 18484892)
22 *      [0.05 pct] (9250 of 18484892)
23 *      [0.01 pct] (2443 of 18484892)
24        [0.00 pct] (0 of 18484892)
25        [0.00 pct] (0 of 18484892)
26        [0.00 pct] (0 of 18484892)
27        [0.00 pct] (0 of 18484892)
28        [0.00 pct] (0 of 18484892)
29        [0.00 pct] (0 of 18484892)
30        [0.00 pct] (0 of 18484892)
31        [0.00 pct] (0 of 18484892)
Total bytes used: 20912256
{noformat}

Curious how many 0-bit cases there are!

> Explore PFOR for Doc ID delta encoding (instead of FOR)
> -------------------------------------------------------
>
>                 Key: LUCENE-9850
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9850
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Priority: Minor
>
> It'd be interesting to explore using PFOR instead of FOR for doc ID encoding.
> Right now PFOR is used for positions, frequencies and payloads, but FOR is
> used for doc ID deltas.
> From a recent
> [conversation|http://mail-archives.apache.org/mod_mbox/lucene-dev/202103.mbox/%3CCAPsWd%2BOp7d_GxNosB5r%3DQMPA-v0SteHWjXUmG3gwQot4gkubWw%40mail.gmail.com%3E]
> on the dev mailing list, it sounds like this decision was made based on the
> optimization possible when expanding the deltas.
> I'd be interested in measuring the index size reduction possible by
> switching to PFOR against the performance reduction we might see by no
> longer being able to apply the deltas in as optimal a way.
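To make the FOR vs. PFOR size trade-off concrete, here is a minimal Java sketch of how many bits one block of doc ID deltas costs under each scheme. It is not Lucene's encoder: the 128-value block, the cap of 7 exceptions, the 16-bit cost per exception (one byte of index plus one byte of high bits), and the "bpv 0" flag for blocks whose deltas are all 1 are assumptions, and all names (DeltaBlockSketch, forBitsPerValue, pforBlockBits) are hypothetical. FOR packs every delta with the width of the largest one, so a single large gap inflates the whole block; PFOR picks a smaller width and patches the few values that do not fit.

```java
import java.util.Arrays;

/** Hypothetical cost model for one block of doc ID deltas; not Lucene's actual code. */
public final class DeltaBlockSketch {

  // Assumption: 128 deltas per block.
  static final int BLOCK_SIZE = 128;
  // Assumption: PFOR allows at most 7 exceptions, each costing one byte for its
  // index and one byte for its high bits (so the width can drop by at most 8).
  static final int MAX_EXCEPTIONS = 7;
  static final int EXCEPTION_COST_BITS = 16;
  static final int MAX_HIGH_BITS = 8;

  /** Number of bits needed to represent v (0 for v == 0). */
  static int bitsRequired(long v) {
    return 64 - Long.numberOfLeadingZeros(v);
  }

  /** FOR: every delta is packed with the bit width of the largest delta. */
  static int forBitsPerValue(long[] deltas) {
    long max = 0;
    boolean allOnes = true;
    for (long d : deltas) {
      max = Math.max(max, d);
      allOnes &= d == 1;
    }
    // Assumption: a run of consecutive doc IDs (every delta == 1) is flagged
    // with bpv 0 and needs no packed payload.
    return allOnes ? 0 : bitsRequired(max);
  }

  /** Payload bits for one block under FOR (per-block header ignored). */
  static long forBlockBits(long[] deltas) {
    return (long) forBitsPerValue(deltas) * deltas.length;
  }

  /** PFOR: choose width b minimizing b * n + exceptions * EXCEPTION_COST_BITS. */
  static long pforBlockBits(long[] deltas) {
    int maxBits = forBitsPerValue(deltas);
    long best = (long) maxBits * deltas.length; // b == maxBits: no exceptions, same as FOR
    int minB = Math.max(0, maxBits - MAX_HIGH_BITS); // high bits must fit in one byte
    for (int b = maxBits - 1; b >= minB; b--) {
      int exceptions = 0;
      for (long d : deltas) {
        if (bitsRequired(d) > b) exceptions++;
      }
      // Narrower widths can only add exceptions, so stop once over the cap.
      if (exceptions > MAX_EXCEPTIONS) break;
      long cost = (long) b * deltas.length + (long) exceptions * EXCEPTION_COST_BITS;
      best = Math.min(best, cost);
    }
    return best;
  }

  public static void main(String[] args) {
    long[] dense = new long[BLOCK_SIZE];
    Arrays.fill(dense, 1); // consecutive doc IDs

    long[] skewed = new long[BLOCK_SIZE];
    Arrays.fill(skewed, 3); // typical gaps fit in 2 bits
    skewed[10] = 5000;      // one outlier gap needs 13 bits

    System.out.println(forBitsPerValue(dense)); // 0
    System.out.println(forBlockBits(skewed));   // 1664 (13 * 128)
    System.out.println(pforBlockBits(skewed));  // 656  (5 * 128 + 16)
  }
}
```

In this toy block, one gap of 5000 among 2-bit gaps costs 1664 bits under FOR but 656 under the assumed PFOR model; the one-byte limit on exception high bits is what stops the width from dropping below 5. The decode-speed cost the issue is concerned about is not modeled here.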