[ https://issues.apache.org/jira/browse/LUCENE-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314007#comment-17314007 ]
Greg Miller commented on LUCENE-9850:
-------------------------------------

I'm getting in the weeds a bit on this probably, but since it appears JDK Mission Control now provides flame charts, we can now see where the PFOR approach is spending extra time (on top of FOR).

Here's FOR as a baseline:

!for.png|width=850,height=134!

And here's my latest version with PFOR:

!pfor.png|width=855,height=133!

It looks like applying exceptions (PForUtil.applyExceptionsIn32Space) is accounting for a good chunk of where decoding is spending its time. This is pure extra overhead. And digging into that, it looks like ~2/3 of that time is being spent actually reading the bytes (position and high-order bits of the exceptions).

!apply_exceptions.png|width=852,height=94!

So I don't know. It feels like the algorithm piece of this is close to as optimized as it's going to get right now. Not sure how much can be done about reading those bytes. No way around the need to read them.

> Explore PFOR for Doc ID delta encoding (instead of FOR)
> -------------------------------------------------------
>
>                 Key: LUCENE-9850
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9850
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Priority: Minor
>         Attachments: apply_exceptions.png, for.png, pfor.png
>
>
> It'd be interesting to explore using PFOR instead of FOR for doc ID encoding.
> Right now PFOR is used for positions, frequencies and payloads, but FOR is
> used for doc ID deltas. From a recent
> [conversation|http://mail-archives.apache.org/mod_mbox/lucene-dev/202103.mbox/%3CCAPsWd%2BOp7d_GxNosB5r%3DQMPA-v0SteHWjXUmG3gwQot4gkubWw%40mail.gmail.com%3E]
> on the dev mailing list, it sounds like this decision was made based on the
> optimization possible when expanding the deltas.
> I'd be interested in measuring the index size reduction possible with
> switching to PFOR compared to the performance reduction we might see by no
> longer being able to apply the deltas in as optimal a way.
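
For reference, the exception-patching step that the comment identifies as the hot spot can be sketched roughly as below. This is a minimal illustration, not the code of PForUtil.applyExceptionsIn32Space: the class name PforExceptionSketch, the method signature, and the byte-pair layout (one unsigned position byte followed by one unsigned byte of high-order bits per exception) are assumptions made for the example.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of PFOR exception patching; not Lucene's PForUtil. */
final class PforExceptionSketch {

  private PforExceptionSketch() {}

  /**
   * Patches exceptions into a block whose low bits are already decoded.
   *
   * @param values        block of values holding only their low bitsPerValue bits
   * @param bitsPerValue  number of low bits stored for every value in the block
   * @param exceptions    buffer of (position, highBits) byte pairs (assumed layout)
   * @param numExceptions number of pairs to read from the buffer
   */
  static void applyExceptions(
      long[] values, int bitsPerValue, ByteBuffer exceptions, int numExceptions) {
    for (int i = 0; i < numExceptions; i++) {
      // Two byte reads per exception: where it lives, and what was cut off.
      int position = Byte.toUnsignedInt(exceptions.get());
      long highBits = Byte.toUnsignedLong(exceptions.get());
      // Restore the high-order bits above the packed low bits.
      values[position] |= highBits << bitsPerValue;
    }
  }
}
```

In this sketch each exception costs two byte reads before any patching happens, which is consistent with the observation above that reading the position and high-order bits accounts for most of the time in this step.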