[ 
https://issues.apache.org/jira/browse/LUCENE-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314007#comment-17314007
 ] 

Greg Miller commented on LUCENE-9850:
-------------------------------------

This is probably getting into the weeds a bit, but since JDK Mission Control 
now appears to provide flame charts, we can see where the PFOR approach is 
spending extra time (on top of FOR). 

Here's FOR as a baseline:

!for.png|width=850,height=134!

And here's my latest version with PFOR:

!pfor.png|width=855,height=133!

It looks like applying exceptions (PForUtil.applyExceptionsIn32Space) accounts 
for a good chunk of the time spent decoding. This is pure extra overhead 
relative to FOR. 

And digging into that, it looks like ~2/3 of that time is being spent actually 
reading the bytes (position and high-order bits of the exceptions).

!apply_exceptions.png|width=852,height=94!
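
For readers less familiar with the exception step, here is a minimal sketch of 
what "applying exceptions" involves. It is not the actual 
PForUtil.applyExceptionsIn32Space code: the class name, the method signature, 
the use of ByteBuffer as the input, and the one-byte-position plus 
one-byte-high-bits layout are all assumptions made for illustration. It only 
shows why each exception costs two byte reads on top of the plain FOR decode.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not Lucene's PForUtil. */
final class PforExceptionSketch {

  /**
   * Patches exception values into a block that has already been FOR-decoded.
   * Assumed layout: each exception is stored as two bytes, first the position
   * within the block, then the high-order bits that did not fit in
   * bitsPerValue bits.
   */
  static void applyExceptions(
      long[] values, int bitsPerValue, int numExceptions, ByteBuffer in) {
    for (int i = 0; i < numExceptions; i++) {
      int position = Byte.toUnsignedInt(in.get());    // read 1: which slot to patch
      long highBits = Byte.toUnsignedLong(in.get());  // read 2: the overflow bits
      values[position] |= highBits << bitsPerValue;   // restore the full value
    }
  }
}
```

In this sketch the arithmetic per exception is a single shift and OR, so the 
two reads per exception are the only other work in the loop.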

So I'm not sure where to go from here. The algorithmic part of this feels 
close to as optimized as it's going to get right now. I'm not sure how much 
can be done about the cost of reading those bytes, and there's no way around 
the need to read them.

> Explore PFOR for Doc ID delta encoding (instead of FOR)
> -------------------------------------------------------
>
>                 Key: LUCENE-9850
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9850
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Priority: Minor
>         Attachments: apply_exceptions.png, for.png, pfor.png
>
>
> It'd be interesting to explore using PFOR instead of FOR for doc ID encoding. 
> Right now PFOR is used for positions, frequencies and payloads, but FOR is 
> used for doc ID deltas. From a recent 
> [conversation|http://mail-archives.apache.org/mod_mbox/lucene-dev/202103.mbox/%3CCAPsWd%2BOp7d_GxNosB5r%3DQMPA-v0SteHWjXUmG3gwQot4gkubWw%40mail.gmail.com%3E]
>  on the dev mailing list, it sounds like this decision was made based on the 
> optimization possible when expanding the deltas.
> I'd be interested in measuring the index size reduction possible with 
> switching to PFOR compared to the performance reduction we might see by no 
> longer being able to apply the deltas in as optimal a way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
