[ https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295367#comment-17295367 ]
Robert Muir commented on LUCENE-9822:
-------------------------------------

Looks good. The single-byte assumption reminds me, though: with such huge block sizes, the patching may not even work very well without completely changing how the class works. Currently it allows 3 exceptions per block of 128 values, so that 3 large values don't blow up compression for the whole block. But if you are trying to do something like blocksize=512, it seems you would need to allow more exceptions (e.g. 12 or so) for the patching to be effective for general purposes. It may be worth checking the literature, as I don't know off the top of my head where these numbers (128, 3, etc.) came from.

> Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-9822
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9822
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>   Affects Versions: master (9.0)
>            Reporter: Greg Miller
>            Priority: Trivial
>         Attachments: LUCENE-9822.patch
>
>
> PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when
> generating "patch offsets". If this assumption doesn't hold, PForUtil will
> silently encode incorrect positions. While the BLOCK_SIZE isn't particularly
> configurable, it would be nice to assert this assumption early in PForUtil in
> the event that the BLOCK_SIZE changes in some future codec version.
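
To illustrate the single-byte assumption described in the issue, here is a minimal Java sketch. It is not the code from LUCENE-9822.patch or from PForUtil: the class name PatchOffsetSketch, the methods encodeOffset and decodeOffset, and the local BLOCK_SIZE constant are all hypothetical stand-ins. It only shows why a position inside a block survives a round trip through one byte when the block has at most 256 entries, and how a larger position would be silently truncated.

```java
// Hypothetical sketch; every name here is illustrative, not Lucene's actual code.
final class PatchOffsetSketch {
  // Assumed stand-in for ForUtil.BLOCK_SIZE (128, per the comment above).
  static final int BLOCK_SIZE = 128;

  static {
    // Fail fast at class load if positions in [0, BLOCK_SIZE) no longer fit
    // in one unsigned byte. An explicit check is used instead of the assert
    // keyword so that it also runs when assertions are disabled.
    if (BLOCK_SIZE > 256) {
      throw new AssertionError("BLOCK_SIZE must be encodable in a single byte, got " + BLOCK_SIZE);
    }
  }

  // Store the position of an exception within the block as a single byte.
  static byte encodeOffset(int index) {
    return (byte) index; // narrowing cast: silently drops the high bits if index > 255
  }

  // Read the position back as an unsigned value in [0, 255].
  static int decodeOffset(byte b) {
    return Byte.toUnsignedInt(b);
  }

  public static void main(String[] args) {
    // A position inside a 128-entry block round-trips unchanged.
    System.out.println(decodeOffset(encodeOffset(127)));
    // A position that would only occur in a block larger than 256 entries
    // comes back as a different, wrong position, with no error raised.
    System.out.println(decodeOffset(encodeOffset(300)));
  }
}
```

With a block size of 512, as in the hypothetical raised in the comment, the static check in this sketch would fail at class load instead of letting positions above 255 be written incorrectly.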