[
https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295367#comment-17295367
]
Robert Muir commented on LUCENE-9822:
-------------------------------------
Looks good. The single-byte assumption reminds me, though: with such huge
block sizes, the patching may not even work very well without completely
changing how the class works. Currently it allows 3 exceptions for blocks of
128 so that 3 large values don't blow up compression for the whole block.
But if you are trying to do something like blocksize=512, it seems like you
would need to allow for more exceptions (e.g. 12 or so) for the patching to be
effective for general purposes. It may be worth checking the literature, as I
don't know off the top of my head where these numbers (128, 3, etc.) came from.
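
To make the exception budget concrete, here is a minimal Java sketch of the idea. It is not the PForUtil code: the class PatchSketch, the helper patchedBits, and the sample data are all hypothetical, and the sketch only computes the packed bit width that results when up to maxExceptions of the largest values are stored separately.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Lucene PForUtil implementation.
class PatchSketch {
    // Number of bits needed to represent a non-negative value (at least 1).
    static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    // Bit width of the packed part when up to maxExceptions of the largest
    // values are allowed to overflow it and be patched in separately.
    static int patchedBits(long[] block, int maxExceptions) {
        long[] sorted = block.clone();
        Arrays.sort(sorted);
        int idx = Math.max(0, sorted.length - 1 - maxExceptions);
        return bitsRequired(sorted[idx]);
    }

    public static void main(String[] args) {
        // Block of 128 small values with 3 outliers (made-up data).
        long[] small = new long[128];
        Arrays.fill(small, 5L);
        small[10] = small[50] = small[90] = 1_000_000L;
        System.out.println(patchedBits(small, 0)); // 20: outliers set the width
        System.out.println(patchedBits(small, 3)); // 3: outliers are patched

        // Block of 512 with the same outlier rate, so 12 outliers.
        long[] big = new long[512];
        Arrays.fill(big, 5L);
        for (int i = 0; i < 12; i++) {
            big[i * 40] = 1_000_000L;
        }
        System.out.println(patchedBits(big, 3));  // 20: budget of 3 is too small
        System.out.println(patchedBits(big, 12)); // 3: budget scaled with the block
    }
}
```

With the same outlier rate, a budget of 3 stops helping once the block grows to 512, which is the concern above. The right budget for real postings data is a separate question that this toy data does not answer.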
> Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
> --------------------------------------------------------------------------
>
> Key: LUCENE-9822
> URL: https://issues.apache.org/jira/browse/LUCENE-9822
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: master (9.0)
> Reporter: Greg Miller
> Priority: Trivial
> Attachments: LUCENE-9822.patch
>
>
> PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when
> generating "patch offsets". If this assumption doesn't hold, PForUtil will
> silently encode incorrect positions. While the BLOCK_SIZE isn't particularly
> configurable, it would be nice to assert this assumption early in PForUtil in
> the event that the BLOCK_SIZE changes in some future codec version.
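
The failure mode described above can be sketched in a few lines of Java. This is an illustration under assumptions, not the attached patch: the class PatchOffsetCheck and its methods are hypothetical, and the bound of 256 assumes patch offsets are zero-based indices stored in one unsigned byte.

```java
// Hypothetical illustration only; not the contents of LUCENE-9822.patch.
class PatchOffsetCheck {
    // Zero-based offsets 0..blockSize-1 fit in one unsigned byte
    // only when blockSize is at most 256 (assumed encoding).
    static boolean fitsInOneByte(int blockSize) {
        return blockSize <= 256;
    }

    // Store an offset in a single byte and read it back as unsigned.
    static int roundTrip(int offset) {
        byte stored = (byte) offset; // narrowing keeps only the low 8 bits
        return stored & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(fitsInOneByte(128)); // true
        System.out.println(fitsInOneByte(512)); // false
        System.out.println(roundTrip(127));     // 127
        System.out.println(roundTrip(300));     // 44: silently wrong position
    }
}
```

An early check along the lines of fitsInOneByte would turn the silent wrap-around in the last line into an immediate failure if the block size ever grew past the assumed bound.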