[ 
https://issues.apache.org/jira/browse/LUCENE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295367#comment-17295367
 ] 

Robert Muir commented on LUCENE-9822:
-------------------------------------

Looks good. The single-byte assumption reminds me, though: with such huge
block sizes, the patching may not even work very well without completely
changing how the class works. Currently it allows 3 exceptions per block of
128, so that 3 large values don't blow up compression for the whole block.

But if you are trying to do something like blocksize=512, it seems like you
would need to allow more exceptions (e.g. 12 or so) for the patching to be
effective for general purposes. It may be worth checking the literature, as I
don't know off the top of my head where these numbers (128, 3, etc.) came from.
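
To make the exception-budget point concrete, here is a minimal sketch. It is not the actual PForUtil code: the class PatchSketch, its methods, and the sample values (7 for ordinary entries, 1 << 20 for outliers) are all hypothetical. It picks the smallest bit width that leaves at most maxExceptions values unrepresented, which is the basic idea behind patching.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and values are not taken from Lucene.
final class PatchSketch {
  /** Bits needed to store a non-negative value (0 needs 0 bits in this sketch). */
  static int bitsRequired(long v) {
    return 64 - Long.numberOfLeadingZeros(v);
  }

  /** Smallest bit width such that at most maxExceptions values do not fit. */
  static int patchedBitsPerValue(long[] block, int maxExceptions) {
    long[] sorted = block.clone();
    Arrays.sort(sorted);
    // The largest value that is NOT treated as an exception decides the width.
    int idx = Math.max(0, sorted.length - 1 - maxExceptions);
    return bitsRequired(sorted[idx]);
  }

  /** Builds a block of small values (7) with evenly spread large outliers (1 << 20). */
  static long[] blockWithOutliers(int size, int outliers) {
    long[] block = new long[size];
    Arrays.fill(block, 7L);
    for (int i = 0; i < outliers; i++) {
      block[i * (size / outliers)] = 1L << 20;
    }
    return block;
  }

  public static void main(String[] args) {
    long[] small = blockWithOutliers(128, 3);
    long[] large = blockWithOutliers(512, 12); // same outlier rate, 4x the block
    System.out.println("128 values,  3 exceptions: " + patchedBitsPerValue(small, 3));
    System.out.println("512 values,  3 exceptions: " + patchedBitsPerValue(large, 3));
    System.out.println("512 values, 12 exceptions: " + patchedBitsPerValue(large, 12));
  }
}
```

In this toy data the 128-value block packs at 3 bits per value with 3 exceptions. The 512-value block with the same outlier rate needs 21 bits per value if only 3 exceptions are allowed, and drops back to 3 bits once 12 are allowed.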

> Assert that ForUtil.BLOCK_SIZE can be encoded in a single byte in PForUtil
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-9822
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9822
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: master (9.0)
>            Reporter: Greg Miller
>            Priority: Trivial
>         Attachments: LUCENE-9822.patch
>
>
> PForUtil assumes that ForUtil.BLOCK_SIZE can be encoded in a single byte when 
> generating "patch offsets". If this assumption doesn't hold, PForUtil will 
> silently encode incorrect positions. While the BLOCK_SIZE isn't particularly 
> configurable, it would be nice to assert this assumption early in PForUtil in 
> the event that the BLOCK_SIZE changes in some future codec version.
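
The failure mode described in the issue can be sketched as follows. This is an illustration, not the attached patch: the class PatchOffsetSketch and its methods are hypothetical, and it assumes patch offsets are written with a narrowing cast to byte and read back as unsigned values.

```java
// Hypothetical illustration only; not the code from LUCENE-9822.patch.
final class PatchOffsetSketch {
  /** Offsets run from 0 to blockSize - 1, so an unsigned byte covers block sizes up to 256. */
  static boolean fitsInOneByte(int blockSize) {
    return blockSize <= 256;
  }

  /** Writes an offset as a single byte and reads it back as an unsigned value. */
  static int roundTripOffset(int offset) {
    byte encoded = (byte) offset;        // narrowing cast keeps only the low 8 bits
    return Byte.toUnsignedInt(encoded);  // decode as 0..255
  }

  public static void main(String[] args) {
    System.out.println("offset 127 -> " + roundTripOffset(127));
    System.out.println("offset 255 -> " + roundTripOffset(255));
    // With a block size of 512, an offset such as 300 is silently truncated.
    System.out.println("offset 300 -> " + roundTripOffset(300));
    System.out.println("512 fits in one byte? " + fitsInOneByte(512));
  }
}
```

Offsets up to 255 survive the round trip, while 300 comes back as 44 with no error raised. An early check along the lines of fitsInOneByte would turn that silent corruption into an immediate failure.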



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
