nknize commented on code in PR #12688:
URL: https://github.com/apache/lucene/pull/12688#discussion_r1382429884


##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/bitpacking/BitPacker.java:
##########


Review Comment:
   > Since they are of the same size...
   
   That's the difference. In your use case the records (blocks) are guaranteed 
to be the same size, whereas in the serialized tree case the records (tree 
nodes) are not. This is by design, to keep the resulting doc values disk 
footprint as small as possible. 
   
   > ...by a quick glance it seems to me it encodes values with variable length 
(VInt, VLong). Maybe the random-access is achieved in different ways?
   
   Yes to variable length encoding. The access isn't truly random: traversal 
of the serialized tree is DFS. Because the tree nodes are variable size, the 
serialized array includes copious "book-keeping" in the form of "sizeOf" 
values. During DFS traversal the first "sizeOf" value gives the size of the 
entire left subtree, so pruning the left subtree just means skipping that many 
bytes to reach the right subtree. This continues recursively. In practice we 
don't expect to ever "back up" in our DFS traversal, so there is only a 
`rewind` method that simply resets the offset values to 0. 
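   
   To make the "sizeOf" skipping concrete, here is a rough sketch. It is not 
the actual Lucene code: the class name, the `Node` and `Reader` types, the 
exact byte layout (value, size of left subtree, left bytes, size of right 
subtree, right bytes) and the BST-style lookup are all assumptions made up for 
illustration. 

```java
import java.io.ByteArrayOutputStream;

public class SizeOfTreeSketch {
  /** Hypothetical in-memory node, used only to build the example bytes. */
  static final class Node {
    final int value;
    final Node left, right;
    Node(int value, Node left, Node right) {
      this.value = value;
      this.left = left;
      this.right = right;
    }
  }

  /** Variable-length int: 7 payload bits per byte, high bit means "more bytes follow". */
  static void writeVInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  /** Assumed layout per node: value, sizeOf(left), left bytes, sizeOf(right), right bytes. */
  static byte[] serialize(Node n) {
    if (n == null) return new byte[0];
    byte[] left = serialize(n.left);
    byte[] right = serialize(n.right);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVInt(out, n.value);
    writeVInt(out, left.length);
    out.write(left, 0, left.length);
    writeVInt(out, right.length);
    out.write(right, 0, right.length);
    return out.toByteArray();
  }

  /** Forward-only cursor over the serialized bytes; rewind() is the only way back. */
  static final class Reader {
    private final byte[] buf;
    private int offset;
    Reader(byte[] buf) { this.buf = buf; }
    int readVInt() {
      int b = buf[offset++];
      int v = b & 0x7F;
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = buf[offset++];
        v |= (b & 0x7F) << shift;
      }
      return v;
    }
    void skip(int numBytes) { offset += numBytes; }
    void rewind() { offset = 0; }
    int offset() { return offset; }
    int length() { return buf.length; }
  }

  /** Looks up target, assuming the tree is ordered like a binary search tree. */
  static boolean contains(Reader in, int target) {
    in.rewind();
    int end = in.length();
    while (in.offset() < end) {
      int value = in.readVInt();
      if (value == target) return true;
      int leftSize = in.readVInt();
      if (target < value) {
        end = in.offset() + leftSize;   // descend into the left subtree
      } else {
        in.skip(leftSize);              // prune: jump over the whole left subtree
        int rightSize = in.readVInt();
        end = in.offset() + rightSize;  // descend into the right subtree
      }
    }
    return false;
  }
}
```

   The cursor only ever moves forward during a lookup, which is why `rewind` 
is enough here, and no per-node Java objects are created while reading. 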
   
   
   The two use cases seem subtly different, but I could see roughly 80% 
overlap in the implementation. I'd love to efficiently encapsulate this logic 
for the next contributor who wants a random serialized traversal mechanism 
without a ridiculous amount of Java object overhead. Sounds like 
@bruno-roustant had the same need? Could be a good follow-on PR. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


