Re: [PR] GroupVarInt Encoding Implementation for HNSW Graphs [lucene]

via GitHub Tue, 23 Dec 2025 12:12:04 -0800


jpountz commented on PR #14932:
URL: https://github.com/apache/lucene/pull/14932#issuecomment-3687854389


   This is plausible. Depending on the sizes of the gaps between consecutive 
node IDs in the graph, group varint may be a bit smaller or a bit larger than 
VInt. For instance, if most of your gaps need 9-14 significant bits, they would 
go from 2 bytes to 2.25 bytes each, a 12.5% increase.
   
   Significant bits | VInt (bytes) | Group VInt (bytes per value, amortized)
   -- | -- | --
   1 – 7 bits | 1 | 1.25
   8 bits | 2 | 1.25
   9 – 14 bits | 2 | 2.25
   15 – 16 bits | 3 | 2.25
   17 – 21 bits | 3 | 3.25
   22 – 24 bits | 4 | 3.25
   25 – 28 bits | 4 | 4.25
   29 – 32 bits | 5 | 4.25
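
   The table rows can be reproduced with a small sketch. It assumes classic VInt stores 7 payload bits per byte, and that group varint stores whole payload bytes plus a 2-bit length tag per value (one shared tag byte per group of 4, hence the extra 0.25). The function names `vint_bytes` and `group_vint_bytes` are hypothetical, chosen for illustration only.

```python
def vint_bytes(bits: int) -> int:
    """Bytes used by a classic VInt: 7 payload bits per byte, at least one byte."""
    return max(1, -(-bits // 7))  # ceil(bits / 7)


def group_vint_bytes(bits: int) -> float:
    """Amortized bytes per value for group varint (assumed layout):
    whole payload bytes plus a quarter of the shared tag byte."""
    return max(1, -(-bits // 8)) + 0.25  # ceil(bits / 8) + 2-bit tag


# Upper bound of each range in the table above.
for bits in (7, 8, 14, 16, 21, 24, 28, 32):
    print(f"{bits:2d} bits: VInt={vint_bytes(bits)}  GroupVInt={group_vint_bytes(bits)}")
```

   Group varint loses a quarter byte wherever both encodings need the same number of payload bytes and wins three quarters of a byte wherever VInt's 7-bit payload spills into an extra byte.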
   
   However, this should save many unpredictable branches. Vanilla VInt shines 
when the vast majority of integers are under 128, since the check on the 
continuation bit almost always comes out false and is easy to predict. But when 
integers may be larger, as with the gaps in the HNSW graph, this check becomes 
hard to predict, and group VInt should decode significantly faster.
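
   To make the branching difference concrete, here is a minimal sketch of both decoders. It is illustrative only and is not Lucene's code: the tag bit order and little-endian payload are assumptions, and all function names are hypothetical. The point is where the data-dependent test sits: the VInt decoder checks the continuation bit once per byte, while the group decoder reads all four lengths from the tag byte up front.

```python
def encode_vint(value: int) -> bytes:
    """Classic VInt: 7 payload bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_vint(buf: bytes, pos: int):
    """Returns (value, next_pos)."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:  # data-dependent branch, taken or not per byte
            return value, pos
        shift += 7


def encode_group4(values) -> bytes:
    """Group varint for exactly 4 unsigned 32-bit values (assumed layout):
    one tag byte holding four 2-bit (length - 1) fields, then the payloads."""
    assert len(values) == 4
    tag = 0
    payload = bytearray()
    for i, v in enumerate(values):
        n = max(1, (v.bit_length() + 7) // 8)  # 1..4 payload bytes
        tag |= (n - 1) << (2 * i)
        payload += v.to_bytes(n, "little")
    return bytes([tag]) + bytes(payload)


def decode_group4(buf: bytes, pos: int):
    """Returns (values, next_pos). Fixed trip count; lengths come from the tag."""
    tag = buf[pos]
    pos += 1
    values = []
    for i in range(4):
        n = ((tag >> (2 * i)) & 0x3) + 1  # no per-byte continuation test
        values.append(int.from_bytes(buf[pos:pos + n], "little"))
        pos += n
    return values, pos


gaps = [5, 300, 70000, 12]  # made-up gaps of mixed magnitude
vint_stream = b"".join(encode_vint(g) for g in gaps)
group_stream = encode_group4(gaps)
print(len(vint_stream), len(group_stream))
```

   In a compiled implementation the four-iteration loop can be unrolled or driven by a lookup on the tag byte, so the decode path does not depend on guessing each byte's continuation bit. That is the effect described above, at the cost of the extra tag byte.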
   
   > Any discussion on this would be appreciated!
   
   I'm curious which aspect you would like to discuss specifically. Is this 
10% increase making it challenging for Amazon Product Search to keep the 
graph in the page cache?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


