jpountz commented on PR #14932: URL: https://github.com/apache/lucene/pull/14932#issuecomment-3687854389
This is plausible. Depending on the sizes of the gaps between consecutive node IDs in the graph, group varint may be a bit smaller or larger. For instance, if most of your gaps need 9 to 14 significant bits, they would go from 2 bytes to 2.25 bytes each, a 12.5% increase.

Significant Bits | VInt (bytes) | Group VInt (bytes)
-- | -- | --
1 – 7 bits | 1 | 1.25
8 bits | 2 | 1.25
9 – 14 bits | 2 | 2.25
15 – 16 bits | 3 | 2.25
17 – 21 bits | 3 | 3.25
22 – 24 bits | 4 | 3.25
25 – 28 bits | 4 | 4.25
29 – 32 bits | 5 | 4.25

However, this should save lots of unpredictable branches. Vanilla VInt shines when the vast majority of integers are under 128, as the check on the continuation bit then almost always returns false. But when integers may be larger, as in the HNSW graph, this check becomes hard to predict, and group VInt should decode significantly faster.

> Any discussion on this would be appreciated!

I'm curious which aspect you are interested in discussing specifically. Is this 10% increase making it challenging for Amazon Product Search to keep the graph in the page cache?
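The byte counts in the table above can be reproduced with a little arithmetic. The sketch below is illustrative only and is not Lucene code: `VarintSizes`, `vintBytes` and `groupVintBytes` are hypothetical names, and it assumes groups of four values sharing a one-byte header (which is what the 0.25 overhead implies) with all four values in a group having the same width.

```java
final class VarintSizes {
    /** Bytes used by a classic VInt: 7 payload bits per byte, at least one byte. */
    public static int vintBytes(int significantBits) {
        return Math.max(1, (significantBits + 6) / 7);
    }

    /**
     * Average bytes per value for group VInt, assuming four values per group:
     * each value takes whole bytes, plus a quarter of the shared 1-byte header.
     */
    public static double groupVintBytes(int significantBits) {
        return Math.max(1, (significantBits + 7) / 8) + 0.25;
    }

    public static void main(String[] args) {
        // Upper bound of each row in the table.
        for (int bits : new int[] {7, 8, 14, 16, 21, 24, 28, 32}) {
            System.out.printf("%2d bits: vint=%d, group vint=%.2f%n",
                bits, vintBytes(bits), groupVintBytes(bits));
        }
    }
}
```

Under these assumptions, gaps of 9 to 14 bits are the worst case for group VInt relative to VInt (2.25 vs 2 bytes), while gaps of 8, 15 to 16, 22 to 24, or 29 to 32 bits come out smaller.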
