jtibshirani commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1301064813
@rmuir thanks for taking a look! Here's some background ...

This vector graph file holds the neighbors for each node (as integer ordinals). Each node has 32 neighbors max, so the file is organized into blocks of 32 ints. Since most graphs don't have `Integer.MAX_VALUE` nodes, we noticed we could save a lot of space by encoding each ordinal with the number of bits given by `PackedInts.bitsRequired` instead of as a full integer.

The access pattern during search:
* Jump to a node's neighbors (by offsetting into the graph file) and read all of them. Process these.
* Then jump to the neighbor list of one of those neighbors and process it. This requires random access.
* Repeat many times.

I thought it would make sense to use a `PackedInts.ReaderIterator` for each individual neighbor list. That way you just decode the whole list in one go (and we always process the whole neighbor list). I think it'd be less efficient to use `DirectReader`, since it doesn't decode the whole list at once; instead it reads from the input for every value. Maybe I'm prematurely optimizing though... `DirectReader` looks much closer to what we want!
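To make the layout concrete, here is a minimal, self-contained sketch of the idea in plain Java. It is not the PR's code and does not use Lucene's `PackedInts` on-disk encoding; the class and method names (`PackedNeighborsSketch`, `writeBlock`, `readBlock`) are hypothetical. It also assumes every block has exactly 32 slots padded with zeros, ignoring how a real format would record the neighbor count.

```java
// Illustrative only: fixed-size, bit-packed neighbor blocks with random access.
final class PackedNeighborsSketch {
    static final int MAX_NEIGHBORS = 32;

    // Bits needed to represent any value in [0, maxValue]; same idea as
    // PackedInts.bitsRequired, reimplemented here to stay dependency-free.
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Size in bytes of one node's block of 32 packed ordinals.
    static int blockBytes(int bitsPerValue) {
        return (MAX_NEIGHBORS * bitsPerValue + 7) / 8;
    }

    // Packs one node's neighbors into its block, most significant bit first.
    // Unused slots are left as zero (an assumption of this sketch).
    static void writeBlock(byte[] file, int node, int[] neighbors, int bitsPerValue) {
        int bitPos = node * blockBytes(bitsPerValue) * 8;
        for (int i = 0; i < MAX_NEIGHBORS; i++) {
            int value = i < neighbors.length ? neighbors[i] : 0;
            for (int b = bitsPerValue - 1; b >= 0; b--, bitPos++) {
                if (((value >>> b) & 1) != 0) {
                    file[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
            }
        }
    }

    // Random access: jump to the node's block by offset, then decode the
    // whole neighbor list in one pass.
    static int[] readBlock(byte[] file, int node, int bitsPerValue) {
        int bitPos = node * blockBytes(bitsPerValue) * 8;
        int[] out = new int[MAX_NEIGHBORS];
        for (int i = 0; i < MAX_NEIGHBORS; i++) {
            int value = 0;
            for (int b = 0; b < bitsPerValue; b++, bitPos++) {
                int bit = (file[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                value = (value << 1) | bit;
            }
            out[i] = value;
        }
        return out;
    }
}
```

For a graph with 1,000 nodes the largest ordinal is 999, which needs 10 bits, so each block in this sketch takes 40 bytes rather than the 128 bytes needed for 32 full ints.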