jtibshirani commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1301064813

   @rmuir thanks for taking a look! Here's some background ...
   
   This vector graph file holds the neighbors of each node (as integer
ordinals). Each node has at most 32 neighbors, so the file is organized into
blocks of 32 ints. Since most graphs have far fewer than `Integer.MAX_VALUE`
nodes, we noticed we could save a lot of space by storing each ordinal in only
as many bits as `PackedInts.bitsRequired` reports for the largest ordinal,
instead of in a full 32-bit int.
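   A back-of-the-envelope sketch of the savings, in plain Java. The `bitsRequired` helper is a stand-in written for this illustration that is intended to match the behavior of `PackedInts.bitsRequired` (at least 1 bit, otherwise just enough bits for the largest value); it is not the Lucene method itself, and `PackedNeighborSize` and `bytesPerBlock` are hypothetical names. It assumes every block always reserves 32 slots and is rounded up to whole bytes.

```java
final class PackedNeighborSize {
    static final int MAX_NEIGHBORS = 32;

    // Stand-in for PackedInts.bitsRequired: bits needed for values in [0, maxValue].
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Bytes for one block of MAX_NEIGHBORS ordinals, rounded up to a whole byte.
    static long bytesPerBlock(int bitsPerValue) {
        return ((long) MAX_NEIGHBORS * bitsPerValue + 7) / 8;
    }

    public static void main(String[] args) {
        int numNodes = 1_000_000;                 // hypothetical graph size
        int bits = bitsRequired(numNodes - 1);    // largest ordinal is numNodes - 1
        System.out.println("bits per ordinal: " + bits);
        System.out.println("packed block: " + bytesPerBlock(bits) + " bytes");
        System.out.println("int block:    " + bytesPerBlock(Integer.SIZE) + " bytes");
    }
}
```

For a hypothetical graph of 1,000,000 nodes the largest ordinal is 999,999, which needs 20 bits, so a full block shrinks from 128 bytes to 80 bytes (37.5% smaller). Even `Integer.MAX_VALUE` fits in 31 bits, since ordinals are non-negative.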
   
   The access pattern during search:
   * Jump to a node's neighbors (by offsetting into the graph file) and read 
all of them. Process these.
   * Then jump to the neighbor list of one of those neighbors and process it.
This requires random access.
   * Repeat many times.
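   A simplified sketch of that access pattern, assuming a hypothetical unpacked layout where node `n`'s block starts at `n * 32` and a separate `counts` array (also hypothetical) says how many slots are used. Because every block has the same size, jumping to a node's neighbors is a single multiplication. The breadth-first walk in `explore` is only a stand-in for the real search, which is best-first and scores vectors; it just shows the read-whole-list-then-hop pattern.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

final class GraphWalk {
    static final int MAX_NEIGHBORS = 32;

    // Hypothetical unpacked layout: node n owns slots
    // [n * MAX_NEIGHBORS, n * MAX_NEIGHBORS + counts[n]) of `graph`.
    final int[] graph;
    final int[] counts;

    GraphWalk(int[] graph, int[] counts) {
        this.graph = graph;
        this.counts = counts;
    }

    // Fixed-size blocks make a node's offset a single multiplication.
    static int blockStart(int node) {
        return node * MAX_NEIGHBORS;
    }

    // Breadth-first stand-in for the real search: read a node's whole
    // neighbor list, then hop to unvisited neighbors' blocks (random access).
    // Returns how many nodes were processed, capped at maxVisits.
    int explore(int entry, int maxVisits) {
        BitSet visited = new BitSet(counts.length);
        Deque<Integer> queue = new ArrayDeque<>();
        visited.set(entry);
        queue.add(entry);
        int processed = 0;
        while (!queue.isEmpty() && processed < maxVisits) {
            int node = queue.poll();
            processed++;
            int start = blockStart(node);
            for (int i = 0; i < counts[node]; i++) {
                int neighbor = graph[start + i];
                if (!visited.get(neighbor)) {
                    visited.set(neighbor);
                    queue.add(neighbor);
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Toy graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
        int[] graph = new int[3 * MAX_NEIGHBORS];
        int[] counts = {2, 1, 1};
        graph[blockStart(0)] = 1;
        graph[blockStart(0) + 1] = 2;
        graph[blockStart(1)] = 2;
        graph[blockStart(2)] = 0;
        System.out.println(new GraphWalk(graph, counts).explore(0, 10));
    }
}
```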
   
   I thought it made sense to use a `PackedInts.ReaderIterator` for each
individual neighbor list. That way you decode the whole list in one go
(and we always process the whole neighbor list). I think using `DirectReader`
would be less efficient, since it doesn't decode the whole list at once;
instead, it reads from the input for every value. Maybe I'm prematurely
optimizing, though... `DirectReader` looks much closer to what we want!
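   A rough in-memory sketch of the two read styles, assuming the blocks are bit-packed back to back into a `long[]` with a fixed width per ordinal. Everything here is hypothetical (`PackedNeighborBlocks`, `get`, `readNeighbors`, and passing the neighbor count in explicitly, since how counts are stored isn't covered here); it is not Lucene's `DirectReader` or `PackedInts.ReaderIterator`. `get` locates and extracts one value per call, like a per-value reader, while `readNeighbors` walks one node's block sequentially and fills a buffer. In Lucene the real cost difference comes from reads against the index input, which this toy does not model.

```java
import java.util.Arrays;

final class PackedNeighborBlocks {
    static final int MAX_NEIGHBORS = 32;

    final int bits;      // bits per ordinal, e.g. from a bitsRequired-style helper
    final long[] words;  // all blocks, bit-packed back to back

    PackedNeighborBlocks(int numNodes, int bits) {
        if (bits < 1 || bits > 32) throw new IllegalArgumentException("bits must be in [1, 32]");
        this.bits = bits;
        long totalBits = (long) numNodes * MAX_NEIGHBORS * bits;
        this.words = new long[(int) ((totalBits + 63) >>> 6)];
    }

    // First bit of `slot` inside node's fixed-size block.
    long bitPos(int node, int slot) {
        return ((long) node * MAX_NEIGHBORS + slot) * bits;
    }

    // Write-once: ORs into a zeroed slot; ord must fit in `bits` bits.
    void set(int node, int slot, int ord) {
        if (ord < 0 || ((long) ord >>> bits) != 0) throw new IllegalArgumentException("ord too large");
        long pos = bitPos(node, slot);
        int idx = (int) (pos >>> 6);
        int shift = (int) (pos & 63);
        words[idx] |= (long) ord << shift;
        int spill = shift + bits - 64;                 // bits that land in the next word
        if (spill > 0) words[idx + 1] |= (long) ord >>> (bits - spill);
    }

    // Per-value random access: locate and extract a single ordinal.
    int get(int node, int slot) {
        long pos = bitPos(node, slot);
        int idx = (int) (pos >>> 6);
        int shift = (int) (pos & 63);
        long v = words[idx] >>> shift;
        int spill = shift + bits - 64;
        if (spill > 0) v |= words[idx + 1] << (bits - spill);
        return (int) (v & ((1L << bits) - 1));
    }

    // Bulk decode: walk one node's block once, advancing a running bit cursor.
    void readNeighbors(int node, int count, int[] out) {
        long pos = bitPos(node, 0);
        int idx = (int) (pos >>> 6);
        int shift = (int) (pos & 63);
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long v = words[idx] >>> shift;
            int spill = shift + bits - 64;
            if (spill > 0) v |= words[idx + 1] << (bits - spill);
            out[i] = (int) (v & mask);
            shift += bits;
            if (shift >= 64) { shift -= 64; idx++; }
        }
    }

    public static void main(String[] args) {
        // Toy: ordinals only need to fit in 20 bits; not checked against numNodes.
        PackedNeighborBlocks g = new PackedNeighborBlocks(3, 20);
        int[] neighbors = {999_999, 0, 524_288, 7};
        for (int i = 0; i < neighbors.length; i++) g.set(1, i, neighbors[i]);
        int[] out = new int[MAX_NEIGHBORS];
        g.readNeighbors(1, neighbors.length, out);
        System.out.println(Arrays.toString(Arrays.copyOf(out, neighbors.length)));
        System.out.println(g.get(1, 2));
    }
}
```

Both paths return the same values; the bulk path just avoids recomputing the word index and shift from scratch for every neighbor, which is the in-memory analogue of decoding a whole list in one go.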


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
