benwtrent commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-2107660631

   @vigyasharma @krickert There are a few ways to implement this natively 
in Lucene. 
   
   1. Make each individual vector a vertex in the graph, with some way to 
resolve it back to the original doc. One concern I have with this is that 
vector ordinals would then have to be stored as `long` values (with many 
vectors per doc, a segment can hold more vectors than `Integer.MAX_VALUE`), 
effectively doubling the heap needed for ordinals and increasing off-heap 
storage.
   2. Make documents the vertices in the graph, and choose each doc's 
connections by max-sim to the other docs. My concern here is recall: the graph 
could be too sparse, since individual passages no longer get `maxConn` 
connections of their own.
   3. Make documents the vertices, but ensure that each passage gets relevant 
connections (this would require adjustments to the HNSW builder). This would 
alleviate the recall concern from option 2; my concern here is the added 
complexity in the graph builder, though it doesn't seem insurmountable.
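
To make the memory cost of option 1 concrete, here is a rough sketch, not Lucene code: `FlattenedOrdinals`, `neighborBytes` and `needsLongOrdinals` are hypothetical names. It assumes each vector becomes its own graph node, a parallel `ordToDoc` array resolves a vector ordinal back to its doc, and each node stores `maxConn` neighbor ordinals. Because doc ids are `int`s, a segment with many vectors per doc can exceed `Integer.MAX_VALUE` vector ordinals, at which point each stored neighbor grows from 4 to 8 bytes.

```java
import java.util.Arrays;

/** Hypothetical flattening of multi-vector docs into per-vector graph nodes (option 1). */
final class FlattenedOrdinals {
  /** ordToDoc[ord] is the doc id that owns vector ordinal ord. */
  final int[] ordToDoc;

  FlattenedOrdinals(int[] vectorsPerDoc) {
    int total = Arrays.stream(vectorsPerDoc).sum();
    ordToDoc = new int[total];
    int ord = 0;
    // Vectors of the same doc get consecutive ordinals.
    for (int doc = 0; doc < vectorsPerDoc.length; doc++) {
      for (int i = 0; i < vectorsPerDoc[doc]; i++) {
        ordToDoc[ord++] = doc;
      }
    }
  }

  /** Bytes of neighbor ordinals if every node stores maxConn ordinals of the given width. */
  static long neighborBytes(long numNodes, int maxConn, int bytesPerOrd) {
    return numNodes * maxConn * bytesPerOrd;
  }

  /** Ordinals have to widen to long once the vector count no longer fits in an int. */
  static boolean needsLongOrdinals(long numVectors) {
    return numVectors > Integer.MAX_VALUE;
  }
}
```

For example, 1,000,000 nodes with `maxConn = 32` need 128,000,000 bytes of neighbor ordinals as `int`s versus 256,000,000 bytes as `long`s.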
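
One reading of the max-sim choice in option 2, sketched under assumptions: the similarity between two documents is taken to be the best dot product over any pair of their passages, and `MaxSim.docSimilarity` is a hypothetical helper rather than a Lucene API.

```java
/** Hypothetical doc-to-doc similarity for option 2: the best-matching passage pair wins. */
final class MaxSim {
  static float dot(float[] a, float[] b) {
    float s = 0f;
    for (int i = 0; i < a.length; i++) {
      s += a[i] * b[i];
    }
    return s;
  }

  /** Highest dot product over all (passage of docA, passage of docB) pairs. */
  static float docSimilarity(float[][] docA, float[][] docB) {
    float best = Float.NEGATIVE_INFINITY;
    for (float[] pa : docA) {
      for (float[] pb : docB) {
        best = Math.max(best, dot(pa, pb));
      }
    }
    return best;
  }
}
```

Because a single score summarizes the whole doc, all of a doc's `maxConn` slots can go to docs that match one dominant passage, which is one way the sparsity concern could show up.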
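
A brute-force sketch of the per-passage idea in option 3, assuming each passage nominates its own `perPassage` most similar docs and the nominations are unioned and capped at `maxConn`. `PerPassageNeighbors.select` is hypothetical; a real HNSW builder would find candidates through graph search rather than scanning every doc.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical neighbor selection for option 3; not part of Lucene's HNSW builder. */
final class PerPassageNeighbors {
  static float dot(float[] a, float[] b) {
    float s = 0f;
    for (int i = 0; i < a.length; i++) {
      s += a[i] * b[i];
    }
    return s;
  }

  /** Best similarity between one passage and any passage of a candidate doc. */
  static float passageToDoc(float[] passage, float[][] doc) {
    float best = Float.NEGATIVE_INFINITY;
    for (float[] p : doc) {
      best = Math.max(best, dot(passage, p));
    }
    return best;
  }

  /**
   * Each passage of docs[self] nominates its perPassage most similar other docs;
   * nominations are unioned in passage order (duplicates dropped) and capped at maxConn.
   */
  static List<Integer> select(float[][][] docs, int self, int perPassage, int maxConn) {
    Set<Integer> neighbors = new LinkedHashSet<>();
    for (float[] passage : docs[self]) {
      List<Integer> candidates = new ArrayList<>();
      for (int d = 0; d < docs.length; d++) {
        if (d != self) {
          candidates.add(d);
        }
      }
      // Highest passage-to-doc similarity first; the sort is stable, so ties keep doc order.
      candidates.sort(
          Comparator.comparingDouble((Integer d) -> passageToDoc(passage, docs[d])).reversed());
      for (int i = 0; i < Math.min(perPassage, candidates.size()); i++) {
        neighbors.add(candidates.get(i));
      }
    }
    List<Integer> result = new ArrayList<>(neighbors);
    return new ArrayList<>(result.subList(0, Math.min(maxConn, result.size())));
  }
}
```

With a doc whose passages are `{1, 0}` and `{0, 1}`, and other docs `{1, 0}`, `{0, 0.5}` and `{0.95, 0}`, `select(docs, 0, 1, 2)` returns docs 1 and 2, one per passage, whereas ranking by best passage-pair score alone would pick docs 1 and 3, both matching only the first passage.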
   
   All of this is also predicated on a nice iterator API. I would expect us to 
have to update `FloatVectorValues`/`ByteVectorValues` with a new `isMultiVector` 
flag (or something similar) and adjust the iterators so that you can iterate by 
doc, and then iterate over all the vectors in a doc.
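
A minimal sketch of that "iterate by doc, then iterate the doc's vectors" shape, assuming an in-memory list of vectors per doc id. `MultiVectorIterator` and `InMemoryMultiVectorIterator` are hypothetical names, not existing Lucene types; only the `Integer.MAX_VALUE` exhaustion sentinel mirrors Lucene's `DocIdSetIterator.NO_MORE_DOCS`.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical "by doc, then by vector" iterator; not an existing Lucene interface. */
interface MultiVectorIterator {
  /** Same sentinel value Lucene's DocIdSetIterator uses for exhaustion. */
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Advances to the next doc that has at least one vector; returns its id or NO_MORE_DOCS. */
  int nextDoc();

  /** Iterates over every vector of the current doc. */
  Iterator<float[]> vectors();
}

/** In-memory sketch backed by one list of vectors per doc id. */
final class InMemoryMultiVectorIterator implements MultiVectorIterator {
  private final List<List<float[]>> vectorsByDoc;
  private int doc = -1;

  InMemoryMultiVectorIterator(List<List<float[]>> vectorsByDoc) {
    this.vectorsByDoc = vectorsByDoc;
  }

  @Override
  public int nextDoc() {
    if (doc == NO_MORE_DOCS) {
      return doc;
    }
    // Skip docs that have no vectors.
    do {
      doc++;
    } while (doc < vectorsByDoc.size() && vectorsByDoc.get(doc).isEmpty());
    if (doc >= vectorsByDoc.size()) {
      doc = NO_MORE_DOCS;
    }
    return doc;
  }

  @Override
  public Iterator<float[]> vectors() {
    return vectorsByDoc.get(doc).iterator();
  }
}
```

Keeping the doc-level cursor separate from the per-doc vector iterator lets single-vector consumers keep calling `nextDoc()` as before, while multi-vector consumers additionally drain `vectors()`.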


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

