benwtrent commented on issue #12313: URL: https://github.com/apache/lucene/issues/12313#issuecomment-2107660631
@vigyasharma @krickert There are a few ways to implement this natively in Lucene.

1. Have each individual vector be a connection in the graph, with some resolution back to the original doc. One concern I have with this is that vector ordinals would now have to be stored as `long` values, effectively doubling heap requirements and increasing off-heap storage.
2. Have documents be the vertices in the graph and consider only the max-sim other docs for the connections. My concern here is recall. The graph could be too sparse, since passages would no longer each have `maxConn` connections.
3. Have documents be vertices, but ensure that there are relevant connections for each passage (this would require adjustments to the HNSW builder). My concern here is the complexity in the graph builder. It doesn't seem insurmountable, and it would alleviate the recall concern in point 2.

All of this is also predicated on a nice iterator API. I would expect us to have to update `Float|ByteVectorValues` to have a new "isMultiVector" or something similar, and adjust the iterators so that you can iterate by doc, and then iterate over all the vectors in a doc.
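To make the iterator idea and the max-sim notion from point 2 more concrete, here is a minimal, self-contained sketch. It is not Lucene code: `MultiVectorValues`, `InMemoryValues`, `vectorCount`, `vectorValue(int)`, `maxSim`, and `bestDoc` are all hypothetical names invented for illustration, and only `isMultiVector` echoes the name floated in the comment. It assumes dot product as the similarity and reads "max-sim" as the best score over all passage pairs, which is one possible interpretation.

```java
import java.util.List;

/** Illustrative sketch only; none of these types exist in Lucene. */
public class MultiVectorSketch {

  /** Hypothetical doc-first view: iterate docs, then the vectors inside each doc. */
  interface MultiVectorValues {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    boolean isMultiVector();        // true if any doc carries more than one vector

    int nextDoc();                  // advance to the next doc that has at least one vector

    int vectorCount();              // number of vectors in the current doc

    float[] vectorValue(int index); // index-th vector of the current doc
  }

  /** Toy in-memory implementation: docs.get(d) holds the vectors of doc d. */
  static final class InMemoryValues implements MultiVectorValues {
    private final List<float[][]> docs;
    private int doc = -1;

    InMemoryValues(List<float[][]> docs) {
      this.docs = docs;
    }

    @Override
    public boolean isMultiVector() {
      for (float[][] vectors : docs) {
        if (vectors.length > 1) {
          return true;
        }
      }
      return false;
    }

    @Override
    public int nextDoc() {
      // Skip docs that have no vectors at all.
      do {
        doc++;
      } while (doc < docs.size() && docs.get(doc).length == 0);
      return doc < docs.size() ? doc : NO_MORE_DOCS;
    }

    @Override
    public int vectorCount() {
      return docs.get(doc).length;
    }

    @Override
    public float[] vectorValue(int index) {
      return docs.get(doc)[index];
    }
  }

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Doc-to-doc similarity as the best score over all passage pairs (assumed reading of "max-sim"). */
  static float maxSim(float[][] a, float[][] b) {
    float best = Float.NEGATIVE_INFINITY;
    for (float[] x : a) {
      for (float[] y : b) {
        best = Math.max(best, dot(x, y));
      }
    }
    return best;
  }

  /** Brute-force search: iterate by doc, then by vector, keeping the doc with the best passage. */
  static int bestDoc(MultiVectorValues values, float[] query) {
    int bestDoc = -1;
    float bestScore = Float.NEGATIVE_INFINITY;
    for (int doc = values.nextDoc();
        doc != MultiVectorValues.NO_MORE_DOCS;
        doc = values.nextDoc()) {
      for (int i = 0; i < values.vectorCount(); i++) {
        float score = dot(query, values.vectorValue(i));
        if (score > bestScore) {
          bestScore = score;
          bestDoc = doc;
        }
      }
    }
    return bestDoc;
  }

  public static void main(String[] args) {
    List<float[][]> docs = List.of(
        new float[][] {{1f, 0f}, {0f, 1f}}, // doc 0: two passages
        new float[][] {},                   // doc 1: no vectors
        new float[][] {{0.6f, 0.8f}});      // doc 2: one passage
    System.out.println(bestDoc(new InMemoryValues(docs), new float[] {0.6f, 0.8f}));
    System.out.println(maxSim(docs.get(0), docs.get(2)));
  }
}
```

In this shape, the doc ID stays an `int` and the per-doc vector index is a separate small `int`, so nothing here needs a `long` ordinal. Whether that carries over to the on-disk format and the HNSW graph is a separate question that the sketch does not address.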