vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2444980147
Hi @jimczi, the main change in this PR is support for multi-vectors in the flat readers and writers, along with a similarity spec for multiple vector values.

It is possible that HNSW is not the ideal data structure for exposing multi-vector ANN. We don't change much in the HNSW implementation, apart from using the multi-vector similarity for comparisons (during both graph build and search). Users can use `PerFieldKnnVectorsFormat` to wire different data structures on top of the flat multi-vector format, and we could also provide something out of the box in a subsequent change. I think the aggregation function interface is also flexible enough to support different types of similarity implementations?

Notably, this change maps all vector values for a document to a single ordinal. This gets us past the 2B vector limit (which I like), but it also means that all vector values for the document are read whenever they are fetched. I can't think of a case where we'd want only a partial set of values, but if one comes up, perhaps we can handle it in the similarity/aggregation functions.
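To make the aggregation idea concrete, here is a minimal Java sketch, assuming a dot-product per-vector similarity and two example aggregations: the max over all vector pairs, and a ColBERT-style sum of each query vector's best match. The class, enum, and method names (`MultiVectorSimilaritySketch`, `Aggregation`, `MAX_SIM`, `SUM_MAX_SIM`, `aggregate`) are hypothetical and are not the API in this PR. The `float[][][]` array only stands in for "one ordinal per document, holding all of that document's vectors".

```java
/**
 * Hypothetical sketch: a multi-vector similarity built from a per-vector
 * similarity plus an aggregation function. Names are illustrative only.
 */
class MultiVectorSimilaritySketch {

  /** Per-vector similarity: plain dot product (assumes equal dimensions). */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Hypothetical aggregations over the vectors of a query and a document. */
  enum Aggregation {
    /** Best single pair: max over all (query vector, doc vector) pairs. */
    MAX_SIM {
      float aggregate(float[][] query, float[][] doc) {
        float best = Float.NEGATIVE_INFINITY;
        for (float[] q : query) {
          for (float[] d : doc) {
            best = Math.max(best, dotProduct(q, d));
          }
        }
        return best;
      }
    },
    /** Late interaction: each query vector takes its best doc match, then sum. */
    SUM_MAX_SIM {
      float aggregate(float[][] query, float[][] doc) {
        float total = 0f;
        for (float[] q : query) {
          float best = Float.NEGATIVE_INFINITY;
          for (float[] d : doc) {
            best = Math.max(best, dotProduct(q, d));
          }
          total += best;
        }
        return total;
      }
    };

    abstract float aggregate(float[][] query, float[][] doc);
  }

  public static void main(String[] args) {
    // All vectors of a document sit under one ordinal, so they are read together.
    float[][][] vectorsByOrdinal = {
      {{1f, 0f}, {0f, 1f}}, // ordinal 0: two vectors
      {{0.5f, 0.5f}}        // ordinal 1: one vector
    };
    float[][] query = {{1f, 0f}, {0f, 2f}};
    for (int ord = 0; ord < vectorsByOrdinal.length; ord++) {
      float[][] doc = vectorsByOrdinal[ord];
      // prints "ord=0 maxSim=2.0 sumMaxSim=3.0", then "ord=1 maxSim=1.0 sumMaxSim=1.5"
      System.out.println("ord=" + ord
          + " maxSim=" + Aggregation.MAX_SIM.aggregate(query, doc)
          + " sumMaxSim=" + Aggregation.SUM_MAX_SIM.aggregate(query, doc));
    }
  }
}
```

Since every vector of a document is behind a single ordinal, the aggregation always receives the full set. An aggregation that only needed some of those vectors would still pay for reading all of them, which is the trade-off noted in the comment.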