Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

via GitHub Tue, 29 Oct 2024 10:58:29 -0700


vigyasharma commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2444980147


   Hi @jimczi, the main change in this PR is support for multi-vectors in flat 
readers and writers, along with a similarity spec for multiple vector values.
   
   It is possible that HNSW is not the ideal data structure to expose 
multi-vector ANN. We don't change much in the HNSW implementation, except for 
using multi-vector similarity for comparisons (during graph build and search). 
Users can use the `PerFieldKnnVectorsFormat` to wire different data structures 
on top of the flat multi-vector format. We can also provide something out of 
the box in a subsequent change. I think the aggregation function interface is 
also flexible enough for different types of similarity implementations?
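
   To make the aggregation idea concrete, here is a rough, self-contained 
sketch. It is not the interface from this PR: the class name, the `Aggregation` 
enum, and the method signatures are all hypothetical, and dot product is 
assumed as the per-vector similarity. It shows two possible aggregations over 
the pairwise scores: the maximum over all query/document vector pairs, and the 
sum over query vectors of each one's best match in the document.

```java
/** Hypothetical sketch only; names and signatures are assumptions, not the PR's API. */
final class MultiVectorSimilaritySketch {

  /** Hypothetical ways to fold pairwise vector scores into one document score. */
  enum Aggregation { SUM_MAX, MAX }

  /** Plain dot product between two vectors of equal dimension. */
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Scores a multi-vector query against all vectors of one document. */
  static float compare(float[][] query, float[][] doc, Aggregation agg) {
    if (query.length == 0 || doc.length == 0) {
      throw new IllegalArgumentException("query and doc need at least one vector");
    }
    float result = (agg == Aggregation.MAX) ? Float.NEGATIVE_INFINITY : 0f;
    for (float[] q : query) {
      // Best-matching document vector for this query vector.
      float best = Float.NEGATIVE_INFINITY;
      for (float[] d : doc) {
        best = Math.max(best, dotProduct(q, d));
      }
      // MAX keeps the single best pair; SUM_MAX adds up the per-query-vector bests.
      result = (agg == Aggregation.MAX) ? Math.max(result, best) : result + best;
    }
    return result;
  }

  public static void main(String[] args) {
    float[][] query = {{1f, 0f}, {0f, 1f}};
    float[][] doc = {{1f, 0f}, {0f, 2f}, {1f, 1f}};
    System.out.println(compare(query, doc, Aggregation.SUM_MAX));
    System.out.println(compare(query, doc, Aggregation.MAX));
  }
}
```

   Because the aggregation is the only part that differs between the two 
modes, other strategies (for example an average of per-vector bests) would 
slot into the same loop under these assumptions.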
   
   Notably, this change maps all vector values for a document to a single 
ordinal. This gets us past the 2B vector limit (which I like), but it also 
means that all vector values for a document are read whenever its ordinal is 
fetched. I can't think of a case where we'd want only a subset of the values, 
but if one comes up, perhaps we can handle it in the similarity/aggregate 
functions.
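
   As a toy illustration of the single-ordinal mapping (an in-memory sketch 
under my own assumptions, not the on-disk format in this PR; the class and 
method names are hypothetical), all vectors are packed into one flat array and 
each document ordinal points at the start of its run of vectors. Fetching an 
ordinal returns every vector for that document.

```java
/** Hypothetical in-memory sketch; not the flat format implemented in the PR. */
final class FlatMultiVectorSketch {
  private final int dimension;
  private final float[] data;    // all vectors of all documents, packed back to back
  private final long[] offsets;  // offsets[ord] = start of doc ord in data; last entry = total length

  FlatMultiVectorSketch(int dimension, float[][][] docs) {
    this.dimension = dimension;
    this.offsets = new long[docs.length + 1];
    long total = 0;
    for (int ord = 0; ord < docs.length; ord++) {
      offsets[ord] = total;
      total += (long) docs[ord].length * dimension;
    }
    offsets[docs.length] = total;
    // A Java array caps this toy at int range; a file-backed format would not need to.
    this.data = new float[Math.toIntExact(total)];
    int pos = 0;
    for (float[][] doc : docs) {
      for (float[] vector : doc) {
        if (vector.length != dimension) {
          throw new IllegalArgumentException("dimension mismatch");
        }
        System.arraycopy(vector, 0, data, pos, dimension);
        pos += dimension;
      }
    }
  }

  /** Number of ordinals, which is one per document rather than one per vector. */
  int size() {
    return offsets.length - 1;
  }

  /** Number of vectors stored for the given document ordinal. */
  int vectorCount(int ord) {
    return (int) ((offsets[ord + 1] - offsets[ord]) / dimension);
  }

  /** Reads all vectors for the ordinal; there is no partial read in this sketch. */
  float[][] vectorValues(int ord) {
    int start = Math.toIntExact(offsets[ord]);
    int count = vectorCount(ord);
    float[][] out = new float[count][dimension];
    for (int i = 0; i < count; i++) {
      System.arraycopy(data, start + i * dimension, out[i], 0, dimension);
    }
    return out;
  }
}
```

   The ordinal space here grows with the number of documents, not the number 
of vectors, which is the property that relaxes the limit mentioned above.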


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


