Re: [PR] Add a Multi-Vector Similarity Function [lucene]

via GitHub Tue, 12 Nov 2024 16:32:43 -0800


vigyasharma commented on PR #13991:
URL: https://github.com/apache/lucene/pull/13991#issuecomment-2472019578


   I am thinking we can leverage the `NONE` aggregation (in #13525) for 
non-ColBERT passage vector use-cases, by making each graph node correspond to a 
single value in the multi-vector, i.e. index-time aggregation becomes "none". 
The resultant graph would be similar to what we construct with parent-child 
docs today, while flat storage with multi-vectors could allow for aggregated 
similarity checks at *query* time. This could help with recall while making 
multi-vectors easier to use (no overquery or index-time joins).
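   
   As a rough illustration of what a query-time aggregated check could look 
like: the sketch below scores a document by the best dot product between a 
single query vector and any of the document's vectors. The class and method 
names (`QueryTimeAggregation`, `maxSimilarity`) are hypothetical, and "max" is 
only an assumed aggregation; nothing here is taken from the PR.

```java
// Hypothetical sketch, not Lucene API: aggregate per-vector similarities
// for one document at query time, assuming "max" as the aggregation.
final class QueryTimeAggregation {
    private QueryTimeAggregation() {}

    // Plain dot product of two equal-length vectors.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Score a document as the best match among all of its vectors,
    // which are assumed to be read back from flat multi-vector storage.
    static float maxSimilarity(float[] query, float[][] docVectors) {
        if (docVectors.length == 0) {
            throw new IllegalArgumentException("document has no vectors");
        }
        float best = Float.NEGATIVE_INFINITY;
        for (float[] v : docVectors) {
            best = Math.max(best, dot(query, v));
        }
        return best;
    }
}
```

   The graph search would still visit individual vectors; a step like this 
would only run over the candidate documents it returns.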
   
   It'll need some work: a mechanism to address each vector value directly, and 
corresponding changes in VectorValues. I'm thinking maybe an "ordinal" for the 
multi-vector, and a "sub-ordinal" for values within the multi-vector. Both ints 
could be packed into a long to form the node value?
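   
   A minimal sketch of that packing, assuming the ordinal goes in the high 32 
bits and the sub-ordinal in the low 32 bits. The class `MultiVectorNodeId` and 
its methods are hypothetical names for illustration, not existing Lucene API:

```java
// Hypothetical helper, not Lucene API: packs a multi-vector ordinal and a
// sub-ordinal (index of a value within the multi-vector) into one long.
final class MultiVectorNodeId {
    private MultiVectorNodeId() {}

    // High 32 bits: ordinal. Low 32 bits: sub-ordinal.
    // The mask stops sign extension of subOrdinal from clobbering the ordinal.
    static long pack(int ordinal, int subOrdinal) {
        return ((long) ordinal << 32) | (subOrdinal & 0xFFFFFFFFL);
    }

    // Recover the multi-vector ordinal from the high 32 bits.
    static int ordinal(long packed) {
        return (int) (packed >>> 32);
    }

    // Recover the sub-ordinal from the low 32 bits.
    static int subOrdinal(long packed) {
        return (int) packed;
    }
}
```

   With this layout, sorting packed values by the long orders them by ordinal 
first and sub-ordinal second (for non-negative inputs), so the values of one 
multi-vector stay contiguous.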
   
   Since I haven't chalked out all the details yet, I decided to **remove** the 
`NONE` aggregation for now, and focus first on the ColBERT use case. Will go 
with the `isMultiVector` flag in FieldInfos to identify multi-vector storage 
requirements.
   
   cc: @cpoerschke , @benwtrent 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


