Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

via GitHub Wed, 20 Nov 2024 11:05:48 -0800


vigyasharma commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2489342443


   Thank you for sharing these use cases, @krickert!
   
   1. **Aggregate Scoring** – I think we can do this today by joining the child 
doc hits with their parents and computing a score over all children in the 
`ToParentBlockJoinQuery`. The `getAllVectorValues()` API should make this 
easier by avoiding the two-phase query. We could also use aggregate scores 
during the approximate graph traversal itself, i.e., score each doc by the 
aggregate similarity between the query and all of that doc's vector values?
   
   2. **Chunk-Based Highlighting** – Interesting. With `getAllVectorValues()`, 
we could find all vector values whose similarity exceeds a separate highlight 
threshold and highlight those chunks?
   
   3. **Custom Aggregation with Embedding Tags** – I think this one fits 
better with a separate child doc per vector value. We can store the tags and 
related data as separate fields in the child docs and filter on them during 
search.
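
   To make use cases 1 and 2 concrete, here is a minimal stdlib-Java sketch, 
assuming all of a doc's vectors can be read at once (as the proposed 
`getAllVectorValues()` would allow) and using dot-product similarity. The class 
`MultiVectorScoring`, its `Aggregation` modes, and the helpers 
`aggregateScore()` and `chunksToHighlight()` are hypothetical illustrations, 
not Lucene APIs; in Lucene today, the `ScoreMode` passed to 
`ToParentBlockJoinQuery` (e.g. `Max`, `Avg`, `Total`) plays the role of the 
aggregation step.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch, not Lucene's API: per-doc multi-vector scoring,
   // assuming every vector of a doc is available at once.
   public class MultiVectorScoring {

       // How per-vector similarities are folded into a single doc score.
       enum Aggregation { MAX, AVG, SUM }

       // Dot-product similarity between two vectors of equal dimension.
       static float dot(float[] a, float[] b) {
           if (a.length != b.length) {
               throw new IllegalArgumentException("dimension mismatch");
           }
           float s = 0f;
           for (int i = 0; i < a.length; i++) {
               s += a[i] * b[i];
           }
           return s;
       }

       // Use case 1: aggregate the query's similarity with every vector of the doc.
       static float aggregateScore(float[] query, List<float[]> docVectors, Aggregation agg) {
           if (docVectors.isEmpty()) {
               throw new IllegalArgumentException("doc has no vectors");
           }
           float max = Float.NEGATIVE_INFINITY;
           float sum = 0f;
           for (float[] v : docVectors) {
               float sim = dot(query, v);
               max = Math.max(max, sim);
               sum += sim;
           }
           switch (agg) {
               case MAX: return max;
               case AVG: return sum / docVectors.size();
               default:  return sum;
           }
       }

       // Use case 2: indices of chunks whose similarity clears a separate
       // highlight threshold.
       static List<Integer> chunksToHighlight(float[] query, List<float[]> docVectors, float threshold) {
           List<Integer> hits = new ArrayList<>();
           for (int i = 0; i < docVectors.size(); i++) {
               if (dot(query, docVectors.get(i)) >= threshold) {
                   hits.add(i);
               }
           }
           return hits;
       }

       public static void main(String[] args) {
           float[] q = {1f, 0f};
           List<float[]> doc = List.of(
                   new float[] {0.9f, 0.1f}, new float[] {0.2f, 0.8f}, new float[] {0.6f, 0.4f});
           System.out.println(aggregateScore(q, doc, Aggregation.MAX)); // 0.9
           System.out.println(chunksToHighlight(q, doc, 0.5f));         // [0, 2]
       }
   }
   ```

   `MAX` mirrors ranking a passage-vector doc by its best chunk, while `AVG` or 
`SUM` favor docs with many relevant chunks; which fits depends on the use case.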
   
   Honestly, I think the existing parent-child block join can handle most use 
cases for independent multi-vectors (the passage-vector use case), but the 
approach above might make them easier to implement. We also need it for 
dependent multi-vectors like ColBERT, though whether ANN is even viable for 
ColBERT (vs. only for reranking) is a separate question.
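
   For the dependent case, ColBERT-style late interaction scores a doc as the 
sum, over the query's token vectors, of each token's maximum similarity to any 
of the doc's token vectors (often called MaxSim). A minimal stdlib-Java sketch, 
assuming dot-product similarity over precomputed token embeddings; 
`LateInteraction` and `maxSim()` are hypothetical names, not Lucene APIs:

   ```java
   import java.util.List;

   // Hypothetical sketch of ColBERT-style late-interaction ("MaxSim") scoring,
   // assuming dot-product similarity over precomputed token embeddings.
   public class LateInteraction {

       // Dot-product similarity between two vectors of equal dimension.
       static float dot(float[] a, float[] b) {
           float s = 0f;
           for (int i = 0; i < a.length; i++) {
               s += a[i] * b[i];
           }
           return s;
       }

       // Score = sum over query tokens of the best similarity to any doc token.
       static float maxSim(List<float[]> queryTokens, List<float[]> docTokens) {
           if (docTokens.isEmpty()) {
               throw new IllegalArgumentException("doc has no token vectors");
           }
           float score = 0f;
           for (float[] q : queryTokens) {
               float best = Float.NEGATIVE_INFINITY;
               for (float[] d : docTokens) {
                   best = Math.max(best, dot(q, d));
               }
               score += best;
           }
           return score;
       }

       public static void main(String[] args) {
           List<float[]> query = List.of(new float[] {1f, 0f}, new float[] {0f, 1f});
           List<float[]> doc = List.of(new float[] {1f, 0f}, new float[] {0.5f, 0.5f});
           System.out.println(maxSim(query, doc)); // 1.5
       }
   }
   ```

   Unlike the independent case, where a doc can be ranked by its single best 
vector, this score depends on all of the doc's vectors jointly, which is one way 
to see why ANN viability for ColBERT is an open question.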
   
   I'd like to know what issues or limitations people face with the existing 
parent-child support for multiple vector values, so we can address them here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


