vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2489342443
Thank you for sharing these use-cases @krickert!

1. **Aggregate Scoring**: I think we can do this today by joining the child doc hits with their parents and calculating the score over all children in the `ToParentBlockJoinQuery`. The `getAllVectorValues()` API should make this easier by avoiding the two-phase query. We could also use the aggregate score during the approximate search graph traversal itself (i.e. compute an aggregate query similarity over all vector values for the doc)?
2. **Chunk-Based Highlighting**: Interesting. With `getAllVectorValues()`, we could find all vector values whose similarity is above a separate similarity threshold and use those for highlights?
3. **Custom Aggregation with Embedding Tags**: I think this one plays better with a separate child doc per vector value. We can store these tags and any related data as separate fields in the child docs and filter on them during search.

Honestly, I think the existing parent-block join can achieve most use-cases for independent multi-vectors (the passage vector use case). But the approach above might make it easier to use? We also need it for dependent multi-vectors like ColBERT, though it's a separate question whether ANN is even viable for ColBERT (vs. using it only for reranking).

I'd like to know what issues or limitations people face with the existing parent-child support for multiple vector values, so we can address them here.
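To make points 1 and 2 concrete, here is a minimal sketch of aggregate scoring and threshold-based chunk selection over all vectors of one document. It is plain Java and does not call Lucene or the `getAllVectorValues()` API proposed in this PR; the class `MultiVectorScoring`, its methods, the `Aggregation` enum, and the choice of dot product as the similarity are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: scores one document that
// carries several vectors (e.g. one per passage) against a query vector.
final class MultiVectorScoring {

  /** How per-vector similarities are folded into a single document score. */
  enum Aggregation { MAX, SUM, AVG }

  /** Dot-product similarity (an assumption; any similarity could be used). */
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Aggregates the query similarity over all vectors of one document. */
  static float aggregateScore(float[] query, List<float[]> docVectors, Aggregation agg) {
    if (docVectors.isEmpty()) {
      throw new IllegalArgumentException("document has no vectors");
    }
    float max = Float.NEGATIVE_INFINITY;
    float sum = 0f;
    for (float[] v : docVectors) {
      float s = dot(query, v);
      max = Math.max(max, s);
      sum += s;
    }
    switch (agg) {
      case MAX:
        return max;
      case SUM:
        return sum;
      default:
        return sum / docVectors.size(); // AVG
    }
  }

  /** Returns the indexes of vectors whose similarity is strictly above the threshold. */
  static List<Integer> chunksAboveThreshold(
      float[] query, List<float[]> docVectors, float threshold) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < docVectors.size(); i++) {
      if (dot(query, docVectors.get(i)) > threshold) {
        hits.add(i);
      }
    }
    return hits;
  }
}
```

In a parent-block-join setup the `docVectors` list would correspond to the child vectors of one parent. The indexes returned by `chunksAboveThreshold` could then be mapped back to passages for highlighting, with a threshold chosen independently of the ranking score.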