vigyasharma opened a new pull request, #14729:
URL: https://github.com/apache/lucene/pull/14729

   Late interaction models, like [ColBERT](https://arxiv.org/abs/2004.12832) 
and [ColPali](https://arxiv.org/html/2407.01449v2), capture rich semantic 
interaction between documents and queries, and have been shown to outperform 
single-vector (no-interaction) models on search relevance. These models 
represent each query and each document as multiple vectors rather than one. 
   
   One challenge with including late interaction models in search has been 
working with multi-vectors at scale. This change provides an efficient 
workaround by adding support for reranking the results of a query using late 
interaction multi-vectors.
   
   The envisioned use case is to run the full-corpus search as an ANN search 
on single-valued vectors, followed by a second pass that reranks the results 
using late-interaction multi-vector scores. This PR adds:
   1. A `LateInteractionField` that stores multi-vectors in `BinaryDocValues`.
   2. A `DoubleValuesSource` that scores query and document multi-vectors.
   3. A FunctionScore query that wraps a provided query and reranks its results 
with late-interaction model scores.
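
   To illustrate the second pass, here is a minimal, Lucene-free sketch of late-interaction scoring and reranking. It assumes the ColBERT-style "sum of max similarity" function (for each query vector, take the best dot product against any document vector, then sum); the similarity used in this PR may differ. The class and method names (`LateInteractionSketch`, `maxSimScore`, `rerank`) are hypothetical and not part of this change.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the classes added in this PR.
public class LateInteractionSketch {

    /**
     * ColBERT-style score: for each query vector, take the maximum dot
     * product against all document vectors, then sum over query vectors.
     */
    public static float maxSimScore(float[][] query, float[][] doc) {
        if (doc.length == 0) {
            throw new IllegalArgumentException("document multi-vector is empty");
        }
        float total = 0f;
        for (float[] q : query) {
            float best = Float.NEGATIVE_INFINITY;
            for (float[] d : doc) {
                if (d.length != q.length) {
                    throw new IllegalArgumentException("dimension mismatch");
                }
                float dot = 0f;
                for (int i = 0; i < q.length; i++) {
                    dot += q[i] * d[i];
                }
                best = Math.max(best, dot);
            }
            total += best;
        }
        return total;
    }

    /**
     * Second pass: reorder first-pass candidate ids by descending
     * late-interaction score. docVectors[id] is the multi-vector for id.
     */
    public static List<Integer> rerank(
            List<Integer> candidates, float[][] query, float[][][] docVectors) {
        List<Integer> out = new ArrayList<>(candidates);
        out.sort(Comparator.comparingDouble(
                (Integer id) -> -maxSimScore(query, docVectors[id])));
        return out;
    }
}
```

   In this sketch the candidate list stands in for the top hits of the first-pass ANN query, so the more expensive multi-vector scoring only touches a small number of documents.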
   
   Note: This first approach does not add any metadata to `FieldInfo`. 
As a result, we cannot enforce a consistent shape for the multi-vectors 
indexed in the same field across documents.
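
   One way an application could guard against shape mismatches in the meantime is to check the dimension when a value is read. The sketch below assumes a hypothetical binary layout (a 4-byte dimension header followed by the packed floats); this is not necessarily how this PR encodes values, and `MultiVectorCodecSketch` is an illustrative name only.

```java
import java.nio.ByteBuffer;

// Hypothetical encoding for illustration; the PR's on-disk format may differ.
public class MultiVectorCodecSketch {

    /** Packs a multi-vector as [dimension:int][v0 floats][v1 floats]... */
    public static byte[] encode(float[][] vectors) {
        if (vectors.length == 0 || vectors[0].length == 0) {
            throw new IllegalArgumentException("multi-vector must be non-empty");
        }
        int dim = vectors[0].length;
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + vectors.length * dim * Float.BYTES);
        buf.putInt(dim);
        for (float[] v : vectors) {
            if (v.length != dim) {
                throw new IllegalArgumentException("all vectors must share one dimension");
            }
            for (float x : v) {
                buf.putFloat(x);
            }
        }
        return buf.array();
    }

    /** Unpacks a multi-vector, rejecting values whose dimension differs from the expected one. */
    public static float[][] decode(byte[] bytes, int expectedDim) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int dim = buf.getInt();
        if (dim <= 0 || dim != expectedDim) {
            throw new IllegalStateException(
                    "expected dimension " + expectedDim + " but found " + dim);
        }
        int count = buf.remaining() / (dim * Float.BYTES);
        float[][] out = new float[count][dim];
        for (float[] v : out) {
            for (int i = 0; i < dim; i++) {
                v[i] = buf.getFloat();
            }
        }
        return out;
    }
}
```

   A check like this only catches inconsistencies at read time, per document; it does not replace field-level validation at index time.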


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

