vigyasharma commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2466073115

   I tried to find some blogs and benchmarks on other library implementations. 
Astra DB, Vespa, Faiss, and nmslib all seem to support multi-vectors in some 
form. 
   
   From what I can tell, Astra DB and Vespa have ColBERT-style multi-vector 
support in ANN 
[[1]](https://docs.datastax.com/en/ragstack/examples/colbert.html) 
[[2]](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/). 
Benchmarks indicate ColBERT outperforms other techniques in quality, but full 
ColBERT on ANN has higher latency 
[[3]](https://thenewstack.io/overcoming-the-limits-of-rag-with-colbert/). For 
large-scale applications, users seem to over-query ANN with single-vector 
representations and rerank the results with ColBERT token vectors 
[[4]](https://blog.vespa.ai/announcing-long-context-colbert-in-vespa/). 
However, there's also ongoing work/research on reducing the number of embeddings 
in ColBERT, like PLAID, which replaces a bunch of vectors with their centroids 
[[5]](https://arxiv.org/abs/2205.09707). 
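
   To make the "over-query, then rerank" pattern concrete, here is a minimal 
sketch in plain Python. It is not taken from this PR or from any of the 
libraries above. It assumes the usual ColBERT late-interaction score (for each 
query token, take the maximum dot product against the document's token 
vectors, then sum), and the names `max_sim` and `rerank` are hypothetical 
helpers for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token vector, take its best
    dot product against any document token vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(
    query_tokens: Sequence[Vector],
    candidates: Dict[str, Sequence[Vector]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Rescore candidates (assumed to come from an over-queried single-vector
    ANN search) with max_sim and keep the top_k highest-scoring documents."""
    scored = [(doc_id, max_sim(query_tokens, toks)) for doc_id, toks in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    candidates = {
        "a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
        "b": [[1.0, 0.0]],              # matches only the first query token
    }
    print(rerank(query, candidates, top_k=2))
```

   The candidate set here is just a dict; in practice it would be the 
over-fetched hits from the first-stage single-vector search.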
   
   ...
   
   > I am honestly torn around whats the best path forward for the majority of 
users in Lucene.
   
   I hear you! And I don't want to add complexity only because we have some 
body of work in this PR. Thanks for raising the concern, Jim; it led me to some 
interesting reading.
   
   ...
   
   My current thinking is that this is a rapidly evolving field, and it's too 
early to lean one way or another. Adding this support unlocks experimentation. 
We might add different, scalable ANN algorithms going forward, and our flat 
storage format should work with most of them. Meanwhile, there's research on different 
ways to run late interaction with multiple but fewer vectors. This change will 
help users experiment with what works at their scale, for their 
cost/performance/quality requirements.
   
   I'm happy to change my perspective, and would like to hear more opinions. 
One reason not to add this would be if it makes the single-vector setup hard to 
evolve. I'd like to understand if (and how) this is happening, and think about 
how we can address those concerns.
   ...
   
   1: https://docs.datastax.com/en/ragstack/examples/colbert.html
   2: https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/
   3: https://thenewstack.io/overcoming-the-limits-of-rag-with-colbert/
   4: https://blog.vespa.ai/announcing-long-context-colbert-in-vespa/
   5: https://arxiv.org/abs/2205.09707
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
