Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

via GitHub Mon, 28 Oct 2024 04:12:30 -0700


jimczi commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2441247130


   > it seems like single vector is a special form of multi-vector
   
   The solution really depends on the semantics. In its current form, the way 
multi-vectors are incorporated in this PR doesn’t quite extend the 
single-vector case. With max similarity, we assume that each similarity score 
results from a full comparison, which works well when the number of such 
comparisons is limited (such as in re-ranking scenarios). However, for ColBERT, where the 
average number of vectors per document is large (in the hundreds or thousands), 
using HNSW with max similarity layered on top may not be the optimal approach. 
This is likely why other vector libraries don’t expose this setup.
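   
   To make the cost of a "full comparison" concrete, here is a minimal sketch 
of max similarity scoring. It assumes the ColBERT-style definition (for each 
query vector take the best dot product against any document vector, then sum) 
and uses plain Python lists. The names `max_sim` and `comparisons` are 
hypothetical and are not Lucene APIs.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Assumed ColBERT-style score: sum over query vectors of the best
    dot product against any vector of the document."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def comparisons(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> int:
    """Number of vector comparisons needed to score ONE document."""
    return len(query_vecs) * len(doc_vecs)


# Toy data: 2 query vectors, 1 document with 3 vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.25]]

score = max_sim(query, doc)
cost = comparisons(query, doc)
```

   Every document scored this way costs (query vectors) x (document vectors) 
dot products, so a graph traversal that scores many candidate documents pays 
that cost at each step.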
   
   If our aim is to introduce max similarity in Lucene, we might need to 
explore a more effective strategy. Although nested vectors could be promising, 
they’re currently constrained by the 2B vector limit, which isn’t ideal for 
ColBERT, given that each input token is represented as a dense vector. The 
primary limitation with HNSW and the knn codec today seems to be this 2B cap on 
vectors.
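   
   As a rough back-of-the-envelope illustration of that cap, the sketch below 
divides an assumed limit of 2^31 - 1 vectors by the average number of vectors 
per document. The exact limit in Lucene may differ slightly, and the constant 
and function names here are hypothetical.

```python
# Assumption: the "2B" cap is approximated as the largest signed 32-bit int.
ASSUMED_VECTOR_CAP = 2**31 - 1


def max_documents(avg_vectors_per_doc: int, cap: int = ASSUMED_VECTOR_CAP) -> int:
    """Upper bound on documents that fit under the cap when every document
    contributes avg_vectors_per_doc vectors."""
    if avg_vectors_per_doc <= 0:
        raise ValueError("avg_vectors_per_doc must be positive")
    return cap // avg_vectors_per_doc


# One vector per document vs. token-level (ColBERT-like) representations.
for n in (1, 100, 300, 1000):
    print(f"{n:>5} vectors/doc -> at most {max_documents(n):,} documents")
```

   With hundreds of vectors per document, the ceiling drops from billions of 
documents to a few million.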
   
   Given these factors, we may want to reconsider HNSW for this purpose. A 
scalable solution would likely involve running multiple queries (one per query 
vector) rather than relying on an aggregation strategy. Maybe the first goal 
should be to incorporate max sim for re-ranking use cases, using a flat 
format? 
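   
   A rough sketch of that idea, under stated assumptions: run one 
single-vector search per query vector to collect candidate documents, then 
re-rank only those candidates with max sim over their flat (exhaustively 
scanned) vectors. The brute-force `top_k_vectors` below is only a stand-in for 
whatever single-vector search is used; all names are hypothetical and none of 
this is a Lucene API.

```python
import heapq


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vecs, doc_vecs):
    """Assumed ColBERT-style score: sum over query vectors of the best match."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def top_k_vectors(query_vec, index, k):
    """Brute-force stand-in for a single-vector kNN search.
    index is a list of (doc_id, vector); returns the k best (score, doc_id)."""
    return heapq.nlargest(k, ((dot(query_vec, v), doc_id) for doc_id, v in index))


def search(query_vecs, docs, k_per_query, top_n):
    """One search per query vector, then flat max-sim re-ranking."""
    # Flatten documents into individual (doc_id, vector) entries.
    index = [(doc_id, v) for doc_id, vecs in docs.items() for v in vecs]

    # Phase 1: gather candidate documents, one query per query vector.
    candidates = set()
    for q in query_vecs:
        for _, doc_id in top_k_vectors(q, index, k_per_query):
            candidates.add(doc_id)

    # Phase 2: exact max-sim scoring restricted to the candidates.
    scored = [(max_sim(query_vecs, docs[d]), d) for d in candidates]
    return heapq.nlargest(top_n, scored)


docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.5, 0.0]],
    "c": [[0.0, 0.25]],
}
query = [[1.0, 0.0], [0.0, 1.0]]
results = search(query, docs, k_per_query=2, top_n=2)
```

   The expensive full comparison is then paid only for the candidate set 
rather than for every document visited during the search.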
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

