navneet1v commented on issue #14247:
URL: https://github.com/apache/lucene/issues/14247#issuecomment-2951768351

   > What if we created a vector distance function like `dot-product(v1, v2) * 
sgnum(d1, d2)` where `(v1, d1)` and `(v2, d2)` are the `(vector, cluster)` 
pairs indexed together in the same document. With this, you would get a single 
graph made up of many smaller disjoint components. Then when you search, you 
would need to seed with appropriate entry points by doing a search for some 
document(s) matching the cluster. We'd also need to disable the way we forcibly 
connect disconnected components today. Possibly you could even encode the 
cluster as a subspace in the vector itself and avoid the need for another field 
that way. To do this, we'd have to find a way to extend the vector distance 
functions, or make it extensible. This is challenging, but at least it's 
something others have asked for and seems like a more natural kind of extension 
to me.
   
   @msokolov sorry for the late reply. If we are able to encode the cluster 
information (in this case, tenant information) alongside the vector, say as 
`(vector, cluster)`, I think that solves the information-representation part of 
the problem, which is pretty good.
   
   As for how to use the cluster information (or other extra information), 
folding it into the distance computation is a more vector-like way to represent 
things, so maybe this could be the default implementation.
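
   A minimal sketch of what such a default could look like, under stated 
assumptions: this is not Lucene API, every name below is hypothetical, and the 
cluster factor is assumed to be 1 when both documents share a cluster and 0 
otherwise, which is only one possible reading of `sgnum(d1, d2)`.

```java
// Hypothetical sketch, not Lucene API: all names here are illustrative.
final class ClusterAwareSimilarity {
  private ClusterAwareSimilarity() {}

  /** Plain dot product of two equal-length vectors. */
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "dimension mismatch: " + a.length + " vs " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Assumed cluster factor: 1 when both docs share a cluster, 0 otherwise. */
  static int clusterFactor(int cluster1, int cluster2) {
    return cluster1 == cluster2 ? 1 : 0;
  }

  /** Dot product scaled by the cluster factor of the two (vector, cluster) pairs. */
  static float score(float[] v1, int cluster1, float[] v2, int cluster2) {
    return dotProduct(v1, v2) * clusterFactor(cluster1, cluster2);
  }
}
```

   With this reading, cross-cluster pairs always score 0, while same-cluster 
pairs with a negative dot product score below that, so the factor alone does 
not strictly rank every same-cluster pair above every cross-cluster pair.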
   
   But I would take it a bit further and open an extension point where custom 
Codecs get the opportunity to influence how this cluster logic is used during 
node connection. For example, nodes (aka docs) of cluster 1 and cluster 2 can 
be connected, but nodes of cluster 2 and cluster 3 cannot. With distances 
alone, even if you keep things as far apart as possible, they may still pop up 
in the final search results.
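
   To make the idea concrete, here is a rough sketch of what such an extension 
point might look like. Nothing like this exists in Lucene today: the 
`ClusterConnectionPolicy` interface and the `AllowListPolicy` class are 
hypothetical names, and the rule that nodes of the same cluster may always be 
connected is an assumption made for the example.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical extension point a custom Codec could supply to the graph builder. */
interface ClusterConnectionPolicy {
  /** Returns true if a graph edge may link nodes of the two clusters. */
  boolean canConnect(int clusterA, int clusterB);
}

/** Example policy: same-cluster edges plus an explicit allow-list of cluster pairs. */
final class AllowListPolicy implements ClusterConnectionPolicy {
  private final Set<Long> allowedPairs = new HashSet<>();

  /** Permits edges between the two clusters, in either direction. */
  AllowListPolicy allow(int clusterA, int clusterB) {
    allowedPairs.add(key(clusterA, clusterB));
    return this;
  }

  @Override
  public boolean canConnect(int clusterA, int clusterB) {
    // Assumption: nodes within one cluster may always be connected.
    return clusterA == clusterB || allowedPairs.contains(key(clusterA, clusterB));
  }

  /** Packs an unordered pair of cluster ids into one long. */
  private static long key(int a, int b) {
    int lo = Math.min(a, b);
    int hi = Math.max(a, b);
    return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
  }
}
```

   For the example above, `new AllowListPolicy().allow(1, 2)` would permit 
edges between clusters 1 and 2 and reject edges between clusters 2 and 3. 
Unlike a distance penalty, a hard check like this at connection time would 
keep disallowed pairs from ever becoming neighbors in the graph.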
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

