navneet1v commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2951768351
> What if we created a vector distance function like `dot-product(v1, v2) * sgnum(d1, d2)` where `(v1, d1)` and `(v2, d2)` are the `(vector, cluster)` pairs indexed together in the same document. With this, you would get a single graph made up of many smaller disjoint components. Then when you search, you would need to seed with appropriate entry points by doing a search for some document(s) matching the cluster. We'd also need to disable the way we forcibly connect disconnected components today. Possibly you could even encode the cluster as a subspace in the vector itself and avoid the need for another field that way. To do this, we'd have to find a way to extend the vector distance functions, or make it extensible. This is challenging, but at least it's something others have asked for and seems like a more natural kind of extension to me.

@msokolov sorry for the late reply.

If we can encode the cluster information (in this case, tenant information) together with the vector, say as `(vector, cluster)`, I think that solves the information-representation part of the problem, which is pretty good.

Using the cluster information in the distance computation is the more vector-like way to represent things, and it could be the default implementation. But I would take it a bit further and open an extension point where custom codecs get the opportunity to influence how the cluster logic is used when nodes are connected. For example, nodes (i.e. docs) of cluster 1 and cluster 2 can be connected, but nodes of cluster 2 and cluster 3 cannot. With distances alone, even if you keep the clusters as far apart as possible, documents from other clusters may still show up in the final search results.
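To make the two ideas concrete, here is a minimal Python sketch. It is not Lucene code and not an existing API. The names `ALLOWED_PAIRS`, `can_connect`, `cluster_sign`, and `cluster_similarity` are hypothetical. It also makes two assumptions that are not in the discussion above: vectors are unit length, and the dot product is mapped into [0, 1] before the sign is applied, so a disallowed pair always scores below every allowed pair.

```python
from typing import FrozenSet, Sequence, Set

# Hypothetical policy: which *distinct* clusters may be linked in the graph.
# Here clusters 1 and 2 may connect; every other cross-cluster pair may not.
ALLOWED_PAIRS: Set[FrozenSet[int]] = {frozenset({1, 2})}


def can_connect(d1: int, d2: int,
                allowed: Set[FrozenSet[int]] = ALLOWED_PAIRS) -> bool:
    """Connection rule: same cluster, or an explicitly allowed cluster pair."""
    return d1 == d2 or frozenset({d1, d2}) in allowed


def cluster_sign(d1: int, d2: int) -> int:
    """+1 when the two clusters may be connected, -1 otherwise."""
    return 1 if can_connect(d1, d2) else -1


def cluster_similarity(v1: Sequence[float], d1: int,
                       v2: Sequence[float], d2: int) -> float:
    """Dot-product similarity multiplied by the cluster sign.

    Assumes unit-length vectors, so (1 + dot) / 2 lies in [0, 1] and the
    sign flip pushes disallowed pairs to a score of zero or below.
    """
    dot = sum(a * b for a, b in zip(v1, v2))
    base = (1.0 + dot) / 2.0
    return base * cluster_sign(d1, d2)
```

With this policy, `cluster_similarity([1.0, 0.0], 1, [0.6, 0.8], 2)` is positive because clusters 1 and 2 are allowed to connect, while two identical vectors in clusters 2 and 3 get a negative score. Keeping `can_connect` as a separate predicate is what would let graph construction refuse an edge outright instead of relying only on a low score.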