msokolov commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2665564697
What if we created a vector distance function like `dot-product(v1, v2) * sgnum(d1, d2)`, where `(v1, d1)` and `(v2, d2)` are the `(vector, cluster)` pairs indexed together in the same document? With this, you would get a single graph made up of many smaller disjoint components. Then, when you search, you would need to seed the search with appropriate entry points by first searching for some document(s) matching the cluster. We'd also need to disable the way we forcibly connect disconnected components today.

Possibly you could even encode the cluster as a subspace in the vector itself and avoid the need for another field that way.

To do this, we'd have to find a way to extend the vector distance functions, or make them extensible. This is challenging, but at least it's something others have asked for, and it seems like a more natural kind of extension to me.
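To make the idea concrete, here is a minimal sketch in Python rather than Lucene code. It assumes that `sgnum(d1, d2)` is meant to be 1 when the two cluster ids are equal and 0 otherwise, which the comment does not spell out. All function names (`same_cluster`, `cluster_gated_similarity`, `embed_in_cluster_subspace`) are hypothetical and are not part of any Lucene API. The second half shows one possible reading of "encode the cluster as a subspace": give each cluster its own block of coordinates, so that a plain dot product between vectors from different clusters is zero.

```python
from typing import Hashable, List, Sequence


def dot_product(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(v1, v2))


def same_cluster(d1: Hashable, d2: Hashable) -> int:
    """Assumed meaning of sgnum(d1, d2): 1 if the cluster ids match, else 0."""
    return 1 if d1 == d2 else 0


def cluster_gated_similarity(
    v1: Sequence[float], d1: Hashable, v2: Sequence[float], d2: Hashable
) -> float:
    """dot-product(v1, v2) * sgnum(d1, d2), under the assumption above."""
    return dot_product(v1, v2) * same_cluster(d1, d2)


def embed_in_cluster_subspace(
    v: Sequence[float], cluster: int, num_clusters: int
) -> List[float]:
    """Hypothetical encoding: place v in the coordinate block owned by
    `cluster` and leave every other block zero, so vectors from different
    clusters are orthogonal under an ordinary dot product."""
    if not 0 <= cluster < num_clusters:
        raise ValueError("cluster id out of range")
    dim = len(v)
    out = [0.0] * (dim * num_clusters)
    out[cluster * dim:(cluster + 1) * dim] = list(v)
    return out


if __name__ == "__main__":
    a, b = [1.0, 2.0], [3.0, 4.0]
    print(cluster_gated_similarity(a, "x", b, "x"))  # same cluster
    print(cluster_gated_similarity(a, "x", b, "y"))  # different clusters
    ea = embed_in_cluster_subspace(a, 0, 2)
    eb = embed_in_cluster_subspace(b, 1, 2)
    print(dot_product(ea, eb))  # different blocks, no shared coordinates
```

Two caveats under these assumptions. A cross-cluster score of 0 is only the worst possible score when in-cluster dot products are positive, since a raw dot product can be negative. And the block encoding multiplies the stored dimension by the number of clusters, so it is shown here only to illustrate the orthogonality argument.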