navneet1v commented on issue #14247: URL: https://github.com/apache/lucene/issues/14247#issuecomment-2663976487
> I wonder if there might be a better way to accomplish your actual goal. Adding "extra data" doesn't seem like a good idea to me since it inherently blurs the function of the data format. Can you describe the intended use case more fully? I guess this is some kind of clustering? What does that mean?

As mentioned in the description, one use case I had in mind is this: say each vector has an identifier named tenantId, and every search should return results only for one specific tenant. I would therefore like to build multiple HNSW graphs at the segment level, one per tenant. Then, when a search comes in for a specific tenant, it can traverse the single smaller graph for that tenant rather than searching the large graph that holds every tenant's vectors.

----------------------------------------------------

The alternative is to use filters with k-NN, but with filters we need to run the filter query and collect all the matching docs before starting the HNSW search. This adds extra latency and also reduces recall if the tenant is very small.

@msokolov please let me know if you have more questions; happy to answer them. :)
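
To make the comparison in the comment concrete, here is a minimal sketch in plain Java of the two approaches it describes: filtering by tenant at query time versus grouping vectors by tenant up front. It does not use any Lucene API. The class `TenantPartitionSketch`, the `Doc` type, and all method names are hypothetical, and a brute-force scan stands in for the HNSW traversal, so it shows only where the per-tenant work happens, not real latency or recall behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only: none of these names exist in Lucene.
final class TenantPartitionSketch {

  /** A stored vector together with the tenant that owns it. */
  static final class Doc {
    final int id;
    final String tenantId;
    final float[] vector;

    Doc(int id, String tenantId, float[] vector) {
      this.id = id;
      this.tenantId = tenantId;
      this.vector = vector;
    }
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Brute-force top-k by distance; a stand-in for an HNSW graph search. */
  static List<Integer> topK(List<Doc> candidates, float[] query, int k) {
    return candidates.stream()
        .sorted(Comparator.comparingDouble((Doc d) -> squaredDistance(d.vector, query)))
        .limit(k)
        .map(d -> d.id)
        .collect(Collectors.toList());
  }

  /** Filter approach: collect the tenant's docs on every query, then search. */
  static List<Integer> searchWithFilter(
      List<Doc> segment, String tenantId, float[] query, int k) {
    List<Doc> matching = new ArrayList<>();
    for (Doc d : segment) {
      if (d.tenantId.equals(tenantId)) {
        matching.add(d);
      }
    }
    return topK(matching, query, k);
  }

  /** Partition approach: group docs by tenant once, ahead of any query. */
  static Map<String, List<Doc>> partitionByTenant(List<Doc> segment) {
    Map<String, List<Doc>> partitions = new HashMap<>();
    for (Doc d : segment) {
      partitions.computeIfAbsent(d.tenantId, t -> new ArrayList<>()).add(d);
    }
    return partitions;
  }

  /** Search only the requested tenant's partition; unknown tenants yield no hits. */
  static List<Integer> searchPartition(
      Map<String, List<Doc>> partitions, String tenantId, float[] query, int k) {
    List<Doc> own = partitions.getOrDefault(tenantId, Collections.<Doc>emptyList());
    return topK(own, query, k);
  }

  public static void main(String[] args) {
    List<Doc> segment = Arrays.asList(
        new Doc(0, "a", new float[] {0f, 0f}),
        new Doc(1, "b", new float[] {0.1f, 0f}),
        new Doc(2, "a", new float[] {1f, 1f}),
        new Doc(3, "a", new float[] {0.2f, 0f}));
    Map<String, List<Doc>> partitions = partitionByTenant(segment);
    System.out.println(searchPartition(partitions, "a", new float[] {0f, 0f}, 2));
  }
}
```

Both methods return the same documents for a given tenant. The difference is that `searchWithFilter` repeats the tenant scan on every query, while `searchPartition` pays the grouping cost once and then touches only that tenant's vectors.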