navneet1v commented on issue #14247:
URL: https://github.com/apache/lucene/issues/14247#issuecomment-2663976487

   > I wonder if there might be a better way to accomplish your actual goal. 
Adding "extra data" doesn't seem like a good idea to me since it inherently 
blurs the function of the data format. Can you describe the intended use case 
more fully? I guess this is some kind of clustering? What does that mean?
   
   As mentioned in the description, one use case I had in mind is this: say I 
have an identifier named tenantId for each vector, and every search should 
return results only for one specific tenant. I would like to build multiple 
HNSW graphs at the segment level, one per tenant, so that when a search comes 
in for a specific tenant it can traverse the single smaller graph for that 
tenant rather than the one large graph containing every tenant's vectors.
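
   To make the idea concrete, here is a rough sketch. It is not Lucene code: 
the class `PerTenantVectorIndex` and its methods are hypothetical names, and a 
brute-force scan stands in for each tenant's HNSW graph. It only illustrates 
that a search for one tenant never touches vectors of any other tenant.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy model of "one vector index per tenant". Brute force replaces HNSW here. */
public class PerTenantVectorIndex {

    /** A stored vector together with its document id. */
    static final class Entry {
        final int docId;
        final float[] vector;

        Entry(int docId, float[] vector) {
            this.docId = docId;
            this.vector = vector;
        }
    }

    // Hypothetical layout: tenantId -> that tenant's own partition of vectors.
    private final Map<String, List<Entry>> partitions = new HashMap<>();

    public void add(String tenantId, int docId, float[] vector) {
        partitions.computeIfAbsent(tenantId, t -> new ArrayList<>())
                  .add(new Entry(docId, vector));
    }

    /** Returns up to k doc ids nearest to the query, scanning only one tenant's partition. */
    public List<Integer> search(String tenantId, float[] query, int k) {
        List<Entry> partition = partitions.getOrDefault(tenantId, Collections.emptyList());
        return partition.stream()
                .sorted(Comparator.comparingDouble((Entry e) -> squaredDistance(e.vector, query)))
                .limit(k)
                .map(e -> e.docId)
                .collect(Collectors.toList());
    }

    private static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        PerTenantVectorIndex index = new PerTenantVectorIndex();
        index.add("tenantA", 1, new float[] {0f, 0f});
        index.add("tenantA", 2, new float[] {1f, 1f});
        // Doc 3 is closer to the query than doc 2, but it belongs to another tenant.
        index.add("tenantB", 3, new float[] {0.1f, 0.1f});

        System.out.println(index.search("tenantA", new float[] {0f, 0f}, 2)); // prints [1, 2]
    }
}
```

   In this toy the partition lookup replaces the filter step entirely, which 
is the property I am after with per-tenant graphs.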
   
   ----------------------------------------------------
   An alternative is to use filters with k-NN, but with filters we need to run 
the filter query and collect all the matching docs before starting the HNSW 
search. This adds extra latency and also reduces recall if the tenant is very 
small.
   
   @msokolov please let me know if you have more questions; happy to answer 
them. :) 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

