msokolov commented on issue #12579:
URL: https://github.com/apache/lucene/issues/12579#issuecomment-1759925268

   Interesting - so this is kind of like a noisy radius search in high 
dimensions? It makes sense to me intuitively, since we don't generally expect 
different searches to return the same number of results. If I search for `asd89HH8!@` I may 
get only very low-quality results, perhaps none that would exceed the threshold 
(i.e. fall within the radius), whereas if I search for something that shares 
terms with many documents, I would expect vector-based search to find 
them all. It's kind of weird from an IR perspective that we just pick some 
query-independent K. I wonder if there is some precedent for this in the semantic 
search literature?
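   To make the contrast with a fixed K concrete, here is a minimal brute-force sketch of radius (threshold) search. It assumes unit-length vectors and dot-product similarity. The class and method names are hypothetical, and this is not Lucene's HNSW-based API. The point is only that the result count depends on the query:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical brute-force sketch of radius search; not Lucene's API.
final class RadiusSearchSketch {

  /** Dot product; equals cosine similarity when both vectors are unit length (assumed here). */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Returns the ids of every doc whose similarity to the query is >= threshold, best first. */
  static List<Integer> radiusSearch(float[][] docs, float[] query, float threshold) {
    List<Integer> hits = new ArrayList<>();
    for (int id = 0; id < docs.length; id++) {
      if (dot(docs[id], query) >= threshold) {
        hits.add(id);
      }
    }
    // Sort by descending similarity so the output reads like a ranked list.
    hits.sort(Comparator.comparingDouble((Integer id) -> dot(docs[id], query)).reversed());
    return hits;
  }

  public static void main(String[] args) {
    float[][] docs = {{1f, 0f}, {0.8f, 0.6f}, {0.6f, 0.8f}, {0f, 1f}};
    // A query close to several docs: more than one clears the threshold.
    System.out.println(radiusSearch(docs, new float[] {0.8f, 0.6f}, 0.9f));
    // A query far from every doc (the `asd89HH8!@` case): nothing clears it.
    System.out.println(radiusSearch(docs, new float[] {-1f, 0f}, 0.9f));
  }
}
```

   With the same threshold, the first query returns two hits and the second returns none. A top-K search would return exactly K hits for both.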
   
   I found one system that supports a threshold: 
https://typesense.org/docs/0.25.0/api/vector-search.html#distance-threshold but 
I don't think we support that yet in KnnVector*Query, do we? Perhaps we 
ought to lead with that in order to introduce the utility of radius searching, 
since you can simulate it with a threshold + large K, right?
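   The "threshold + large K" simulation can be sketched as follows: over-fetch a top-K with a deliberately large K using a min-heap, then drop every hit below the threshold. This is a hypothetical brute-force illustration with dot-product similarity on assumed unit-length vectors. The names `ThresholdTopKSketch`, `topK` and `thresholdWithLargeK` are made up and are not Lucene's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: approximate a radius search with top-K plus a score threshold.
final class ThresholdTopKSketch {

  static final class Hit {
    final int id;
    final float score;

    Hit(int id, float score) {
      this.id = id;
      this.score = score;
    }
  }

  /** Dot product; equals cosine similarity for unit-length vectors (assumed). */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Classic top-K: keep the K best hits in a min-heap keyed on score. */
  static List<Hit> topK(float[][] docs, float[] query, int k) {
    PriorityQueue<Hit> heap = new PriorityQueue<>((x, y) -> Float.compare(x.score, y.score));
    for (int id = 0; id < docs.length; id++) {
      heap.offer(new Hit(id, dot(docs[id], query)));
      if (heap.size() > k) {
        heap.poll(); // evict the current worst hit
      }
    }
    List<Hit> result = new ArrayList<>(heap);
    result.sort((x, y) -> Float.compare(y.score, x.score)); // best first
    return result;
  }

  /** Over-fetches with a large K, then keeps only hits at or above the threshold. */
  static List<Hit> thresholdWithLargeK(float[][] docs, float[] query, int largeK, float threshold) {
    List<Hit> filtered = new ArrayList<>();
    for (Hit hit : topK(docs, query, largeK)) {
      if (hit.score >= threshold) {
        filtered.add(hit);
      }
    }
    return filtered;
  }

  public static void main(String[] args) {
    float[][] docs = {{1f, 0f}, {0.8f, 0.6f}, {0.6f, 0.8f}, {0f, 1f}};
    float[] query = {0.8f, 0.6f};
    for (Hit hit : thresholdWithLargeK(docs, query, 3, 0.9f)) {
      System.out.println(hit.id + " " + hit.score);
    }
  }
}
```

   This is only an approximation of a true radius search. If more than `largeK` docs clear the threshold, the extra ones are silently cut off. The cost also scales with `largeK`, not with the number of hits that actually fall inside the radius.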


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
