msokolov commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1759925268
Interesting, so this is kind of like a noisy radius search in high dimensions? It makes sense to me intuitively, since we don't generally expect every query to have the same number of good results. If I search for `asd89HH8!@` I may get only very low-quality results, perhaps none that exceed the threshold (i.e. fall inside the radius), whereas if I search for something that shares terms with many documents, I would expect vector-based search to find them all. It's kind of weird from an IR perspective that we just pick some query-independent K. I wonder if there is any precedent for this in the semantic search literature? I found one system that supports a threshold: https://typesense.org/docs/0.25.0/api/vector-search.html#distance-threshold but I don't think we support that yet in `KnnVector*Query`, do we? Perhaps we ought to lead with that in order to introduce the utility of radius searching, since you can simulate radius search with a threshold plus a large K, right?
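To make the "threshold plus large K" idea concrete, here is a minimal brute-force sketch in Java. It is not Lucene code: the class `RadiusSearchSketch` and its methods are hypothetical, it scores with exact dot products rather than walking an approximate HNSW graph (so it ignores the "noisy" part entirely), and it assumes a higher score means more similar. It shows that filtering the top K by a threshold reproduces a true radius search only when K is at least the number of documents inside the radius.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical brute-force sketch, NOT Lucene's API: compares an exact radius
// search with the "top-K, then apply a score threshold" simulation.
public class RadiusSearchSketch {

  // Dot-product similarity; assumes both vectors have the same dimension.
  static float dot(float[] a, float[] b) {
    float s = 0f;
    for (int i = 0; i < a.length; i++) {
      s += a[i] * b[i];
    }
    return s;
  }

  // Exact radius search: ids of every doc whose score is >= threshold, in doc-id order.
  static List<Integer> radiusSearch(float[][] docs, float[] query, float threshold) {
    List<Integer> hits = new ArrayList<>();
    for (int id = 0; id < docs.length; id++) {
      if (dot(docs[id], query) >= threshold) {
        hits.add(id);
      }
    }
    return hits;
  }

  // Simulation: keep the exact top-k docs by score, then drop those below the threshold.
  // Returns ids in doc-id order so the result is comparable with radiusSearch.
  static List<Integer> topKThenFilter(float[][] docs, float[] query, int k, float threshold) {
    // Min-heap on score: the head is always the weakest of the k docs kept so far.
    Comparator<Integer> byScore = Comparator.comparingDouble(id -> dot(docs[id], query));
    PriorityQueue<Integer> heap = new PriorityQueue<>(byScore);
    for (int id = 0; id < docs.length; id++) {
      heap.add(id);
      if (heap.size() > k) {
        heap.poll(); // evict the weakest doc
      }
    }
    List<Integer> hits = new ArrayList<>();
    for (int id : heap) {
      if (dot(docs[id], query) >= threshold) {
        hits.add(id);
      }
    }
    hits.sort(Comparator.naturalOrder());
    return hits;
  }

  public static void main(String[] args) {
    float[][] docs = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}, {-1f, 0f}};
    float[] query = {1f, 0f};
    System.out.println(radiusSearch(docs, query, 0.5f));      // [0, 1]
    System.out.println(topKThenFilter(docs, query, 1, 0.5f)); // [0]: K too small, doc 1 is lost
    System.out.println(topKThenFilter(docs, query, 4, 0.5f)); // [0, 1]: matches the radius search
  }
}
```

The catch the example exposes is that the simulation needs K chosen up front, and a query-independent K either truncates the radius (first call with K = 1) or does extra work for queries with few matches. With an approximate graph search, a large K also does not guarantee that every in-radius document is visited.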