benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766983182
> I think of it as finding all results within a high-dimensional circle / sphere / equivalent

Dot-product, cosine, etc. don't really follow that same idea, as you point out. I would prefer something like `VectorSimilarityQuery`.

> E.g. could we abort the approximate search if the list maintained by the RnnCollector grows too large, and fall back to an exact search that is based on a TwoPhaseIterator instead of eagerly collecting all matches into a list?

I agree with @jpountz's concerns. The topDocs collector gets a replay of the matched documents. We should put sane limits here and prevent folks from getting 100,000s of matches (int & float value arrays) via approximate search. A match count that large could cause issues.
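
To make the suggested fallback concrete, here is a rough, language-neutral sketch in Python. It is not Lucene code: every name in it (`TooManyMatches`, `approximate_radius_search`, `exact_radius_iter`, `radius_search`, `max_hits`) is hypothetical, a linear scan stands in for the approximate graph traversal, and a generator stands in for the lazy, iterator-based exact search. It only illustrates the control flow of "abort eager collection once a cap is exceeded, then match lazily".

```python
import math
from typing import Iterator, List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[int, float]  # (doc id, distance to the query)


class TooManyMatches(Exception):
    """Hypothetical signal: the eager collector exceeded its cap."""


def approximate_radius_search(
    vectors: Sequence[Vector], query: Vector, radius: float, max_hits: int
) -> List[Hit]:
    """Stand-in for the approximate phase: eagerly collects hits into a list.

    A real implementation would walk a vector index; the linear scan here is
    an assumption made to keep the sketch self-contained.
    """
    hits: List[Hit] = []
    for doc_id, vec in enumerate(vectors):
        distance = math.dist(vec, query)
        if distance <= radius:
            hits.append((doc_id, distance))
            if len(hits) > max_hits:
                # Abort instead of letting the in-memory list keep growing.
                raise TooManyMatches()
    return hits


def exact_radius_iter(
    vectors: Sequence[Vector], query: Vector, radius: float
) -> Iterator[Hit]:
    """Lazy exact search: yields one hit at a time, never builds a list."""
    for doc_id, vec in enumerate(vectors):
        distance = math.dist(vec, query)
        if distance <= radius:
            yield doc_id, distance


def radius_search(
    vectors: Sequence[Vector], query: Vector, radius: float, max_hits: int
) -> Iterator[Hit]:
    """Try the eager approximate phase; fall back to lazy exact matching."""
    try:
        return iter(approximate_radius_search(vectors, query, radius, max_hits))
    except TooManyMatches:
        return exact_radius_iter(vectors, query, radius)
```

With a generous `max_hits` the caller iterates over the eagerly collected list; with a small one the partial list is discarded and hits are produced on demand, so memory use stays bounded by the cap rather than by the number of matches.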