msokolov commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1049804451
##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
    * @throws IllegalArgumentException if <code>k</code> is less than 1
    */
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+    this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
+   * given field. <code>target</code> vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity

Review Comment:
OK, with the current CR, orthogonal vectors will have a DOT_PRODUCT "score" of 0.5, which could be surprising. However, this is similar to how result scores are treated elsewhere in Lucene: their value ranges are not well-defined, and the only guarantee is that higher scores are "more relevant". Practically speaking, as a user, I think I am going to have to do empirical work to know what threshold to use. The threshold is unlikely to be motivated by some a priori knowledge of what a "good" dot-product is, and given that, I'd like to just be able to work with some kind of abstracted score in a known range (0 = worst, 1 = best).

Conversely, if we were to switch to using vector similarities that correspond more directly to the underlying functions, we would have to clearly define them (today we don't actually explain this anywhere, so we'd need to document it) and maybe provide methods for computing them. They would also be weird, just in a different way. For example, how would we explain 8-bit dot-product? Would it be the 8-bit dot-product score normalized by 2^15?
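
To make the "orthogonal vectors score 0.5" point concrete, here is a minimal sketch. It assumes unit-length float vectors and a (1 + dot) / 2 mapping from raw dot product to score; that mapping is consistent with the 0.5 figure in the comment but is an assumption here, not something the comment spells out. The class and method names are hypothetical and are not part of the Lucene API.

```java
// Hypothetical illustration only; not Lucene code.
final class DotProductScoreSketch {
  private DotProductScoreSketch() {}

  /** Raw dot product of two vectors of equal length. */
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Assumed mapping: raw dot product in [-1, 1] to a score in [0, 1]. */
  static float toScore(float rawDotProduct) {
    return (1f + rawDotProduct) / 2f;
  }

  /** Inverse of the assumed mapping: score in [0, 1] back to a raw dot product. */
  static float toRawDotProduct(float score) {
    return 2f * score - 1f;
  }

  public static void main(String[] args) {
    float[] x = {1f, 0f};
    float[] y = {0f, 1f};
    System.out.println(toScore(dotProduct(x, y))); // orthogonal: prints 0.5
    System.out.println(toScore(dotProduct(x, x))); // identical: prints 1.0
  }
}
```

Under these assumptions, a user who reasons in raw dot products would have to convert before passing a threshold expressed as a score, which is the kind of translation the comment argues either needs documenting or is better avoided by staying with an abstract 0-to-1 score.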