msokolov commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1049804451


##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
    * @throws IllegalArgumentException if <code>k</code> is less than 1
    */
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+    this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
+   * given field. <code>target</code> vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity

Review Comment:
   OK, with the current CR, orthogonal vectors will have a DOT_PRODUCT "score"
of 0.5, which could be surprising. However, this is similar to how result
scores are treated elsewhere in Lucene: their value ranges are not
well-defined; the only guarantee is that higher scores are "more relevant".
Practically speaking, as a user, I think I am going to have to do empirical
work to know what threshold to use; thresholds are not likely to be motivated
by some a priori knowledge of what a "good" dot-product is. Given that, I'd
like to just be able to work with some kind of abstracted score in a known
range (0 = worst, 1 = best).

Conversely, if we were to switch to using vector similarities that correspond
more directly to the underlying functions, we would have to define them
clearly (today we don't actually explain this anywhere, so I guess we'd need
to document it) and maybe provide methods for computing them. They would also
be weird, just in a different way. For example, how would we explain 8-bit
dot-product? Would it be the 8-bit dot-product score normalized by 2^15?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

