benwtrent commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1048856364
##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
* @throws IllegalArgumentException if <code>k</code> is less than 1
*/
public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+ this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+ }
+
+ /**
+ * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
+ * given field. <code>target</code> vector.
+ *
+ * @param field a field that has been indexed as a {@link KnnVectorField}.
+ * @param target the target of the search
+ * @param k the number of documents to find (the upper bound)
+ * @param similarityThreshold the minimum acceptable value of similarity
Review Comment:
@msokolov you haven't missed anything. I am specifically talking about users
providing `similarityThreshold` to the query. If they have calculated that
they want a specific `cosine` or `dotProduct` similarity, they would then need
to adjust that value to match Lucene's scoring transformation.
I think that `similarityThreshold` should mean vector similarities. We can
transform it for the user to reflect the score that similarity represents
(given vector encoding type and similarity function).
An example here is `dotProduct`. The user knows they want `FLOAT32` vectors
with a dotProduct of at least 0.7. With this API, that ACTUALLY means they
have to limit the scores to 0.85 (`(1 + dotProduct)/2`). How is the user
supposed to know that?
This seems really weird to me.
This also doesn't take into account the different scoring methods between
vector types, which can get even more confusing.
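To make the mismatch concrete, here is a rough sketch of the conversion a user would have to do by hand today. It covers only the `FLOAT32` `dotProduct` case described above, using the `(1 + dotProduct)/2` mapping. The class and method names are hypothetical and are not part of Lucene or of this PR.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
public class ThresholdSketch {

  // Maps a raw FLOAT32 dot product to the score it corresponds to,
  // using the (1 + dotProduct) / 2 transformation mentioned above.
  static float dotProductToScore(float dotProduct) {
    return (1f + dotProduct) / 2f;
  }

  // Inverse mapping: recovers the raw dot product from a score.
  static float scoreToDotProduct(float score) {
    return 2f * score - 1f;
  }

  public static void main(String[] args) {
    float wantedDotProduct = 0.7f;
    // What the user would currently have to pass as similarityThreshold
    // (approximately 0.85).
    float threshold = dotProductToScore(wantedDotProduct);
    System.out.println("threshold to pass: " + threshold);
    System.out.println("back to dotProduct: " + scoreToDotProduct(threshold));
  }
}
```

If `similarityThreshold` meant the raw vector similarity, a conversion along these lines could happen inside the query, chosen by similarity function and vector encoding, rather than in user code.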
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]