[ https://issues.apache.org/jira/browse/LUCENE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425725#comment-17425725 ]
ASF subversion and git services commented on LUCENE-10147:
----------------------------------------------------------
Commit 9b1fc0ecc85365b202955c4731458fce19c5ba28 in lucene's branch
refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9b1fc0e ]
LUCENE-10147: ensure that KnnVectorQuery scores are positive (#361)
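The issue quoted below proposes adding 1 to cosine similarity so that scores move from [-1, 1] to [0, 2]. Strictly speaking, that range is non-negative rather than positive, since opposite vectors still score 0. The following is a minimal Java sketch of that proposed shift only. `ShiftedCosineScore`, `cosine` and `shiftedScore` are hypothetical names used for illustration; they are not Lucene APIs, and the commit itself may map scores differently.

```java
/**
 * Sketch of the shift proposed in LUCENE-10147: cosine similarity + 1.
 * Hypothetical helper class, not Lucene's implementation.
 */
public class ShiftedCosineScore {

    // Plain cosine similarity: dot(a, b) / (|a| * |b|), which lies in [-1, 1].
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return (float) (dot / Math.sqrt(normA * normB));
    }

    // Proposed score: shift the cosine by +1 so scores fall in [0, 2].
    static float shiftedScore(float[] a, float[] b) {
        return 1f + cosine(a, b);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] opposite = {-1f, 0f};
        float[] orthogonal = {0f, 1f};
        float[] sameDirection = {2f, 0f};
        System.out.println(cosine(query, opposite));            // -1.0 (negative score)
        System.out.println(shiftedScore(query, opposite));      // 0.0
        System.out.println(shiftedScore(query, orthogonal));    // 1.0
        System.out.println(shiftedScore(query, sameDirection)); // 2.0
    }
}
```

Adding a constant preserves the ranking of hits, so the shift only changes the score scale, not which documents come back.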
> KnnVectorQuery can produce negative scores
> ------------------------------------------
>
> Key: LUCENE-10147
> URL: https://issues.apache.org/jira/browse/LUCENE-10147
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Julie Tibshirani
> Priority: Blocker
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The cosine similarity of two vectors falls in the range [-1, 1], so with cosine
> similarity {{KnnVectorQuery}} can currently produce negative scores. Maybe we
> should just adjust the scores in this case by adding 1, which shifts them to
> the range [0, 2].
> As a side note, this made me notice that
> {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need
> to know that they must normalize all document and query vectors to unit length
> when using this similarity. Otherwise the output is unbounded and difficult to
> handle in scoring. Also, dot product is not a true metric (for example, it
> doesn't obey the triangle inequality), so many ANN algorithms have trouble
> supporting it.
> As part of this issue, we could improve the documentation on
> {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is
> required.
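The normalization requirement described in the issue can be illustrated with a small, self-contained Java sketch. `DotProductNormalization`, `dot` and `normalize` are hypothetical names and not Lucene APIs; the sketch only shows the arithmetic. With raw vectors the dot product grows with vector magnitude, so a long vector can outscore one that points in exactly the query's direction. After scaling both vectors to unit length, the dot product equals cosine similarity and stays within [-1, 1].

```java
/**
 * Sketch: why dot-product similarity expects unit-length vectors.
 * Hypothetical helper class, not Lucene's implementation.
 */
public class DotProductNormalization {

    // Raw dot product; unbounded when vectors are not normalized.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return (float) sum;
    }

    // Returns a new vector scaled to unit L2 length.
    static float[] normalize(float[] v) {
        double sumSq = 0;
        for (float x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] aligned = {1f, 0f};  // same direction as the query
        float[] large = {10f, 10f};  // 45 degrees off, but much longer

        // Raw dot product: the long vector wins purely because of its magnitude.
        System.out.println(dot(query, aligned)); // 1.0
        System.out.println(dot(query, large));   // 10.0

        // After normalization the dot product equals cosine similarity.
        float[] q = normalize(query);
        System.out.println(dot(q, normalize(aligned))); // 1.0
        System.out.println(dot(q, normalize(large)));   // about 0.707
    }
}
```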