[ https://issues.apache.org/jira/browse/LUCENE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424574#comment-17424574 ]
Julie Tibshirani commented on LUCENE-10147:
-------------------------------------------

I like the idea of VectorSimilarityFunction "owning" how to produce valid scores, and keeping this separate from the main similarity computation to avoid overhead. I am a bit busy right now but will look into fixing this in a few days if no one beats me to it.

> KnnVectorQuery can produce negative scores
> ------------------------------------------
>
>                 Key: LUCENE-10147
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10147
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Julie Tibshirani
>            Priority: Major
>
> The cosine similarity of two vectors falls in the range [-1, 1]. So currently
> with cosine similarity, {{KnnVectorQuery}} can produce negative scores. Maybe
> we should just adjust the scores in this case by adding 1, shifting them to
> the range [0, 2].
>
> As a side note, this made me notice that
> {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need
> to know to normalize all document and query vectors to unit length when using
> this similarity. Otherwise the output is unbounded and difficult to handle in
> scoring. Also dot product is not a true metric: for example, it doesn't obey
> the triangle inequality. So many ANN algorithms have trouble supporting it.
>
> As part of this issue, we could improve the documentation on
> {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is
> required.
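
For illustration only, here is a rough sketch of the two ideas in the issue description: shifting a cosine similarity from [-1, 1] into [0, 2] so the score is never negative, and normalizing vectors to unit length before taking a dot product. This is not Lucene code. The class VectorScoreSketch and all of its methods are hypothetical names made up for this example, and the clamp at zero is an added assumption to guard against floating-point rounding.

```java
// Hypothetical helper, not part of Lucene: illustrates the score adjustment
// proposed in the issue, kept separate from the raw similarity computation.
final class VectorScoreSketch {
    private VectorScoreSketch() {}

    /** Raw cosine similarity in [-1, 1]. Assumes equal-length, non-zero vectors. */
    public static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt(normA * normB));
    }

    /**
     * Shifts a cosine similarity into [0, 2] by adding 1.
     * The clamp is an assumption of this sketch: rounding can push the raw
     * value slightly below -1, which would otherwise yield a tiny negative score.
     */
    public static float cosineToScore(float cosine) {
        return Math.max(0f, 1f + cosine);
    }

    /** Returns a unit-length copy of v. Assumes v is not the zero vector. */
    public static float[] normalize(float[] v) {
        double sumSquares = 0;
        for (float x : v) {
            sumSquares += x * x;
        }
        float norm = (float) Math.sqrt(sumSquares);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    /** Plain dot product; bounded by [-1, 1] only when both inputs are unit length. */
    public static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

With this sketch, two opposite vectors get a score of 0 and two identical vectors get a score of 2. For the dot product case, a caller would pass both the document and query vectors through normalize first; without that step the result is unbounded, which is the problem the issue description points out.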