Julie Tibshirani created LUCENE-10147:
-----------------------------------------

             Summary: KnnVectorQuery can produce negative scores
                 Key: LUCENE-10147
                 URL: https://issues.apache.org/jira/browse/LUCENE-10147
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Julie Tibshirani


The cosine similarity of two vectors falls in the range [-1, 1]. So currently with cosine similarity, {{KnnVectorQuery}} can produce negative scores. Maybe we should just adjust the scores in this case by adding 1, shifting them to the range [0, 2].

As a side note, this made me notice that {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need to know to normalize all document and query vectors to unit length when using this similarity. Otherwise the output is unbounded and difficult to handle in scoring. Also dot product is not a true metric: for example, it doesn't obey the triangle inequality. So many ANN algorithms have trouble supporting it. As part of this issue, we could improve the documentation on {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
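
To make the two points in the description concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class {{CosineScoreSketch}} and its methods are hypothetical names made up for illustration, and the "add 1" shift is only the adjustment proposed in this issue, not necessarily what will be committed. It shows that cosine similarity can be negative, that adding 1 moves it into [0, 2], and that a raw dot product only behaves like cosine once both vectors are normalized to unit length.

```java
// Illustrative sketch only. Plain Java with hypothetical names; this is NOT
// Lucene's implementation of KnnVectorQuery or VectorSimilarityFunction.
final class CosineScoreSketch {

    /** Raw dot product; unbounded unless both inputs have unit length. */
    static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity, which falls in [-1, 1]. */
    static float cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return (float) (dotProduct(a, b) / norms);
    }

    /** The adjustment proposed in this issue: add 1 so the score is in [0, 2]. */
    static float shiftedCosineScore(float[] a, float[] b) {
        return 1f + cosine(a, b);
    }

    /** Returns a unit-length copy of v, as a dot-product user would need to do. */
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dotProduct(v, v));
        if (norm == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }
}
```

For two opposite vectors such as (1, 0) and (-1, 0), {{cosine}} returns -1 and the shifted score is 0, so the score is never negative. For (3, 4) and (6, 8), the raw dot product is 50, while the dot product of the normalized copies is approximately 1, which is why unnormalized input is hard to handle in scoring.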