[ https://issues.apache.org/jira/browse/LUCENE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425286#comment-17425286 ]
Julie Tibshirani edited comment on LUCENE-10147 at 10/7/21, 12:10 AM: ---------------------------------------------------------------------- Good idea to update the docs when we add cosine similarity in LUCENE-10146. It sounds like you don't have objections, so I plan to move forward with that soon. was (Author: julietibs): Good idea to make the update the docs when we add cosine similarity in LUCENE-10146. It sounds like you don't have objections, so I plan to move forward with that soon. > KnnVectorQuery can produce negative scores > ------------------------------------------ > > Key: LUCENE-10147 > URL: https://issues.apache.org/jira/browse/LUCENE-10147 > Project: Lucene - Core > Issue Type: Bug > Reporter: Julie Tibshirani > Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h > > The cosine similarity of two vectors falls in the range [-1, 1]. So currently > with cosine similarity, {{KnnVectorQuery}} can produce negative scores. Maybe > we should just adjust the scores in this case by adding 1, shifting them to > the range [0, 2]. > As a side note, this made me notice that > {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need > to know to normalize all document and query vectors to unit length when using > this similarity. Otherwise the output is unbounded and difficult to handle in > scoring. Also dot product is not a true metric: for example, it doesn't obey > the triangle inequality. So many ANN algorithms have trouble supporting it. > As part of this issue, we could improve the documentation on > {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is > required. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org