[ https://issues.apache.org/jira/browse/LUCENE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424675#comment-17424675 ]
Michael Sokolov commented on LUCENE-10147:
------------------------------------------

I'm working up a small CR that should address this -- main question will be what to name the new function :) This was my first stab, but score seems too generic, although it does align with ScoreDoc.score. Maybe something like normalizeScore?

{{
  /**
   * Converts similarity scores used (may be negative, reversed, etc) into document scores,
   * which must be positive, with higher scores representing better matches.
   * @param similarity the raw internal score as returned by {@link #compare(float[], float[])}.
   * @return normalizedSimilarity
   */
  public abstract float score(float similarity);
}}

> KnnVectorQuery can produce negative scores
> ------------------------------------------
>
>                 Key: LUCENE-10147
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10147
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Julie Tibshirani
>            Priority: Blocker
>
> The cosine similarity of two vectors falls in the range [-1, 1]. So currently
> with cosine similarity, {{KnnVectorQuery}} can produce negative scores. Maybe
> we should just adjust the scores in this case by adding 1, shifting them to
> the range [0, 2].
> As a side note, this made me notice that
> {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need
> to know to normalize all document and query vectors to unit length when using
> this similarity. Otherwise the output is unbounded and difficult to handle in
> scoring. Also dot product is not a true metric: for example, it doesn't obey
> the triangle inequality. So many ANN algorithms have trouble supporting it.
> As part of this issue, we could improve the documentation on
> {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is
> required.
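
For illustration, here is a rough sketch of how the proposed score(float) hook could combine with the adjustment suggested in the issue (adding 1 to move cosine from [-1, 1] into [0, 2]). This is not Lucene's actual code: the class ScoreSketch, its nested Similarity enum, and the normalize helper are hypothetical names made up for this example. Applying the same +1 shift to dot product is also only an assumption, and it holds only when every vector has been normalized to unit length beforehand.

```java
/** Hypothetical sketch; names here are illustrative and not part of Lucene. */
final class ScoreSketch {

    enum Similarity {
        COSINE {
            @Override
            float compare(float[] a, float[] b) {
                float dot = 0f, normA = 0f, normB = 0f;
                for (int i = 0; i < a.length; i++) {
                    dot += a[i] * b[i];
                    normA += a[i] * a[i];
                    normB += b[i] * b[i];
                }
                // Raw cosine similarity, in [-1, 1] for non-zero vectors.
                return (float) (dot / Math.sqrt((double) normA * normB));
            }
        },
        DOT_PRODUCT {
            @Override
            float compare(float[] a, float[] b) {
                float dot = 0f;
                for (int i = 0; i < a.length; i++) {
                    dot += a[i] * b[i];
                }
                // Bounded to [-1, 1] only if both inputs are unit length.
                return dot;
            }
        };

        /** Raw similarity; may be negative. */
        abstract float compare(float[] a, float[] b);

        /** Assumed normalization: shift [-1, 1] to [0, 2] so scores are non-negative. */
        float score(float similarity) {
            return similarity + 1f;
        }
    }

    /** Returns a unit-length copy of v (v must not be the zero vector). */
    static float[] normalize(float[] v) {
        double sumSquares = 0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumSquares);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1f, 0f};
        float[] b = {-1f, 0f};
        float raw = Similarity.COSINE.compare(a, b);       // opposite directions
        System.out.println(Similarity.COSINE.score(raw));  // prints 0.0
    }
}
```

With this shape, opposite vectors get the lowest score (0) rather than a negative one, and higher scores still mean better matches. Whether the method ends up named score or normalizeScore does not change the idea.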
--
This message was sent by Atlassian Jira (v8.3.4#803005)