benwtrent commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1592990533
> If we did it there we wouldn't have to change the output of VectorSimilarity. However it's messy to do it there since this is specific to a particular similarity implementation, so on balance doing it in the similarity makes more sense to me.

I am not sure why we care about separating VectorSimilarity and scoring. VectorSimilarity is only ever used for KNN search and indexing, and as long as vectors that are less similar score lower, it's fine.

If we start thinking about separating out scoring and similarity, we should do it for all the current similarities. This would be significant work and it would be tricky. Think of EUCLIDEAN: we invert its calculation so that a higher score means more similar. So, we would still need to use `queryScore` as the indexing similarity without significant changes to the underlying assumptions of the graph builder, etc.

If folks want to use the raw vector distances, they should use `VectorUtil`.

> I think the current range of dot products that are valid is [-1, 1] and scores map to [0, 1]. So I dont think we could map all negative values between [0, 0.5]

I think you are correct @jmazanec15, since normalized vectors are in the unit sphere. It's possible to have negative values (and thus fall into the [0, 0.5] range) when they point in opposite directions within the sphere.

Your scaling method + a new `MAX_INNER_PRODUCT` similarity (which just uses `dotProduct` and scales it differently) covers both requirements: no negative scores, and support for non-normalized vectors. This may complicate things (which 'dotProduct' should I use?!?!?!), but we should not change the existing `VectorSimilarityFunction#DOT_PRODUCT`.

Maybe we can deprecate `VectorSimilarityFunction#DOT_PRODUCT` usage for new fields in `9x` to encourage switching to `MAX_INNER_PRODUCT` and remove `VectorSimilarityFunction#DOT_PRODUCT` in `10`.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
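
For readers following the thread, the score mappings discussed in the comment above can be sketched in plain Java. This is only an illustration, not Lucene's code: the class `ScoreMappingSketch` and its method names are hypothetical, and the helpers stand in for the raw `VectorUtil`-style computations. The EUCLIDEAN and DOT_PRODUCT mappings follow the inversion and the [-1, 1] to [0, 1] mapping described in the thread. The comment does not spell out the `MAX_INNER_PRODUCT` formula, so the piecewise scaling below is an assumption about one possible shape for it.

```java
// Hypothetical sketch; names are illustrative and not part of Lucene's API.
final class ScoreMappingSketch {

    // Raw dot product of two equal-length vectors.
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Raw squared Euclidean distance of two equal-length vectors.
    static float squareDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // EUCLIDEAN-style score: the distance is inverted so closer vectors score higher.
    static float euclideanScore(float[] a, float[] b) {
        return 1f / (1f + squareDistance(a, b));
    }

    // DOT_PRODUCT-style score: maps [-1, 1] to [0, 1], valid for unit-length vectors only.
    static float dotProductScore(float[] a, float[] b) {
        return (1f + dot(a, b)) / 2f;
    }

    // ASSUMPTION: one possible order-preserving scaling for non-normalized vectors.
    // Negative dot products land in (0, 1); non-negative ones land in [1, +inf).
    static float maxInnerProductScore(float[] a, float[] b) {
        float d = dot(a, b);
        return d < 0f ? 1f / (1f - d) : d + 1f;
    }
}
```

With non-normalized inputs such as `{3, 4}` and `{-3, -4}`, `dotProductScore` goes negative, while the assumed piecewise scaling stays positive and still ranks less similar vectors lower, which is the property the graph builder relies on.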