benwtrent commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1592990533

   > If we did it there we wouldn't have to change the output of 
VectorSimilarity. However it's messy to do it there since this is specific to a 
particular similarity implementation, so on balance doing it in the similarity 
makes more sense to me.
   
   I am not sure why we care about separating VectorSimilarity and scoring. 
VectorSimilarity is only ever for KNN search and indexing and as long as 
vectors that are less similar score lower, it's fine.
   
   If we start thinking about separating out scoring and similarity, we should 
do it for all the current similarities. This would be significant work and it 
would be tricky. Think of EUCLIDEAN: we invert its calculation so that a 
higher score means more similar. So, short of significant changes to the 
underlying assumptions of the graph builder, etc., we would still need to use 
`queryScore` as the indexing similarity.
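
   As a rough illustration of the EUCLIDEAN inversion mentioned above, here is 
a minimal sketch. It assumes the score is `1 / (1 + squaredDistance)`; the class 
and method names are made up for illustration and are not Lucene API.

```java
final class EuclideanScoreSketch {
  // Raw squared Euclidean distance: smaller means more similar.
  static float squareDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Assumed inversion: larger means more similar, result lies in (0, 1].
  static float score(float[] a, float[] b) {
    return 1f / (1f + squareDistance(a, b));
  }
}
```

   With this shape, identical vectors score 1 and the score shrinks toward 0 as 
the distance grows, which is the ordering the graph builder relies on.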
   
   If folks want to use the raw vector distances, they should use `VectorUtil`.
   
   >  I think the current range of dot products that are valid is [-1, 1] and 
scores map to [0, 1]. So I dont think we could map all negative values between 
[0, 0.5]
   
   I think you are correct, @jmazanec15, since normalized vectors lie on the 
unit sphere. It's possible to have negative values (and thus fall into the [0, 
0.5] range) when they point in opposite directions on the sphere. Your 
scaling method + a new `MAX_INNER_PRODUCT` similarity (which just uses 
`dotProduct` and scales it differently) covers the requirement of disallowing 
negative scores & non-normalized vectors.
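
   To make the two mappings concrete, here is a hedged sketch. The first method 
assumes the existing normalized mapping is `(1 + dot) / 2`, clamped at 0. The 
second is only one possible monotonic scaling for unbounded dot products; the 
exact scaling method is not spelled out in this comment, so treat the piecewise 
formula and all names below as assumptions.

```java
final class DotProductScoreSketch {
  // Assumed mapping for unit-length vectors: dot in [-1, 1] -> score in [0, 1].
  // Negative dot products land in [0, 0.5), positive ones in (0.5, 1].
  static float normalizedScore(float dot) {
    return Math.max((1f + dot) / 2f, 0f);
  }

  // Hypothetical scaling for non-normalized vectors (dot is unbounded):
  //   dot <  0 -> 1 / (1 - dot), which lies in (0, 1)
  //   dot >= 0 -> dot + 1,       which lies in [1, +inf)
  // The result is never negative and preserves the ordering of dot products.
  static float maxInnerProductScore(float dot) {
    return dot < 0f ? 1f / (1f - dot) : dot + 1f;
  }
}
```

   Both mappings are monotonic in the raw dot product, so less similar vectors 
still score lower; only the output range differs.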
   
   This may complicate things (which 'dotProduct' should I use?!?!?!), but we 
should not change the existing `VectorSimilarityFunction#DOT_PRODUCT`. Maybe we 
can deprecate `VectorSimilarityFunction#DOT_PRODUCT` usage for new fields in 
`9x` to encourage switching to `MAX_INNER_PRODUCT` and remove 
`VectorSimilarityFunction#DOT_PRODUCT` in `10`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

