benwtrent opened a new issue, #12700:
URL: https://github.com/apache/lucene/issues/12700

   ### Description
   
   VectorSimilarityFunction might return negative scores in extreme 
circumstances.
   
   This could happen if `VectorUtil#cosine` returns something like `-1.0000001` 
instead of exactly `-1` for antipodal vectors. The similarity score would then be 
`-0.0000005` (these numbers are made up and don't reflect a scenario I have 
actually seen).
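To make the arithmetic concrete, here is a minimal sketch. It assumes the cosine is mapped to a score with `(1 + cosine) / 2`, which is how I understand `VectorSimilarityFunction.COSINE` to score float vectors. The class and method names are hypothetical and are not Lucene API. The sketch uses `Math.nextDown(-1.0f)` to stand in for a cosine that overshoots `-1` by one float ulp.

```java
// Hypothetical sketch (not Lucene code): mirrors an assumed (1 + cosine) / 2 score mapping.
public class NegativeScoreDemo {

    // Assumed mapping: a cosine in [-1, 1] becomes a score in [0, 1].
    static float cosineScore(float cosine) {
        return (1 + cosine) / 2;
    }

    public static void main(String[] args) {
        // A cosine that overshoots -1 by one float ulp, as rounding error might produce.
        float cosine = Math.nextDown(-1.0f);
        float score = cosineScore(cosine);
        // The score comes out as a tiny negative number instead of 0.
        System.out.println("cosine = " + cosine + ", score = " + score);
    }
}
```

Even a one-ulp overshoot is enough to push the score below zero, because nothing in the mapping bounds the input.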
   
   We already know that floating-point error compounds with larger vectors 
and when using the Panama Vector API. 
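Part of the reason is that float addition is not associative. A SIMD implementation keeps several per-lane partial sums and reduces them at the end, so it adds the products in a different order than a scalar loop does. The hypothetical sketch below emulates lane-wise accumulation with plain arrays. It does not use Panama or `VectorUtil`, and the input values are deliberately contrived to make the difference visible.

```java
// Hypothetical sketch: the same dot product accumulated in two different orders.
public class AccumulationOrderDemo {

    // Scalar loop: one running sum, added left to right.
    static float sequentialDot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Lane-striped loop: 'lanes' partial sums reduced at the end,
    // roughly the order in which a SIMD implementation accumulates.
    static float stripedDot(float[] a, float[] b, int lanes) {
        float[] partial = new float[lanes];
        for (int i = 0; i < a.length; i++) {
            partial[i % lanes] += a[i] * b[i];
        }
        float sum = 0f;
        for (float p : partial) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1e8f, 1f, -1e8f, 1f};
        float[] b = {1f, 1f, 1f, 1f};
        // Sequential: the first +1 is absorbed into 1e8 (float spacing there is 8).
        System.out.println("sequential = " + sequentialDot(a, b));
        // Striped: the two +1 terms land in the same lane and survive.
        System.out.println("striped    = " + stripedDot(a, b, 2));
    }
}
```

Real embeddings won't show a gap this large. The point is that two valid accumulation orders can disagree, and longer vectors give rounding more chances to accumulate.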
   
   Should we snap vector scores to `0` to ensure this doesn't happen? Or rely 
on library users to do so?
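A minimal sketch of what "snapping" could look like follows. It again assumes a `(1 + cosine) / 2` mapping, and all names are hypothetical rather than existing Lucene methods. The clamp itself is a single `Math.max(0f, ...)` around the raw score.

```java
// Hypothetical sketch of snapping a similarity score at 0 (not Lucene API).
public class ClampedScoreDemo {

    // Assumed (1 + cosine) / 2 mapping, as for cosine similarity.
    static float rawCosineScore(float cosine) {
        return (1 + cosine) / 2;
    }

    // Snap any negative result caused by rounding error up to 0.
    static float clampedCosineScore(float cosine) {
        return Math.max(0f, rawCosineScore(cosine));
    }

    public static void main(String[] args) {
        float cosine = Math.nextDown(-1.0f); // slightly below -1, as rounding might produce
        System.out.println("raw     = " + rawCosineScore(cosine));
        System.out.println("clamped = " + clampedCosineScore(cosine));
    }
}
```

Clamping the final score costs one comparison per hit, and the same wrapper could sit either inside the library or in user code. Note that `Math.max` propagates NaN, so this only addresses a small negative overshoot.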
   
   Here is a related ES bug: 
https://github.com/elastic/elasticsearch/issues/100975
   
   NOTE: That bug involves 1536-dimensional vectors, beyond the Lucene limit of 
1024. However, it seems to me that this could happen even with 1024 dimensions.
   
   I am fine if the consensus is "library users just handle it". But it seems 
like something every user could potentially be concerned about.
   
   ### Version and environment details
   
   _No response_

