[GitHub] [lucene] searchivarius commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

via GitHub Mon, 07 Aug 2023 07:23:26 -0700


searchivarius commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1667954978


   Hi @benwtrent, first question:
   
   do you have your test code publicly available?
   
   >we SHOULD be using euclidean as the similarity comparator. My tests used 
"dot-product" (angular).
   
   Second, good that you remembered it, but I think there is no difference between 
cosine and L2 (i.e., the search results are the same) if queries and documents have 
constant norms. They don't have to be normalized to unit norm; I think any 
constant would suffice:
   
   L2^2 = |a-b|^2 = |a|^2 - 2(a·b) + |b|^2 = const1 - cosine_similarity(a,b) * const2
   
   where const1 = |a|^2 + |b|^2 and const2 = 2|a||b|.
   
   What do you think? cc @jmazanec15 
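   
   A quick way to sanity-check the claim above is a small script along these lines. This is only a sketch under my own assumptions: the dimension, the number of documents, the two norm constants, and all function names are made up for illustration and have nothing to do with the Lucene code. It gives every document one fixed norm and the query another, neither of them 1, and then compares the ordering by squared L2 distance with the ordering by cosine similarity.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def random_vector_with_norm(dim, norm, rng):
    # Draw a Gaussian vector, then rescale it to the requested norm.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = norm / math.sqrt(dot(v, v))
    return [x * scale for x in v]


rng = random.Random(0)
DOC_NORM, QUERY_NORM = 3.0, 5.0  # arbitrary constants, deliberately not 1

docs = [random_vector_with_norm(16, DOC_NORM, rng) for _ in range(200)]
query = random_vector_with_norm(16, QUERY_NORM, rng)

# Ascending squared L2 distance versus descending cosine similarity.
by_l2 = sorted(range(len(docs)), key=lambda i: l2_squared(query, docs[i]))
by_cosine = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))

print("top-10 by L2:    ", by_l2[:10])
print("top-10 by cosine:", by_cosine[:10])
print("same ranking:", by_l2 == by_cosine)
```

   With constant norms the squared distance is a decreasing linear function of the cosine, so the two orderings should agree except for floating-point near-ties.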
   
   Additional experiments are always good. I have no immediate suggestion for 
realistic embeddings, though. 
   
   However, one can try to stress-test the method by synthetically changing the 
data: multiply or divide vector elements by a random number drawn uniformly from 
the range [1, M]. For large enough M, the transform might become beneficial.
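   
   One possible reading of this perturbation, as a rough sketch: here I assume a single random factor per vector (scaling each element independently would be a different experiment), a fair coin flip between multiplying and dividing, and unit-norm starting vectors. The helper name `rescale_randomly` and all sizes are hypothetical.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def rescale_randomly(vectors, m, rng):
    """Scale each vector by u or 1/u, with u drawn uniformly from [1, m]."""
    out = []
    for v in vectors:
        u = rng.uniform(1.0, m)
        # Assumption: multiply or divide with equal probability.
        factor = u if rng.random() < 0.5 else 1.0 / u
        out.append([x * factor for x in v])
    return out


rng = random.Random(0)

# Start from unit-norm vectors so the spread in norms comes only from rescaling.
unit_docs = []
for _ in range(100):
    v = [rng.gauss(0.0, 1.0) for _ in range(8)]
    n = norm(v)
    unit_docs.append([x / n for x in v])

M = 10.0
stressed = rescale_randomly(unit_docs, M, rng)
norms = [norm(v) for v in stressed]
print("min norm:", min(norms), "max norm:", max(norms))
```

   The rescaled vectors have norms spread between 1/M and M, so the data can be fed to the same benchmark for increasing M to see where the transform starts to matter.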
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

