benwtrent opened a new pull request, #12479:
URL: https://github.com/apache/lucene/pull/12479

   The current dot-product score scaling and similarity implementation assumes 
normalized vectors. This disregards information that the model may store within 
the magnitude. 
   
   See: https://github.com/apache/lucene/issues/12342#issuecomment-1658640222 
for a good explanation of why this is needed.
   
   To avoid breaking current scoring assumptions in Lucene, this PR adds a new 
`MAXIMUM_INNER_PRODUCT` similarity function. 
   
   Because a `dotProduct` function call can return a negative value, this 
similarity scorer scales negative dot products into the range 0-1 and maps 
all positive dot products to the range 1-MAX, so every score stays positive.
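
As a rough illustration of the scaling described above, here is a minimal Java sketch of one monotonic mapping with that shape. The class name `MaxInnerProductScaling` and both method names are hypothetical and chosen only for this example, and the exact formula is an assumption that may differ from what the PR implements.

```java
// Illustrative sketch only: the class and method names are hypothetical,
// and the formula is one possible mapping with the shape described above.
final class MaxInnerProductScaling {

  /** Raw dot product of two equal-length vectors, with no normalization. */
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * Maps a raw dot product to a positive score while preserving order:
   * negative inputs land in (0, 1), zero maps to 1, and positive inputs
   * map to values above 1.
   */
  static float scale(float dot) {
    if (dot < 0) {
      return 1f / (1f - dot); // e.g. -1 -> 0.5, -3 -> 0.25
    }
    return dot + 1f; // e.g. 0 -> 1, 3 -> 4
  }
}
```

Because the mapping is strictly increasing, ranking by the scaled score gives the same order as ranking by the raw dot product.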
   
   One concern with adding this similarity function is that it violates the 
triangle inequality, which is commonly assumed to be required for building 
graph structures. However, the research conflicts on whether this matters for 
real-world data.
   
   See:
    - For: https://github.com/apache/lucene/issues/12342#issuecomment-1618258984
    - Against: 
https://github.com/apache/lucene/issues/12342#issuecomment-1631577657, 
https://github.com/apache/lucene/issues/12342#issuecomment-1631808301
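
To make the triangle-inequality concern concrete, the sketch below treats the negated dot product as a distance and checks the inequality for three vectors. The distance is an assumption made purely for this illustration, and the class and method names are hypothetical.

```java
// Illustrative sketch only: names are hypothetical, and using the negated
// dot product as a "distance" is an assumption made for this example.
final class TriangleInequalityCheck {

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Larger inner product means "closer", so negate it to get a distance. */
  static float dist(float[] a, float[] b) {
    return -dot(a, b);
  }

  /** True when dist(a, c) <= dist(a, b) + dist(b, c). */
  static boolean satisfiesTriangle(float[] a, float[] b, float[] c) {
    return dist(a, c) <= dist(a, b) + dist(b, c);
  }
}
```

With `a = c = (1, 0)` and a high-magnitude `b = (10, 0)`, `dist(a, c)` is -1 while `dist(a, b) + dist(b, c)` is -20, so the check fails. A vector with a large magnitude can look "closer" to `a` than `a` is to itself, which is the property that unnormalized inner product introduces.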
   
   To check whether any transformation of the input is required to satisfy the 
triangle inequality, many tests have been run.
   
   See:
   
    - https://github.com/apache/lucene/issues/12342#issuecomment-1653420640
    - https://github.com/apache/lucene/issues/12342#issuecomment-1656112434
    - https://github.com/apache/lucene/issues/12342#issuecomment-1656718447
   
   If there are any additional tests, or issues with the provided tests & 
scripts, please let me know. We want to make sure this works well for our users.
   
   closes: https://github.com/apache/lucene/issues/12342


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

