wangyanbn opened a new issue, #12256:
URL: https://github.com/apache/lucene/issues/12256

   ### Description
   
   In 
[VectorUtil.dotProductScore](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java#L268),
 `denom` should not be multiplied by the array length:
   ` float denom = (float) (a.length * (1 << 15));`
   ` return 0.5f + dotProduct(a, b) / denom;`
   
   When we use float vectors with the dot product similarity, we normalize 
them to unit length, so the float dot product score formula is: 
   `0.5 + dotProduct(a_float_vec, b_float_vec) / 2`, which does not depend on 
the array length.
   
   When we use byte vectors, we just multiply every item in the float vector 
by 128, so the byte dot product is multiplied by 128*128: 
   `dotProduct(a_byte_vec, b_byte_vec)  = dotProduct(128*a_float_vec, 
128*b_float_vec)  = 128*128*dotProduct(a_float_vec, b_float_vec)`,  
   which again does not depend on the array length. 
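   The scaling argument above can be checked with a small sketch. The 
`quantize` helper here is hypothetical (it is not a Lucene API); it assumes 
quantization means scaling by 128, rounding, and clamping to the byte range, 
so the identity holds exactly only when no rounding or clamping occurs:

```java
// Illustrative sketch only; quantize() is a hypothetical helper, not part of Lucene.
final class QuantizedDotProductSketch {

  // Scale each component by 128, round, and clamp to the signed byte range.
  static byte[] quantize(float[] v) {
    byte[] out = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      int scaled = Math.round(v[i] * 128f);
      out[i] = (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, scaled));
    }
    return out;
  }

  // Plain float dot product.
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Plain integer dot product over byte vectors.
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] v = {0.5f, 0.5f, 0.5f, 0.5f}; // unit length: 4 * 0.25 = 1
    byte[] q = quantize(v);               // every component becomes 64
    System.out.println(dotProduct(v, v));                 // 1.0
    System.out.println(dotProduct(q, q));                 // 16384
    System.out.println(dotProduct(q, q) / (128f * 128f)); // 1.0
  }
}
```

   The byte dot product is 128*128 times the float one regardless of the 
number of dimensions, which is the point the argument relies on.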
   
   To get the same score as with float vectors, the byte score should be:
    `0.5 + dotProduct(a_byte_vec, b_byte_vec) /(128*128* 2)`, 
   which is the same as:
   `0.5 + dotProduct(a_byte_vec, b_byte_vec) /(float)(1 << 15)`.
   
   So the correct `denom` should be: `float denom = (float) (1 << 15)`
   
   I am testing with Elasticsearch 8.7, which uses Lucene 9.5. My vector 
length is 256. With the byte type, the scores are all close to 0.5 (because 
256 is not a small number, `denom` becomes very large), which is very 
different from the float scores. 
   If `denom` is not multiplied by the array length, the byte and float scores 
are similar.
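   A minimal sketch of the effect described above, assuming a 256-dimensional 
unit vector whose components are all 1/16 (so each quantized byte is 8). 
`currentScore` mirrors the two lines quoted from `VectorUtil.dotProductScore`; 
`proposedScore` and the class name are illustrative, not Lucene code:

```java
import java.util.Arrays;

// Illustrative comparison of the two denominators; not Lucene code.
final class ByteScoreComparisonSketch {

  // Plain integer dot product over byte vectors.
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Scoring as quoted in this issue: denom includes the array length.
  static float currentScore(byte[] a, byte[] b) {
    float denom = (float) (a.length * (1 << 15));
    return 0.5f + dotProduct(a, b) / denom;
  }

  // Scoring proposed in this issue: denom is independent of the array length.
  static float proposedScore(byte[] a, byte[] b) {
    float denom = (float) (1 << 15);
    return 0.5f + dotProduct(a, b) / denom;
  }

  public static void main(String[] args) {
    byte[] v = new byte[256];
    Arrays.fill(v, (byte) 8); // 128 * (1/16) = 8 in every dimension

    // The float score of this unit vector against itself is 0.5 + 1/2 = 1.0.
    System.out.println(currentScore(v, v));  // approximately 0.502
    System.out.println(proposedScore(v, v)); // 1.0
  }
}
```

   Even for a vector compared with itself, the length-scaled denominator keeps 
the score within 1/512 of 0.5, while the length-independent one reproduces the 
float score.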
   
   
   
   ### Version and environment details
   
   Lucene 9.5
   JVM:19.0.2
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


