wangyanbn opened a new issue, #12256: URL: https://github.com/apache/lucene/issues/12256
### Description

In [VectorUtil.dotProductScore](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java#L268), `denom` should not be multiplied by the array length:

`float denom = (float) (a.length * (1 << 15));`
`return 0.5f + dotProduct(a, b) / denom;`

When we use float vectors with the dot product similarity, we normalize them to unit length, so the float dot product score formula is `0.5 + dotProduct(a_float_vec, b_float_vec) / 2`. It does not depend on the array length.

When we use byte vectors, we just multiply every item in the float vector by 128, so the byte dot product value is multiplied by 128*128: `dotProduct(a_byte_vec, b_byte_vec) = dotProduct(128*a_float_vec, 128*b_float_vec) = 128*128*dotProduct(a_float_vec, b_float_vec)`. This does not depend on the array length either.

To get the same score as with float vectors, the byte score should be `0.5 + dotProduct(a_byte_vec, b_byte_vec) / (128*128*2)`, which is the same as `0.5 + dotProduct(a_byte_vec, b_byte_vec) / (float) (1 << 15)`.

So the correct `denom` should be: `float denom = (float) (1 << 15)`

I am testing with Elasticsearch 8.7, which uses Lucene 9.5. My vector length is 256. With the byte type, the scores are all close to 0.5 (because 256 is not a small number, `denom` becomes very large), which is very different from the float type scores. If `denom` is not multiplied by the array length, the byte and float scores are similar.

### Version and environment details

Lucene 9.5
JVM: 19.0.2
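To make the arithmetic concrete, here is a minimal standalone sketch. It is not the Lucene code: the class and method names are made up for illustration, and only the two `denom` expressions are taken from the description above. It uses a 256-dimensional byte vector whose components are all 8, which corresponds to a unit-length float vector (every component 1/16) scaled by 128.

```java
public final class ByteDotProductScoreSketch {

  // Plain integer dot product of two signed-byte vectors of equal length.
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Score using the denominator quoted in the issue (multiplied by the array length).
  static float scoreWithLength(byte[] a, byte[] b) {
    float denom = (float) (a.length * (1 << 15));
    return 0.5f + dotProduct(a, b) / denom;
  }

  // Score using the denominator proposed in the issue (no array length).
  static float scoreWithoutLength(byte[] a, byte[] b) {
    float denom = (float) (1 << 15);
    return 0.5f + dotProduct(a, b) / denom;
  }

  // Hypothetical helper: a vector with every component set to the same value.
  static byte[] constantVector(int dims, byte value) {
    byte[] v = new byte[dims];
    java.util.Arrays.fill(v, value);
    return v;
  }

  public static void main(String[] args) {
    // 256 components of 8: the norm is sqrt(256 * 64) = 128, i.e. a unit vector scaled by 128.
    byte[] v = constantVector(256, (byte) 8);
    System.out.println("dot product:    " + dotProduct(v, v));
    System.out.println("with length:    " + scoreWithLength(v, v));
    System.out.println("without length: " + scoreWithoutLength(v, v));
  }
}
```

For this vector compared with itself, the dot product is 16384, so the float formula `0.5 + 1 / 2` gives 1.0. The variant without the array length also gives exactly 1.0, while the variant with the array length gives 0.5 + 1/512, which matches the "all scores near 0.5" behaviour described above.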
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org