mayya-sharipova commented on code in PR #947:
URL: https://github.com/apache/lucene/pull/947#discussion_r923800118
##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,38 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  /**
+   * Dot product score computed over signed bytes, scaled to be in [0, 1].
+   *
+   * @param a bytes containing a vector
+   * @param aOffset offset of the vector in a
+   * @param b bytes containing another vector, of the same dimension
+   * @param len the length (aka dimension) of the vectors
+   * @param bOffset offset of the vector in b
+   * @return the value of the similarity function applied to the two vectors
+   */
+  public static float dotProductScore(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];
+    }
+    // divide by 2 * 2^14 (maximum absolute value of product of 2 signed bytes) * len
+    return (1 + total) / (float) (len * (1 << 15));

Review Comment:
   To make scores non-negative, should we instead do: `total / (float) (len * (1 << 15)) + 1`?

   I am also wondering why the comment says `// divide by 2 * 2^14`, but the calculation uses `1 << 15`. Should it be `1 << 14` instead?
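
For reference, the ranges involved can be checked with a small standalone sketch. It does not use Lucene; the class name `DotProductScoreRanges` and the helper names are hypothetical, and the third helper (`shiftedByHalf`) is an illustrative alternative only, not something taken from the PR. It relies on two facts: the product of two signed bytes lies in [-16256, 16384], and `2 * 2^14` is the same value as `1 << 15`.

```java
// Standalone sketch (no Lucene dependency). All names here are hypothetical.
final class DotProductScoreRanges {

  // Largest product of two signed bytes: (-128) * (-128) = 16384 = 2^14.
  static final int MAX_PRODUCT = Byte.MIN_VALUE * Byte.MIN_VALUE;
  // Smallest product of two signed bytes: (-128) * 127 = -16256.
  static final int MIN_PRODUCT = Byte.MIN_VALUE * Byte.MAX_VALUE;

  /** Scaling as written in the diff under review. */
  static float asInDiff(int total, int len) {
    return (1 + total) / (float) (len * (1 << 15));
  }

  /** Scaling as suggested in the review comment. */
  static float asSuggested(int total, int len) {
    return total / (float) (len * (1 << 15)) + 1;
  }

  /** Illustrative alternative (an assumption, not from the PR): offset by 0.5. */
  static float shiftedByHalf(int total, int len) {
    return 0.5f + total / (float) (len * (1 << 15));
  }

  public static void main(String[] args) {
    int len = 4; // arbitrary small dimension for the demonstration
    int minTotal = len * MIN_PRODUCT; // every component pair is (-128, 127)
    int maxTotal = len * MAX_PRODUCT; // every component pair is (-128, -128)

    // The code comment and the expression name the same divisor.
    System.out.println("2 * 2^14 == 1 << 15: " + (2 * (1 << 14) == (1 << 15)));

    System.out.printf("asInDiff:      [%.6f, %.6f]%n",
        asInDiff(minTotal, len), asInDiff(maxTotal, len));
    System.out.printf("asSuggested:   [%.6f, %.6f]%n",
        asSuggested(minTotal, len), asSuggested(maxTotal, len));
    System.out.printf("shiftedByHalf: [%.6f, %.6f]%n",
        shiftedByHalf(minTotal, len), shiftedByHalf(maxTotal, len));
  }
}
```

With these extremes, the expression in the diff goes negative (about -0.496 at the low end), and adding 1 makes scores non-negative but lets them reach 1.5, so neither matches the "[0, 1]" wording in the javadoc. Switching the divisor to `1 << 14` would double the quotient and widen the range further.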