[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #947: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

GitBox Mon, 18 Jul 2022 12:53:54 -0700


mayya-sharipova commented on code in PR #947:
URL: https://github.com/apache/lucene/pull/947#discussion_r923800118



##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +213,38 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  /**
+   * Dot product score computed over signed bytes, scaled to be in [0, 1].
+   *
+   * @param a bytes containing a vector
+   * @param aOffset offset of the vector in a
+   * @param b bytes containing another vector, of the same dimension
+   * @param len the length (aka dimension) of the vectors
+   * @param bOffset offset of the vector in b
+   * @return the value of the similarity function applied to the two vectors
+   */
+  public static float dotProductScore(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];
+    }
+    // divide by 2 * 2^14 (maximum absolute value of product of 2 signed bytes) * len
+    return (1 + total) / (float) (len * (1 << 15));

Review Comment:
   To make scores non-negative, should we instead do:
   `total / (float) (len * (1 << 15)) + 1`?
   
   I am also wondering why the comment says `// divide by 2 * 2^14` while the
   calculation uses `1 << 15`. Should it be `1 << 14` instead?
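
To make the two questions concrete, here is a minimal, hedged sketch that evaluates both expressions at the extreme inputs. It assumes vector components are Java signed bytes in [-128, 127]. The class and helper names (`ByteDotProductRange`, `asInDiff`, `proposed`, `dot`) are hypothetical and not part of the PR; plain `byte[]` arrays stand in for `BytesRef`.

```java
// Hypothetical sketch, not code from the PR: compares the value range of the
// expression in the diff with the variant proposed in the review comment.
class ByteDotProductRange {

  // Expression as written in the diff: (1 + total) / (len * 2^15)
  static float asInDiff(int total, int len) {
    return (1 + total) / (float) (len * (1 << 15));
  }

  // Variant proposed in the review comment: total / (len * 2^15) + 1
  static float proposed(int total, int len) {
    return total / (float) (len * (1 << 15)) + 1;
  }

  // Raw dot product; each byte is promoted to int before multiplying.
  static int dot(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i];
    }
    return total;
  }

  static byte[] filled(int len, byte value) {
    byte[] v = new byte[len];
    java.util.Arrays.fill(v, value);
    return v;
  }

  public static void main(String[] args) {
    int len = 4;
    byte[] lo = filled(len, (byte) -128);
    byte[] hi = filled(len, (byte) 127);

    int largest = dot(lo, lo);  // every product is (-128) * (-128) = 2^14
    int smallest = dot(lo, hi); // every product is (-128) * 127

    System.out.println("as in diff: " + asInDiff(smallest, len) + " .. " + asInDiff(largest, len));
    System.out.println("proposed:   " + proposed(smallest, len) + " .. " + proposed(largest, len));
    // 2 * 2^14 and 1 << 15 denote the same integer.
    System.out.println("2 * (1 << 14) == (1 << 15): " + (2 * (1 << 14) == (1 << 15)));
  }
}
```

Under these assumptions, the expression in the diff goes negative for the most dissimilar vectors, while the proposed variant stays non-negative. Its upper end exceeds 1, though, which is outside the [0, 1] range stated in the Javadoc.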
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


