uschindler commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1479835140


##########
lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java:
##########
@@ -94,6 +95,29 @@ public float compare(float[] v1, float[] v2) {
     public float compare(byte[] v1, byte[] v2) {
       return scaleMaxInnerProductScore(dotProduct(v1, v2));
     }
+  },
+  /**
+   * Binary Hamming distance; Computes how many bits are different in two bytes.
+   *
+   * <p>Only supported for bytes. To convert the distance to a similarity score we normalize using 1
+   * / (1 + hammingDistance)
+   */
+  BINARY_HAMMING_DISTANCE {
+    @Override
+    public float compare(float[] v1, float[] v2) {
+      throw new UnsupportedOperationException(
+          BINARY_HAMMING_DISTANCE.name() + " is only supported for byte vectors");
+    }
+
+    @Override
+    public float compare(byte[] v1, byte[] v2) {
+      return (1f / (1 + binaryHammingDistance(v1, v2)));

Review Comment:
   This depends on the vector length. Is that intended? I would have expected 
something like `dimensions * 8 / (1 + distance)`. I know it is not relevant 
for scoring purposes, since it is a constant factor, but we have some 
normalization on other functions, too.
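
To make the comparison concrete, here is a minimal standalone sketch (not code from the PR) of the two formulas side by side. The class and method names are hypothetical, and the byte-wise distance loop is a simple stand-in for `binaryHammingDistance`. For a fixed vector length the two scores differ only by the constant factor `dimensions * 8`, so ranking is unchanged.

```java
// Hypothetical illustration only; names are not part of Lucene.
final class HammingScoreSketch {

  // Simple byte-wise Hamming distance; assumes both arrays have equal length.
  static int hammingDistance(byte[] a, byte[] b) {
    int distance = 0;
    for (int i = 0; i < a.length; i++) {
      // Mask to 8 bits because the XOR result is a sign-extended int.
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }

  // Score as written in the diff: 1 / (1 + distance).
  static float scoreAsInDiff(byte[] a, byte[] b) {
    return 1f / (1 + hammingDistance(a, b));
  }

  // Variant from the review comment: dimensions * 8 / (1 + distance).
  static float scoreScaledByBits(byte[] a, byte[] b) {
    return (a.length * 8f) / (1 + hammingDistance(a, b));
  }
}
```

With two 2-byte vectors that differ in 4 bits, the first formula gives 1 / 5 and the second gives 16 / 5.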



##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -214,4 +214,19 @@ public static float[] checkFinite(float[] v) {
     }
     return v;
   }
+
+  public static int binaryHammingDistance(byte[] a, byte[] b) {
+    int distance = 0, i = 0;
+    for (final int upperBound = a.length & ~(Long.BYTES - 1); i < upperBound; i += Long.BYTES) {
+      distance +=
+          Long.bitCount(
+              ((long) BitUtil.VH_NATIVE_LONG.get(a, i) ^ (long) BitUtil.VH_NATIVE_LONG.get(b, i))
+                  & 0xFFFFFFFFFFFFFFFFL);

Review Comment:
   Remove the `& 0xFFFFFFFFFFFFFFFFL`; it is useless, because ANDing a `long` 
with all 64 bits set leaves the value unchanged. See my previous comment with 
the "final version": 
https://github.com/apache/lucene/pull/13076#issuecomment-1928027541
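
For illustration, a minimal standalone sketch of the loop without the mask follows. It is not the "final version" from the linked comment. To stay self-contained it assumes a `VarHandle` built with `MethodHandles.byteArrayViewVarHandle` in place of Lucene's `BitUtil.VH_NATIVE_LONG`, and the tail loop for the remaining bytes is also an assumption, since the diff above is cut off before that part. The class name is hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the code from the PR.
final class BinaryHammingSketch {

  // Stand-in for BitUtil.VH_NATIVE_LONG: reads a long from a byte[] in native order.
  private static final VarHandle VH_NATIVE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

  static int binaryHammingDistance(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    int distance = 0, i = 0;
    // Process 8 bytes at a time; no mask is needed on the 64-bit XOR result.
    for (final int upperBound = a.length & ~(Long.BYTES - 1); i < upperBound; i += Long.BYTES) {
      distance +=
          Long.bitCount((long) VH_NATIVE_LONG.get(a, i) ^ (long) VH_NATIVE_LONG.get(b, i));
    }
    // Assumed tail handling: remaining bytes one at a time.
    // Here the 0xFF mask does matter, because the XOR is a sign-extended int.
    for (; i < a.length; i++) {
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }
}
```

Note the contrast between the two loops: the mask on the `long` path is a no-op, while the `& 0xFF` on the byte path is required to discard the sign-extension bits.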


