pmpailis commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1479909430


##########
lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java:
##########
@@ -94,6 +95,29 @@ public float compare(float[] v1, float[] v2) {
     public float compare(byte[] v1, byte[] v2) {
       return scaleMaxInnerProductScore(dotProduct(v1, v2));
     }
+  },
+  /**
+   * Binary Hamming distance; Computes how many bits are different in two bytes.
+   *
+   * <p>Only supported for bytes. To convert the distance to a similarity score we normalize using 1
+   * / (1 + hammingDistance)
+   */
+  BINARY_HAMMING_DISTANCE {
+    @Override
+    public float compare(float[] v1, float[] v2) {
+      throw new UnsupportedOperationException(
+          BINARY_HAMMING_DISTANCE.name() + " is only supported for byte vectors");
+    }
+
+    @Override
+    public float compare(byte[] v1, byte[] v2) {
+      return (1f / (1 + binaryHammingDistance(v1, v2)));

Review Comment:
   I see your point. The initial idea was to keep the score bounded in `(0, 1]` 
so that it has a more "natural" interpretation: 1 always means the vectors are 
identical, and ~0 means that the two vectors are complements of each other 
(`1/(1+dim)`, with `dim` counted in bits). If we instead scale the score by the 
number of dimensions, the range becomes `(0, dimensions*8]`, which is 
effectively the reverse of the distance. For example, two identical vectors 
would have a score of `dimensions * 8`, whereas if one is the complement of the 
other, their score would be ~1 (`dim/(1+dim)`).
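
   To make the comparison concrete, here is a minimal sketch of the two 
normalizations side by side. It is not the code from this PR: the class name 
`HammingScoreSketch`, the methods `boundedScore` and `scaledScore`, and this 
body of `binaryHammingDistance` are all assumptions for illustration, and the 
scaled formula `bits / (1 + hammingDistance)` is my reading of the proposal.

```java
final class HammingScoreSketch {

  /** Counts the differing bits between two equal-length byte vectors (illustrative helper). */
  static int binaryHammingDistance(byte[] v1, byte[] v2) {
    if (v1.length != v2.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    int distance = 0;
    for (int i = 0; i < v1.length; i++) {
      // Mask to 8 bits so sign extension of negative bytes adds no extra set bits.
      distance += Integer.bitCount((v1[i] ^ v2[i]) & 0xFF);
    }
    return distance;
  }

  /** Normalization in the diff above: 1 for identical vectors, 1/(1+bits) for complements. */
  static float boundedScore(byte[] v1, byte[] v2) {
    return 1f / (1 + binaryHammingDistance(v1, v2));
  }

  /** Assumed scaled variant: bits for identical vectors, bits/(1+bits) for complements. */
  static float scaledScore(byte[] v1, byte[] v2) {
    int bits = v1.length * Byte.SIZE;
    return (float) bits / (1 + binaryHammingDistance(v1, v2));
  }

  public static void main(String[] args) {
    byte[] v = {(byte) 0xAA, (byte) 0x0F};
    byte[] complement = {(byte) 0x55, (byte) 0xF0}; // every bit flipped: distance 16

    System.out.println("bounded, identical:  " + boundedScore(v, v));          // 1/(1+0)
    System.out.println("bounded, complement: " + boundedScore(v, complement)); // 1/17
    System.out.println("scaled, identical:   " + scaledScore(v, v));           // 16/(1+0)
    System.out.println("scaled, complement:  " + scaledScore(v, complement));  // 16/17
  }
}
```

   Both variants rank results identically for a fixed dimension, since they 
differ only by the constant factor `bits`; the choice affects only how the raw 
score reads.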
   
   Don't have a strong opinion on this, happy to proceed with updating the 
normalization constant if you prefer. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


