Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

via GitHub Wed, 18 Oct 2023 09:20:57 -0700


jmazanec15 commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362956122



##########
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##########
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Random;
+import java.util.stream.IntStream;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.VectorSimilarityFunction;
+
+/**
+ * Will scalar quantize float vectors into `int8` byte values. This is a lossy transformation.
+ * Scalar quantization works by first calculating the quantiles of the float vector values. The
+ * quantiles are calculated using the configured quantile/confidence interval. The [minQuantile,
+ * maxQuantile] are then used to scale the values into the range [0, 127] and bucketed into the
+ * nearest byte values.
+ *
+ * <h2>How Scalar Quantization Works</h2>
+ *
+ * <p>The basic mathematical equations behind this are fairly straightforward. Given a float vector
+ * `v` and a quantile `q` we can calculate the quantiles of the vector values [minQuantile,
+ * maxQuantile].
+ *
+ * <pre class="prettyprint">
+ *   byte = (float - minQuantile) * 127/(maxQuantile - minQuantile)
+ *   float = (maxQuantile - minQuantile)/127 * byte + minQuantile
+ * </pre>
+ *
+ * <p>This then means to multiply two float values together (e.g. dot_product) we can do the
+ * following:
+ *
+ * <pre class="prettyprint">
+ *   float1 * float2 ~= (byte1 * (maxQuantile - minQuantile)/127 + minQuantile) * (byte2 * (maxQuantile - minQuantile)/127 + minQuantile)
+ *   float1 * float2 ~= (byte1 * byte2 * (maxQuantile - minQuantile)^2)/(127^2) + (byte1 * minQuantile * (maxQuantile - minQuantile)/127) + (byte2 * minQuantile * (maxQuantile - minQuantile)/127) + minQuantile^2
+ *   let alpha = (maxQuantile - minQuantile)/127
+ *   float1 * float2 ~= (byte1 * byte2 * alpha^2) + (byte1 * minQuantile * alpha) + (byte2 * minQuantile * alpha) + minQuantile^2
+ * </pre>
+ *
+ * <p>The expansion for square distance is much simpler:
+ *
+ * <pre class="prettyprint">
+ *  square_distance = (float1 - float2)^2
+ *  (float1 - float2)^2 ~= (byte1 * alpha + minQuantile - byte2 * alpha - minQuantile)^2
+ *  = (alpha*byte1 + minQuantile)^2 + (alpha*byte2 + minQuantile)^2 - 2*(alpha*byte1 + minQuantile)(alpha*byte2 + minQuantile)
+ *  this can be simplified to:
+ *  = alpha^2 (byte1 - byte2)^2
+ * </pre>
+ */
+public class ScalarQuantizer {
+
+  public static final int SCALAR_QUANTIZATION_SAMPLE_SIZE = 25_000;

Review Comment:
   Out of curiosity, why was 25K chosen? It seems it will be about 12.5MB in 
memory per segment for 128-dimensional vectors, which seems reasonable, but I'm 
curious if it could be lower.



##########
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##########
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Random;
+import java.util.stream.IntStream;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.VectorSimilarityFunction;
+
+/**
+ * Will scalar quantize float vectors into `int8` byte values. This is a lossy transformation.
+ * Scalar quantization works by first calculating the quantiles of the float vector values. The
+ * quantiles are calculated using the configured quantile/confidence interval. The [minQuantile,
+ * maxQuantile] are then used to scale the values into the range [0, 127] and bucketed into the
+ * nearest byte values.
+ *
+ * <h2>How Scalar Quantization Works</h2>

Review Comment:
   I see. I looked at `getUpperAndLowerQuantile` more closely and that makes sense.
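
To make the scaling described in the quoted Javadoc concrete, here is a minimal round-trip sketch of the two formulas (float to byte and back). The class and method names are hypothetical, and the clamping step is an assumption borrowed from the `quantize` method elsewhere in this PR rather than from the Javadoc formula itself.

```java
/** Round trip between a float and its [0, 127] bucket, following the Javadoc formulas. */
final class QuantizeRoundTrip {

  /** byte = (float - minQuantile) * 127 / (maxQuantile - minQuantile), after clamping. */
  static byte toByte(float value, float minQuantile, float maxQuantile) {
    // Assumption: out-of-range values are clamped to the quantile interval first.
    float clamped = Math.max(minQuantile, Math.min(maxQuantile, value));
    return (byte) Math.round((clamped - minQuantile) * 127f / (maxQuantile - minQuantile));
  }

  /** float = (maxQuantile - minQuantile) / 127 * byte + minQuantile */
  static float toFloat(byte b, float minQuantile, float maxQuantile) {
    return (maxQuantile - minQuantile) / 127f * b + minQuantile;
  }

  public static void main(String[] args) {
    float min = -1f, max = 1f;
    byte b = toByte(0.3f, min, max);
    float restored = toFloat(b, min, max);
    // The restored value differs from the input by at most half a bucket width.
    System.out.println("bucket=" + b + " restored=" + restored);
  }
}
```

For values inside the quantile interval, the round-trip error is bounded by half a bucket width, `(maxQuantile - minQuantile) / 254`, which is where the lossiness comes from.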



##########
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##########
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Random;
+import java.util.stream.IntStream;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.VectorSimilarityFunction;
+
+/** Will scalar quantize float vectors into `int8` byte values */
+public class ScalarQuantizer {
+
+  public static final int SCALAR_QUANTIZATION_SAMPLE_SIZE = 25_000;
+
+  private final float alpha;
+  private final float scale;
+  private final float minQuantile, maxQuantile, configuredQuantile;
+
+  /**
+   * @param minQuantile the lower quantile of the distribution
+   * @param maxQuantile the upper quantile of the distribution
+   * @param configuredQuantile The configured quantile/confidence interval used to calculate the
+   *     quantiles.
+   */
+  public ScalarQuantizer(float minQuantile, float maxQuantile, float configuredQuantile) {
+    assert maxQuantile >= minQuantile;
+    this.minQuantile = minQuantile;
+    this.maxQuantile = maxQuantile;
+    this.scale = 127f / (maxQuantile - minQuantile);
+    this.alpha = (maxQuantile - minQuantile) / 127f;
+    this.configuredQuantile = configuredQuantile;
+  }
+
+  /**
+   * Quantize a float vector into a byte vector
+   *
+   * @param src the source vector
+   * @param dest the destination vector
+   * @param similarityFunction the similarity function used to calculate the quantile
+   * @return the corrective offset that needs to be applied to the score
+   */
+  public float quantize(float[] src, byte[] dest, VectorSimilarityFunction similarityFunction) {
+    assert src.length == dest.length;
+    float correctiveOffset = 0f;
+    for (int i = 0; i < src.length; i++) {
+      float v = src[i];
+      float dx = Math.max(minQuantile, Math.min(maxQuantile, src[i])) - minQuantile;
+      float dxs = scale * dx;
+      float dxq = Math.round(dxs) * alpha;
+      correctiveOffset += minQuantile * (v - minQuantile / 2.0F) + (dx - dxq) * dxq;
+      dest[i] = (byte) Math.round(dxs);
+    }
+    if (similarityFunction.equals(VectorSimilarityFunction.EUCLIDEAN)) {
+      return 0;
+    }
+    return correctiveOffset;
+  }

Review Comment:
   I am a little confused about the corrective offset; digging into it a 
bit more.
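
One possible reading of the corrective offset, offered as a sketch and not as the PR author's explanation: writing each in-range value as `v = minQuantile + dx`, the term `minQuantile * (v - minQuantile / 2)` summed over both vectors reproduces the `minQuantile^2 + minQuantile * dx1 + minQuantile * dx2` part of the dot product, while `(dx - dxq) * dxq` partially compensates for rounding. The code below copies the arithmetic of `quantize` from the diff without the Lucene dependency or the EUCLIDEAN branch. The scoring method `approximateDot`, which combines `alpha^2 * (byte dot product)` with the two offsets, is my assumption about how the offsets are meant to be used and is not shown in the quoted hunk; all class and method names here are hypothetical.

```java
/** Standalone sketch of the quantize arithmetic plus an assumed way to use the offsets. */
final class CorrectiveOffsetSketch {
  final float minQuantile, maxQuantile, scale, alpha;

  CorrectiveOffsetSketch(float minQuantile, float maxQuantile) {
    this.minQuantile = minQuantile;
    this.maxQuantile = maxQuantile;
    this.scale = 127f / (maxQuantile - minQuantile);
    this.alpha = (maxQuantile - minQuantile) / 127f;
  }

  /** Same per-dimension arithmetic as the quoted quantize method. */
  float quantize(float[] src, byte[] dest) {
    float correctiveOffset = 0f;
    for (int i = 0; i < src.length; i++) {
      float v = src[i];
      float dx = Math.max(minQuantile, Math.min(maxQuantile, v)) - minQuantile;
      float dxs = scale * dx;
      float dxq = Math.round(dxs) * alpha; // the value the stored byte decodes to, minus minQuantile
      correctiveOffset += minQuantile * (v - minQuantile / 2.0f) + (dx - dxq) * dxq;
      dest[i] = (byte) Math.round(dxs);
    }
    return correctiveOffset;
  }

  /** Assumed scoring: scaled integer dot product plus both per-vector offsets. */
  float approximateDot(byte[] a, float offsetA, byte[] b, float offsetB) {
    int raw = 0;
    for (int i = 0; i < a.length; i++) {
      raw += a[i] * b[i];
    }
    return alpha * alpha * raw + offsetA + offsetB;
  }

  /** Plain float dot product, used as the reference value. */
  static float exactDot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

Under this reading, the only remaining error for in-range values comes from rounding, and per dimension it is bounded by roughly `alpha * (maxQuantile - minQuantile) + alpha^2 / 4`.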



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

