[GitHub] [lucene] msokolov commented on a diff in pull request #899: Lucene 10577

GitBox Thu, 19 May 2022 06:53:42 -0700


msokolov commented on code in PR #899:
URL: https://github.com/apache/lucene/pull/899#discussion_r877085508



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene92/ExpandingRandomAccessVectorValues.java:
##########
@@ -0,0 +1,57 @@
+package org.apache.lucene.codecs.lucene92;
+
+import org.apache.lucene.index.RandomAccessVectorValues;
+import org.apache.lucene.index.RandomAccessVectorValuesProducer;
+import org.apache.lucene.util.BytesRef;
+
+import java.io.IOException;
+
+public class ExpandingRandomAccessVectorValues implements RandomAccessVectorValuesProducer {
+
+  private final RandomAccessVectorValuesProducer delegate;
+  private final float scale;
+
+  /**
+   * Wraps an existing vector values producer. Floating point vector values will be produced by scaling
+   * byte-quantized values read from the values produced by the input.
+   */
+  protected ExpandingRandomAccessVectorValues(RandomAccessVectorValuesProducer in, float scale) {
+    this.delegate = in;
+    assert scale != 0;
+    this.scale = scale;
+  }
+
+  @Override
+  public RandomAccessVectorValues randomAccess() throws IOException {
+    RandomAccessVectorValues delegateValues = delegate.randomAccess();
+    float[] value = new float[delegateValues.dimension()];
+
+    return new RandomAccessVectorValues() {
+
+      @Override
+      public int size() {
+        return delegateValues.size();
+      }
+
+      @Override
+      public int dimension() {
+        return delegateValues.dimension();
+      }
+
+      @Override
+      public float[] vectorValue(int targetOrd) throws IOException {
+        BytesRef binaryValue = delegateValues.binaryValue(targetOrd);
+        byte[] bytes = binaryValue.bytes;
+        for (int i = 0, j = binaryValue.offset; i < value.length; i++, j++) {
+          value[i] = bytes[j] * scale;

Review Comment:
   Well, there are definitely byte-oriented vectors. I don't think we should
try to use some kind of 8-bit floating point (what would that do, have an
exponent and a mantissa?). Rather, if we scale so that all 8 bits are
significant bits (no exponent), we maximize the precision and can use the
native instruction set, which ByteVector does seem to be accessing: based on
some preliminary JMH I ran, it seems quite a bit faster than 32-bit
FloatVector.
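
   To make the scaling idea concrete, here is a minimal sketch, not the code in
this PR. It assumes a scale chosen so that the largest absolute component maps
to 127, which matches the `bytes[j] * scale` decoding in the diff above. The
class `ByteQuantizer` and its methods are hypothetical names used only for
illustration.

```java
/** Hypothetical helper: linear quantization of float vectors to signed bytes. */
final class ByteQuantizer {
  private ByteQuantizer() {}

  /**
   * Returns a scale such that the largest absolute component maps to 127.
   * Decoding is then {@code byteValue * scale}. Assumption: one scale per call;
   * how the scale is chosen across a whole field is left open here.
   */
  static float scaleFor(float[] vector) {
    float maxAbs = 0f;
    for (float v : vector) {
      maxAbs = Math.max(maxAbs, Math.abs(v));
    }
    // Avoid a zero scale for an all-zero vector.
    return maxAbs == 0f ? 1f : maxAbs / 127f;
  }

  /** Quantizes each component to the nearest signed byte, clamped to [-127, 127]. */
  static byte[] quantize(float[] vector, float scale) {
    byte[] out = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      int q = Math.round(vector[i] / scale);
      out[i] = (byte) Math.max(-127, Math.min(127, q));
    }
    return out;
  }
}
```

   Because there is no exponent, all 8 bits carry magnitude information, at the
cost of a fixed dynamic range determined by the scale.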
   
   For now I see two directions: (1) figure out how to scale vectors and store
them as signed bytes at full precision (this issue), and (2) work up performant
Vector API implementations of that scaling and of the dot product, in
anticipation of the Vector API hatching, along the lines of the patch you
already did, @rcmuir. Maybe we can push that to a branch that builds with
`--add-modules jdk.incubator.vector` for posterity and ease of testing. I don't
think we need to block (1) on (2), since it seems clear there will be a path to
vectorization. In the meantime, we can implement a byte-at-a-time dot product to
replace the float-at-a-time dot product?
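
   A byte-at-a-time dot product could look roughly like the sketch below. This
is an illustration under assumptions, not an existing Lucene method:
`ByteDotProduct` and both method names are hypothetical, and it assumes both
vectors were quantized with the same scale, so the float result is the integer
sum times `scale * scale`.

```java
/** Hypothetical scalar (non-vectorized) dot product over signed-byte vectors. */
final class ByteDotProduct {
  private ByteDotProduct() {}

  /** Integer dot product; each term is at most 128 * 128, accumulated in an int. */
  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Bytes are promoted to int before the multiply, so the product cannot wrap.
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Rescales the integer result, assuming both inputs share the same scale. */
  static float scaledDotProduct(byte[] a, byte[] b, float scale) {
    return dotProduct(a, b) * scale * scale;
  }
}
```

   The int accumulator leaves room for more than 100,000 dimensions before
overflow becomes possible, and the loop has the simple shape that a later
ByteVector implementation would replace.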



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

