Re: [PR] Implement off-heap quantized scoring [lucene]

via GitHub Tue, 05 Aug 2025 08:58:48 -0700


kaivalnp commented on code in PR #14863:
URL: https://github.com/apache/lucene/pull/14863#discussion_r2254730354



##########
lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -530,7 +566,41 @@ private int dotProductBody512Int4Packed(byte[] unpacked, byte[] packed, int limi
     return sum;
   }
 
-  private int dotProductBody256Int4Packed(byte[] unpacked, byte[] packed, int limit) {
+  private static int dotProductBody512Int4PackedPacked(
+      ByteVectorLoader a, ByteVectorLoader b, int limit) {
+    int sum = 0;
+    // iterate in chunks of 1024 items to ensure we don't overflow the short accumulator
+    for (int i = 0; i < limit; i += 4096) {
+      ShortVector acc0 = ShortVector.zero(ShortVector.SPECIES_512);
+      ShortVector acc1 = ShortVector.zero(ShortVector.SPECIES_512);
+      int innerLimit = Math.min(limit - i, 4096);
+      for (int j = 0; j < innerLimit; j += ByteVector.SPECIES_256.length()) {
+        // packed
+        var vb8 = b.load(ByteVector.SPECIES_256, i + j);
+        // packed
+        var va8 = a.load(ByteVector.SPECIES_256, i + j);
+
+        // upper
+        ByteVector prod8 = vb8.and((byte) 0x0F).mul(va8.and((byte) 0x0F));
+        Vector<Short> prod16 = prod8.convertShape(ZERO_EXTEND_B2S, ShortVector.SPECIES_512, 0);

Review Comment:
   > This is still doing a `convertShape` operation
   
   In this conversion the source and target species have the same bit size (`ShortVector.SPECIES_256` -> `IntVector.SPECIES_256`), so it is not shape-changing and should be cheap? I saw this in the Javadocs:
   
   ```
   If the old and new species have the same shape, the behavior is exactly the 
same as the simpler, shape-invariant method convert(). In such cases, the 
simpler method convert() should be used, to make code easier to reason about. 
Otherwise, this is a shape-changing operation, and may have special 
implementation costs
   ```
   
   Also, the `convertShape` only happens outside the inner loop (once per "chunk").
   
   > is there a way to extract the accumulated values from the short accs without converting to an int first?
   
   We could probably reinterpret the short accumulators as ints and do some bit manipulation. I changed the original accumulation logic from:
   
   ```java
         IntVector intAcc0 = acc0.convertShape(S2I, IntVector.SPECIES_256, 0).reinterpretAsInts();
         IntVector intAcc1 = acc0.convertShape(S2I, IntVector.SPECIES_256, 1).reinterpretAsInts();
         IntVector intAcc2 = acc1.convertShape(S2I, IntVector.SPECIES_256, 0).reinterpretAsInts();
         IntVector intAcc3 = acc1.convertShape(S2I, IntVector.SPECIES_256, 1).reinterpretAsInts();
         sum += intAcc0.add(intAcc1).add(intAcc2).add(intAcc3).reduceLanes(ADD);
   ```
   
   ..to:
   
   ```java
         IntVector acc0i = acc0.reinterpretAsInts();
         IntVector acc1i = acc1.reinterpretAsInts();
   
         IntVector intAcc0 = acc0i.and(0xFFFF); // retain lower short
         IntVector intAcc1 = acc0i.lanewise(LSHR, 16); // retain upper short
         IntVector intAcc2 = acc1i.and(0xFFFF);
         IntVector intAcc3 = acc1i.lanewise(LSHR, 16);
         sum += intAcc0.add(intAcc1).add(intAcc2).add(intAcc3).reduceLanes(ADD);
   ```
   
   ..and the results are:
   ```
   VectorUtilBenchmark.binaryHalfByteVectorPackedPacked    1024  thrpt   15  12.941 ± 0.194  ops/us
   ```
   
   Performance is nearly the same as before. I can use `convert` instead of `convertShape` for better readability (and to enforce that the operation is not shape-changing), as the Javadocs suggest.
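
   For reference, here is a scalar sketch of the reinterpret-and-mask idea above, in plain Java without the Vector API. The class and method names (`PackedShortSum`, `sumHalves`, `sumWidened`, `sumReinterpreted`) are hypothetical and only for illustration; it assumes an even number of short lanes and mimics a little-endian lane layout by placing the first short of each pair in the low 16 bits.

   ```java
   public class PackedShortSum {
     // Sum the two 16-bit halves of one 32-bit lane, reading each half as unsigned.
     static int sumHalves(int lane) {
       int lower = lane & 0xFFFF; // retain lower short
       int upper = lane >>> 16;   // retain upper short (logical shift, no sign fill)
       return lower + upper;
     }

     // Reference: widen every short to int first (sign-extending, like S2I), then add.
     static int sumWidened(short[] acc) {
       int sum = 0;
       for (short s : acc) {
         sum += s;
       }
       return sum;
     }

     // Same sum, but viewing each pair of shorts as a single int lane.
     // Assumes acc.length is even.
     static int sumReinterpreted(short[] acc) {
       int sum = 0;
       for (int i = 0; i + 1 < acc.length; i += 2) {
         // Pack two shorts into one int: first short in the low half, second in the high half.
         int lane = (acc[i] & 0xFFFF) | ((acc[i + 1] & 0xFFFF) << 16);
         sum += sumHalves(lane);
       }
       return sum;
     }

     public static void main(String[] args) {
       short[] acc = {225, 28800, 0, 32767};
       System.out.println(sumWidened(acc));
       System.out.println(sumReinterpreted(acc));
     }
   }
   ```

   The two sums agree as long as every short lane is non-negative. The masked version reads lanes as unsigned, so a lane with the top bit set (for example `-1`) contributes 65535 instead of -1, which is a behavioral difference from the sign-extending `S2I` path.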




