mikemccand commented on code in PR #15341:
URL: https://github.com/apache/lucene/pull/15341#discussion_r2535254712


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsWriter.java:
##########
@@ -153,10 +153,19 @@ public long ramBytesUsed() {
     return total;
   }
 
+  private static long alignOutput(IndexOutput output, VectorEncoding encoding) throws IOException {
+    return output.alignFilePointer(
+        switch (encoding) {
+          case BYTE -> Float.BYTES;
+          case FLOAT32 -> 64; // optimal alignment for Arm Neoverse machines.
+        });
+  }
+
   private void writeField(FieldWriter<?> fieldData, int maxDoc) throws IOException {
     // write vector values
-    long vectorDataOffset = vectorData.alignFilePointer(Float.BYTES);
-    switch (fieldData.fieldInfo.getVectorEncoding()) {
+    VectorEncoding encoding = fieldData.fieldInfo.getVectorEncoding();
+    long vectorDataOffset = alignOutput(vectorData, encoding);

Review Comment:
   Hmm, should we skip the alignment if the vectors themselves won't stay 
aligned?  I.e. when the vector dimension is not 0 mod 16 or so?
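
   To make the concern concrete, here is a rough sketch of the arithmetic in
   plain Java with no Lucene dependency. The class and method names are
   hypothetical, and it assumes float32 vectors are written back to back with
   no per-vector padding after a 64-byte-aligned start. Under that assumption
   vector `ord` starts at `ord * dim * Float.BYTES` past the start, so every
   vector stays aligned only when `dim % 16 == 0`.

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
final class VectorAlignmentSketch {
  private VectorAlignmentSketch() {}

  /**
   * Byte offset of vector {@code ord}, assuming float32 vectors are stored
   * contiguously starting at {@code alignedStart}.
   */
  static long vectorOffset(long alignedStart, int dim, int ord) {
    return alignedStart + (long) ord * dim * Float.BYTES;
  }

  /** True if every vector starts on an {@code alignment}-byte boundary. */
  static boolean allVectorsAligned(int dim, int alignment) {
    return ((long) dim * Float.BYTES) % alignment == 0;
  }

  /** Counts how many of the first {@code count} vectors start on a boundary. */
  static int countAligned(long alignedStart, int dim, int count, int alignment) {
    int aligned = 0;
    for (int ord = 0; ord < count; ord++) {
      if (vectorOffset(alignedStart, dim, ord) % alignment == 0) {
        aligned++;
      }
    }
    return aligned;
  }
}
```

   With `dim = 768` each vector is 3072 bytes, a multiple of 64, so all of
   them stay aligned. With `dim = 100` each vector is 400 bytes, and only
   every fourth vector lands on a 64-byte boundary.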



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsWriter.java:
##########
@@ -153,10 +153,19 @@ public long ramBytesUsed() {
     return total;
   }
 
+  private static long alignOutput(IndexOutput output, VectorEncoding encoding) throws IOException {
+    return output.alignFilePointer(

Review Comment:
   Oh how nice that we already had a method to `alignFilePointer` -- I wonder 
who uses that today?
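
   For context, my understanding is that `alignFilePointer` pads the output
   with zero bytes up to the next multiple of the requested alignment and
   returns the new file pointer. The sketch below only imitates that behavior
   on a `ByteArrayOutputStream`; the class and method names are hypothetical,
   and it is not the Lucene implementation.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for illustration only; not the Lucene implementation. */
final class AlignPaddingSketch {
  private AlignPaddingSketch() {}

  /** Rounds {@code offset} up to the next multiple of a power-of-two alignment. */
  static long alignOffset(long offset, int alignmentBytes) {
    if (offset < 0) {
      throw new IllegalArgumentException("offset must be non-negative");
    }
    if (alignmentBytes <= 0 || Integer.bitCount(alignmentBytes) != 1) {
      throw new IllegalArgumentException("alignment must be a positive power of two");
    }
    // Add (alignment - 1), then clear the low bits to round up.
    return (offset + alignmentBytes - 1) & -(long) alignmentBytes;
  }

  /** Writes zero bytes until the stream size is aligned; returns the new size. */
  static long alignWithZeroPadding(ByteArrayOutputStream out, int alignmentBytes) {
    long target = alignOffset(out.size(), alignmentBytes);
    while (out.size() < target) {
      out.write(0);
    }
    return target;
  }
}
```

   Calling it on an already aligned stream writes nothing, so the padding
   cost is at most `alignmentBytes - 1` bytes per call.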



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsWriter.java:
##########
@@ -153,10 +153,19 @@ public long ramBytesUsed() {
     return total;
   }
 
+  private static long alignOutput(IndexOutput output, VectorEncoding encoding) throws IOException {

Review Comment:
   Can we add a `NOTE` somewhere in the javadocs (maybe in the `...Format.java`?) 
saying that this format tries to preserve alignment to a 64-byte boundary, 
because this matters on at least ARM Neoverse?  And that, to maximize 
performance, incoming vectors (if they are `float[]`) should ideally have a 
dimensionality that is 0 mod 16?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

