[GitHub] [lucene] uschindler commented on a change in pull request #18: LUCENE-9838: simd version of VectorUtil.dotProduct

GitBox Tue, 16 Mar 2021 01:30:03 -0700


uschindler commented on a change in pull request #18:
URL: https://github.com/apache/lucene/pull/18#discussion_r594952973




##########
File path: lucene/core/src/java/org/apache/lucene/util/VectorUtil.java
##########
@@ -17,16 +17,123 @@
 
 package org.apache.lucene.util;
 
+import java.lang.invoke.MethodHandle;
+import java.lang.invoke.MethodHandles;
+import java.lang.invoke.MethodType;
+import java.util.Base64;
+
 /** Utilities for computations with numeric arrays */
 public final class VectorUtil {
 
   private VectorUtil() {}
 
+  // org.apache.lucene.util.VectorUtilSIMD#dotProduct(float[], float[])
+  private static final String SIMD_BASE64 =
+      "yv66vgAAADwAbQoAAgADBwAEDAAFAAYBABBqYXZhL2xhbmcvT2JqZWN0AQAGPGluaXQ+AQADKClW\n"

Review comment:
       > But yeah, it is true that maybe we can start working those other 
hotspots off as well. For example, IMO it is silly with mmap directory for us 
to be decoding byte[] slowly into a float[] (readLEFloats or whatever). Vector 
API can use byte[] or even ByteBuffer directly (I assume any conversions are 
vectorized too, have not experimented with that).
   
   It gets even worse with MMapDirectory version 2 for Java 16. So IMHO, once 
we are really on JDK 17 as a minimum, we should change the method signatures of 
IndexInput and replace our `void readLEFloats(float[])` with `FloatVector 
readFloatVector()`. On MMapDirectory this can use a ByteBuffer or 
MemorySegment directly on the mmapped contents. This would spare millions of 
pointless native->heap arraycopy operations.
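
   To make the copy in question concrete, here is a minimal sketch using only 
plain `java.nio`. It is not the incubating Vector API and not Lucene's actual 
IndexInput code: the class `DirectFloatRead` and its helper methods are 
hypothetical names, and a direct ByteBuffer merely stands in for mmapped index 
contents. The first method decodes into heap `float[]` arrays before computing 
(roughly what `readLEFloats(float[])` implies), the second computes on a 
little-endian view of the buffer without the intermediate copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectFloatRead {

  // Copying approach: decode little-endian bytes into heap float[] arrays first.
  static float dotProductWithCopy(ByteBuffer a, ByteBuffer b, int dim) {
    float[] x = new float[dim];
    float[] y = new float[dim];
    // duplicate() resets the byte order to BIG_ENDIAN, so set it again explicitly.
    a.duplicate().order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(x);
    b.duplicate().order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(y);
    float sum = 0f;
    for (int i = 0; i < dim; i++) {
      sum += x[i] * y[i];
    }
    return sum;
  }

  // View approach: read floats straight from the buffer, no float[] in between.
  static float dotProductOnView(ByteBuffer a, ByteBuffer b, int dim) {
    FloatBuffer x = a.duplicate().order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
    FloatBuffer y = b.duplicate().order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
    float sum = 0f;
    for (int i = 0; i < dim; i++) {
      sum += x.get(i) * y.get(i);
    }
    return sum;
  }

  // Hypothetical helper: a direct buffer as a stand-in for mmapped contents.
  static ByteBuffer encode(float... values) {
    ByteBuffer bb =
        ByteBuffer.allocateDirect(values.length * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (float v : values) {
      bb.putFloat(v);
    }
    bb.flip();
    return bb;
  }

  public static void main(String[] args) {
    ByteBuffer a = encode(1f, 2f, 3f);
    ByteBuffer b = encode(4f, 5f, 6f);
    System.out.println(dotProductWithCopy(a, b, 3)); // prints 32.0
    System.out.println(dotProductOnView(a, b, 3)); // prints 32.0
  }
}
```

   Both methods return the same value; the difference is only whether the 
floats pass through a heap array on the way. A `FloatVector`-returning read 
method would presumably follow the second shape, with the scalar loop replaced 
by vector loads.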





----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
