goankur commented on code in PR #13572:
URL: https://github.com/apache/lucene/pull/13572#discussion_r1817385236
##########
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java:
##########
@@ -84,6 +91,76 @@ public void init() {
floatsA[i] = random.nextFloat();
floatsB[i] = random.nextFloat();
}
+    // Java 21+ specific initialization
+    final int runtimeVersion = Runtime.version().feature();
+    if (runtimeVersion >= 21) {
+      // Reflection based code to eliminate the use of Preview classes in JMH benchmarks
+      try {
+        final Class<?> vectorUtilSupportClass = VectorUtil.getVectorUtilSupportClass();
+        final var className = "org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport";
+        if (vectorUtilSupportClass.getName().equals(className) == false) {
+          nativeBytesA = null;
+          nativeBytesB = null;
+        } else {
+          MethodHandles.Lookup lookup = MethodHandles.lookup();
+          final var MemorySegment = "java.lang.foreign.MemorySegment";
+          final var methodType =
+              MethodType.methodType(lookup.findClass(MemorySegment), byte[].class);
+          MethodHandle nativeMemorySegment =
+              lookup.findStatic(vectorUtilSupportClass, "nativeMemorySegment", methodType);
+          byte[] a = new byte[size];
Review Comment:
Yes, this is the setup code for the benchmark. We run setup once per iteration, for a total of `15` iterations across `3` forks (5 iterations per fork) for each `size` being tested. Each fork is preceded by 3 warm-up iterations.
So before **each** iteration we generate random numbers in the range [0, 127] in two on-heap `byte[]` arrays, allocate off-heap memory segments, and populate them with the contents of those arrays. The off-heap memory segments are then passed to the `VectorUtil.NATIVE_DOT_PRODUCT` method handle.
(Code snippet below for reference)
```java
@Param({"1", "128", "207", "256", "300", "512", "702", "1024"})
int size;
@Setup(Level.Iteration)
public void init() {
...
}
```
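For context, here is a minimal, self-contained sketch of what that per-iteration setup does: fill an on-heap `byte[]` with values in [0, 127], then copy it into an off-heap segment. This assumes the `java.lang.foreign` API as finalized in Java 22 (it was a preview in 21, which is why the benchmark goes through reflection); `SetupSketch` and the inline allocation are illustrative, not the benchmark's actual code.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Random;

public class SetupSketch {
  public static void main(String[] args) {
    int size = 128;
    Random random = new Random();

    // On-heap bytes in [0, 127], as described for the benchmark's init()
    byte[] a = new byte[size];
    for (int i = 0; i < size; i++) {
      a[i] = (byte) random.nextInt(128);
    }

    // Copy into an off-heap segment, which is what a native
    // dot-product implementation would consume
    try (Arena arena = Arena.ofConfined()) {
      MemorySegment nativeA = arena.allocate(size);
      nativeA.copyFrom(MemorySegment.ofArray(a));
      // Off-heap contents now mirror the on-heap array
      System.out.println(nativeA.get(ValueLayout.JAVA_BYTE, 0) == a[0]); // prints true
    }
  }
}
```

Doing this under `@Setup(Level.Iteration)` keeps allocation and copying out of the measured loop while still re-randomizing the data between iterations.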
> I wonder if we would see something different if we generated a large
number of vectors and randomized which ones we compare on each run. Also would
performance vary if the vectors are sequential in their buffer (ie vector 0
starts at 0, vector 1 starts at size...)
I guess the question you are hinting at is: how does performance vary when the two candidate vectors are further apart in memory (L1 cache / L2 cache / L3 cache / main memory)? Do the gains from the native implementation become insignificant with increasing distance? It's an interesting question, and I propose that we add benchmark method(s) to answer it in a follow-up PR. Does that sound reasonable?
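For illustration, a rough sketch of what such a follow-up benchmark could measure: allocate many vectors contiguously in one large buffer (vector `i` starts at `i * SIZE`) and dot-product randomly chosen pairs, so operands land at varying cache distances instead of always being the same two hot arrays. This is plain Java rather than JMH, and `VectorPoolSketch` with its scalar `dotProduct` is a hypothetical stand-in for the real method handles.

```java
import java.util.Random;

public class VectorPoolSketch {
  static final int SIZE = 256; // bytes per vector
  static final int NUM_VECTORS = 4096; // 1 MiB pool, larger than typical L1/L2

  public static void main(String[] args) {
    Random random = new Random(42);
    byte[] pool = new byte[NUM_VECTORS * SIZE]; // vector i starts at i * SIZE
    for (int i = 0; i < pool.length; i++) {
      pool[i] = (byte) random.nextInt(128);
    }
    long acc = 0;
    for (int iter = 0; iter < 1000; iter++) {
      // Randomized pairs: operands may be adjacent or megabytes apart
      int a = random.nextInt(NUM_VECTORS) * SIZE;
      int b = random.nextInt(NUM_VECTORS) * SIZE;
      acc += dotProduct(pool, a, pool, b, SIZE);
    }
    System.out.println(acc > 0); // prints true
  }

  // Scalar reference dot product over two slices of the pool
  static int dotProduct(byte[] x, int xOff, byte[] y, int yOff, int len) {
    int sum = 0;
    for (int i = 0; i < len; i++) {
      sum += x[xOff + i] * y[yOff + i];
    }
    return sum;
  }
}
```

Comparing this randomized-pool variant against the existing two-array variant would show how much of the native speedup survives once operands stop being cache-resident.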
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]