uschindler commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1712624281
########## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ########## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.internal.vectorization; + +import java.io.IOException; +import java.lang.foreign.MemorySegment; +import java.nio.ByteOrder; +import jdk.incubator.vector.LongVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.store.IndexInput; + +final class MemorySegmentPostingDecodingUtil extends PostingDecodingUtil { + + private static final VectorSpecies<Long> LONG_SPECIES = + PanamaVectorConstants.PRERERRED_LONG_SPECIES; + + private final IndexInput in; + private final MemorySegment memorySegment; + + MemorySegmentPostingDecodingUtil(IndexInput in, MemorySegment memorySegment) { + this.in = in; + this.memorySegment = memorySegment; + } + + @Override + public void readLongs(int count, long[] b) throws IOException { + if (count < LONG_SPECIES.length()) { + in.readLongs(b, 0, count); + } else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); + for (int i = 0; + i < loopBound; + i += LONG_SPECIES.length(), offset += LONG_SPECIES.length() * Long.BYTES) { + LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) Review Comment: Hi, just a guess: `copyMemory()` has some overhead for very short arrays. The good old ByteBuffers (MappedByteBuffer) had some optimization on ByteBuffer.getLongs() and other methods to copy a few items with a for-loop instead of copyMemory, see https://github.com/openjdk/jdk/blob/358d77dafbe0e35d5b20340fccddc0fb8f3db82a/src/java.base/share/classes/java/nio/X-Buffer.java.template#L954 With MemorySegment theres no such optimization anymore. We discussed this in early times of MemorySegment usage in Lucene, and Maurizio (from OpenJDK) at that time suggested to try out to use a simple loop instead of copyMemory for very small arays `<= Bits.JNI_COPY_TO_ARRAY_THRESHOLD`. The value is 6 array entries: https://github.com/openjdk/jdk/blob/358d77dafbe0e35d5b20340fccddc0fb8f3db82a/src/java.base/share/classes/java/nio/Bits.java#L232-L236 (I did this, but with Mike's benchmarks I did not see a large difference, only < stddev). It could be that the above code is using a loop and not copyMemory behind scenes and therefor has a small speedup for long arrays < 6 items. We could, separately from that issue implement some "short array scalar optimization" in `MemorySegmentIndexInput#getBytes/getLongs/getFloats/....` like in `XBuffer` classes. I'd revert this additional complexity here and better check for optimizing `MemorySegmentIndexInput`. When doing this is would speedup other usages in Lucene, too! It is quite easy to add a simple if/else here: https://github.com/apache/lucene/blob/5aa1aa98ea2cdb37d7d8d48657703a5eb7fbaefa/lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java#L222-L233 Basically as quick try add a (`if (length <= 6) { super.readLongs(....); return; }` here and measure again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org