benwtrent commented on code in PR #16050:
URL: https://github.com/apache/lucene/pull/16050#discussion_r3235962668


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java:
##########
@@ -391,6 +391,25 @@ public void close() throws IOException {
   private record DocValuesSkipperEntry(
       long offset, long length, long minValue, long maxValue, int docCount, 
int maxDocId) {}
 
+  // Cached VectorizationProvider instance to avoid repeated stack walks in 
ensureCaller()
+  private static final 
org.apache.lucene.internal.vectorization.DocValuesRangeSupport
+      DOC_VALUES_RANGE_SUPPORT =
+          
org.apache.lucene.internal.vectorization.VectorizationProvider.getInstance()
+              .getDocValuesRangeSupport();
+
+  // Static helper so anonymous inner classes can call DocValuesRangeSupport 
from the outer class
+  static void rangeIntoBitSetVectorized(

Review Comment:
   nit, the assumption is that it is vectorized, but it might be the "default" 
implementation. can we just name this `rangeIntoBitSet`? Or something other 
than vectorized.



##########
lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultDocValuesRangeSupport.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.internal.vectorization;
+
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.lucene.util.LongValues;
+
+/** Scalar (non-SIMD) implementation of {@link DocValuesRangeSupport}. */
+final class DefaultDocValuesRangeSupport implements DocValuesRangeSupport {
+
+  static final DefaultDocValuesRangeSupport INSTANCE = new 
DefaultDocValuesRangeSupport();
+
+  private DefaultDocValuesRangeSupport() {}
+
+  @Override
+  public void rangeIntoBitSet(
+      LongValues values,
+      int fromDoc,
+      int toDoc,
+      long minValue,
+      long maxValue,
+      FixedBitSet bitSet,
+      int offset) {
+    // Scalar tight loop — JIT may auto-vectorize this on modern JVMs.
+    for (int d = fromDoc; d < toDoc; d++) {
+      long v = values.get(d);

Review Comment:
   this tells me we eventually might actually want a `int count = 
values.get(int[] docIds, long[] dest);`
   
   That is a larger change, but I suspect there is perf to be gained lower 
level just decoding the long values.



##########
lucene/core/src/java25/org/apache/lucene/internal/vectorization/PanamaDocValuesRangeSupport.java:
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.internal.vectorization;
+
+import jdk.incubator.vector.LongVector;
+import jdk.incubator.vector.VectorMask;
+import jdk.incubator.vector.VectorOperators;
+import jdk.incubator.vector.VectorSpecies;
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.lucene.util.LongValues;
+
+/** Panama Vector API implementation of {@link DocValuesRangeSupport}. */
+final class PanamaDocValuesRangeSupport implements DocValuesRangeSupport {
+
+  static final PanamaDocValuesRangeSupport INSTANCE = new 
PanamaDocValuesRangeSupport();
+
+  private static final VectorSpecies<Long> LONG_SPECIES = 
LongVector.SPECIES_PREFERRED;
+
+  private PanamaDocValuesRangeSupport() {}
+
+  @Override
+  public void rangeIntoBitSet(
+      LongValues values,
+      int fromDoc,
+      int toDoc,
+      long minValue,
+      long maxValue,
+      FixedBitSet bitSet,
+      int offset) {
+    final int vectorLen = LONG_SPECIES.length();
+
+    // Only use SIMD if vector length >= 4 (AVX-256 or better).
+    // On 128-bit SIMD (2 longs), the scratch buffer overhead outweighs the 
benefit.
+    if (vectorLen < 4) {

Review Comment:
   Have you benchmarked this to indicate no improvement here?



##########
lucene/core/src/java25/org/apache/lucene/internal/vectorization/PanamaDocValuesRangeSupport.java:
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.internal.vectorization;
+
+import jdk.incubator.vector.LongVector;
+import jdk.incubator.vector.VectorMask;
+import jdk.incubator.vector.VectorOperators;
+import jdk.incubator.vector.VectorSpecies;
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.lucene.util.LongValues;
+
+/** Panama Vector API implementation of {@link DocValuesRangeSupport}. */
+final class PanamaDocValuesRangeSupport implements DocValuesRangeSupport {
+
+  static final PanamaDocValuesRangeSupport INSTANCE = new 
PanamaDocValuesRangeSupport();
+
+  private static final VectorSpecies<Long> LONG_SPECIES = 
LongVector.SPECIES_PREFERRED;
+
+  private PanamaDocValuesRangeSupport() {}
+
+  @Override
+  public void rangeIntoBitSet(
+      LongValues values,
+      int fromDoc,
+      int toDoc,
+      long minValue,
+      long maxValue,
+      FixedBitSet bitSet,
+      int offset) {
+    final int vectorLen = LONG_SPECIES.length();
+
+    // Only use SIMD if vector length >= 4 (AVX-256 or better).
+    // On 128-bit SIMD (2 longs), the scratch buffer overhead outweighs the 
benefit.
+    if (vectorLen < 4) {
+      // Scalar fallback: tight loop that JIT can auto-vectorize
+      for (int d = fromDoc; d < toDoc; d++) {
+        long v = values.get(d);
+        if (v >= minValue && v <= maxValue) {
+          bitSet.set(d - offset);
+        }
+      }
+      return;
+    }
+
+    // Stack-allocated scratch buffer — vectorLen is at most 8 for AVX-512 (64 
bytes).
+    final long[] scratch = new long[vectorLen];
+    final int loopBound = fromDoc + LONG_SPECIES.loopBound(toDoc - fromDoc);
+
+    // SIMD loop: load vectorLen values into scratch, compare all at once
+    for (int d = fromDoc; d < loopBound; d += vectorLen) {
+      for (int i = 0; i < vectorLen; i++) {
+        scratch[i] = values.get(d + i);
+      }
+      LongVector v = LongVector.fromArray(LONG_SPECIES, scratch, 0);
+      VectorMask<Long> inRange =
+          v.compare(VectorOperators.GE, 
minValue).and(v.compare(VectorOperators.LE, maxValue));
+      long maskBits = inRange.toLong();
+      if (maskBits != 0) {
+        int base = d - offset;
+        while (maskBits != 0) {
+          int bit = Long.numberOfTrailingZeros(maskBits);
+          bitSet.set(base + bit);
+          maskBits &= maskBits - 1;
+        }
+      }
+    }
+
+    // Scalar tail for remaining docs
+    for (int d = loopBound; d < toDoc; d++) {
+      long v = values.get(d);
+      if (v >= minValue && v <= maxValue) {
+        bitSet.set(d - offset);
+      }
+    }

Review Comment:
   I wonder if we will hit windows of density, where `v` passes our predicate 
for multiple docs in a row. In that case, we could take advantage of 
`FixedBitSet.set(int startIndex, int endIndex)` which would provide a 
substantial speed up in those dense regions.
   
   This same idea goes for the default, etc. versions.



##########
lucene/core/src/java/org/apache/lucene/internal/vectorization/DocValuesRangeSupport.java:
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.internal.vectorization;
+
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.lucene.util.LongValues;
+
+/**
+ * Interface for SIMD-accelerated doc values range operations.
+ *
+ * <p>Implementations fill a {@link FixedBitSet} with the doc IDs in a range 
whose values satisfy a
+ * numeric range predicate. The default scalar implementation is used when the 
Panama Vector API is
+ * unavailable; a SIMD-accelerated implementation is used otherwise.
+ *
+ * @lucene.internal
+ */
+public interface DocValuesRangeSupport {

Review Comment:
   I think this support path, etc. all matches our existing patterns. Seems OK 
to me.



##########
lucene/core/src/java/org/apache/lucene/index/NumericDocValues.java:
##########
@@ -91,4 +91,38 @@ public void longValues(int size, int[] docs, long[] values, 
long defaultValue)
       values[i] = value;
     }
   }
+
+  /**
+   * Fills a {@link org.apache.lucene.util.FixedBitSet} with the doc IDs in 
{@code [fromDoc, toDoc)}
+   * whose values are in {@code [minValue, maxValue]}. This is a bulk 
operation that avoids per-doc
+   * virtual dispatch overhead.
+   *
+   * <p>The default implementation falls back to per-doc evaluation via {@link 
#advanceExact} and
+   * {@link #longValue}. Subclasses with random-access storage (e.g., dense 
fixed-bitsPerValue
+   * fields) can override this for significantly better performance.
+   *
+   * @param fromDoc first doc ID to evaluate (inclusive)
+   * @param toDoc last doc ID to evaluate (exclusive)
+   * @param minValue lower bound of the range (inclusive)
+   * @param maxValue upper bound of the range (inclusive)
+   * @param bitSet the bitset to fill
+   * @param offset subtracted from each doc ID before setting the bit
+   */
+  public void rangeIntoBitSet(
+      int fromDoc,
+      int toDoc,
+      long minValue,
+      long maxValue,
+      org.apache.lucene.util.FixedBitSet bitSet,

Review Comment:
   I wondered about this as well, but @romseygeek the 
DocIdSetIterator.intoBitSet is a `FixedBitSet`. I think if we are going to 
require a FixedBitSet, we need to adjust our logic to be way way faster and 
take advantage of the fact that we know its a FixedBitSet :)



##########
lucene/core/src/java25/org/apache/lucene/internal/vectorization/PanamaDocValuesRangeSupport.java:
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.internal.vectorization;
+
+import jdk.incubator.vector.LongVector;
+import jdk.incubator.vector.VectorMask;
+import jdk.incubator.vector.VectorOperators;
+import jdk.incubator.vector.VectorSpecies;
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.lucene.util.LongValues;
+
+/** Panama Vector API implementation of {@link DocValuesRangeSupport}. */
+final class PanamaDocValuesRangeSupport implements DocValuesRangeSupport {
+
+  static final PanamaDocValuesRangeSupport INSTANCE = new 
PanamaDocValuesRangeSupport();
+
+  private static final VectorSpecies<Long> LONG_SPECIES = 
LongVector.SPECIES_PREFERRED;
+
+  private PanamaDocValuesRangeSupport() {}
+
+  @Override
+  public void rangeIntoBitSet(
+      LongValues values,
+      int fromDoc,
+      int toDoc,
+      long minValue,
+      long maxValue,
+      FixedBitSet bitSet,
+      int offset) {
+    final int vectorLen = LONG_SPECIES.length();
+
+    // Only use SIMD if vector length >= 4 (AVX-256 or better).
+    // On 128-bit SIMD (2 longs), the scratch buffer overhead outweighs the 
benefit.
+    if (vectorLen < 4) {
+      // Scalar fallback: tight loop that JIT can auto-vectorize
+      for (int d = fromDoc; d < toDoc; d++) {
+        long v = values.get(d);
+        if (v >= minValue && v <= maxValue) {
+          bitSet.set(d - offset);
+        }
+      }
+      return;
+    }
+
+    // Stack-allocated scratch buffer — vectorLen is at most 8 for AVX-512 (64 
bytes).
+    final long[] scratch = new long[vectorLen];
+    final int loopBound = fromDoc + LONG_SPECIES.loopBound(toDoc - fromDoc);
+
+    // SIMD loop: load vectorLen values into scratch, compare all at once
+    for (int d = fromDoc; d < loopBound; d += vectorLen) {
+      for (int i = 0; i < vectorLen; i++) {
+        scratch[i] = values.get(d + i);
+      }
+      LongVector v = LongVector.fromArray(LONG_SPECIES, scratch, 0);
+      VectorMask<Long> inRange =
+          v.compare(VectorOperators.GE, 
minValue).and(v.compare(VectorOperators.LE, maxValue));
+      long maskBits = inRange.toLong();

Review Comment:
   Its a huge shame to throw away the `maskBits` which is already encoded as a 
long, especially when we know the bit set is a `FixedBitSet` and we have access 
to `FixedBitSet.getBits` ;)



##########
lucene/core/src/java/org/apache/lucene/index/NumericDocValues.java:
##########
@@ -91,4 +91,38 @@ public void longValues(int size, int[] docs, long[] values, 
long defaultValue)
       values[i] = value;
     }
   }
+
+  /**
+   * Fills a {@link org.apache.lucene.util.FixedBitSet} with the doc IDs in 
{@code [fromDoc, toDoc)}
+   * whose values are in {@code [minValue, maxValue]}. This is a bulk 
operation that avoids per-doc
+   * virtual dispatch overhead.
+   *
+   * <p>The default implementation falls back to per-doc evaluation via {@link 
#advanceExact} and
+   * {@link #longValue}. Subclasses with random-access storage (e.g., dense 
fixed-bitsPerValue
+   * fields) can override this for significantly better performance.
+   *
+   * @param fromDoc first doc ID to evaluate (inclusive)
+   * @param toDoc last doc ID to evaluate (exclusive)
+   * @param minValue lower bound of the range (inclusive)
+   * @param maxValue upper bound of the range (inclusive)
+   * @param bitSet the bitset to fill
+   * @param offset subtracted from each doc ID before setting the bit
+   */
+  public void rangeIntoBitSet(
+      int fromDoc,
+      int toDoc,
+      long minValue,
+      long maxValue,
+      org.apache.lucene.util.FixedBitSet bitSet,

Review Comment:
   Please can we actually import `org.apache.lucene.util.FixedBitSet` and just 
use `FixedBitSet` here :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to