Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

via GitHub Fri, 04 Jul 2025 14:47:05 -0700


jpountz commented on code in PR #14896:
URL: https://github.com/apache/lucene/pull/14896#discussion_r2186132941



##########
lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -1001,4 +1007,26 @@ public float recalculateScalarQuantizationOffset(
 
     return correction;
   }
+
+  @Override
+  public int filterWithDouble(int[] docBuffer, double[] scoreBuffer, double threshold, int upTo) {
+    int newUpto = 0;
+    int i = 0;
+    for (int bound = upTo - DOUBLE_SPECIES.length() + 1; i < bound; i += DOUBLE_SPECIES.length()) {

Review Comment:
   Can you use VectorSpecies#loopBound to make this more idiomatic?
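
   To illustrate why the two bounds are interchangeable, here is a minimal plain-Java sketch. It does not use the Vector API: the `loopBound` helper below is a hypothetical stand-in that assumes the documented contract of `VectorSpecies#loopBound(int)` (the largest multiple of the lane count that is less than or equal to the given length), and the class and method names are made up for this example.

```java
// Plain-Java sketch; nothing here comes from the PR itself.
public class LoopBoundSketch {

  // Hypothetical stand-in for VectorSpecies#loopBound(int): the largest
  // multiple of `lanes` that is <= `length` (assumed contract).
  public static int loopBound(int length, int lanes) {
    return length - Math.floorMod(length, lanes);
  }

  // Counts full blocks using the bound written out by hand in the diff.
  public static int manualIterations(int upTo, int lanes) {
    int count = 0;
    for (int i = 0, bound = upTo - lanes + 1; i < bound; i += lanes) {
      count++;
    }
    return count;
  }

  // Counts full blocks using the loopBound-style bound.
  public static int loopBoundIterations(int upTo, int lanes) {
    int count = 0;
    for (int i = 0, bound = loopBound(upTo, lanes); i < bound; i += lanes) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    int lanes = 4; // assumed lane count, e.g. 4 doubles in a 256-bit vector
    for (int upTo = 0; upTo <= 64; upTo++) {
      if (manualIterations(upTo, lanes) != loopBoundIterations(upTo, lanes)) {
        throw new AssertionError("bounds disagree for upTo=" + upTo);
      }
    }
    System.out.println("both bounds visit the same full blocks");
  }
}
```

   Under that assumption, the vectorized loop header would presumably become `for (int bound = DOUBLE_SPECIES.loopBound(upTo); i < bound; i += DOUBLE_SPECIES.length())`, with the scalar tail handling the remaining elements as before.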



##########
lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java:
##########
@@ -217,7 +217,8 @@ private static Optional<Module> lookupVectorModule() {
           "org.apache.lucene.util.VectorUtil",
           "org.apache.lucene.codecs.lucene103.Lucene103PostingsReader",
           "org.apache.lucene.codecs.lucene103.PostingIndexInput",
-          "org.apache.lucene.tests.util.TestSysoutsLimits");
+          "org.apache.lucene.tests.util.TestSysoutsLimits",
+          "org.apache.lucene.search.ScorerUtil");

Review Comment:
   This shouldn't be necessary, since ScorerUtil calls the new method via VectorUtil, which is already whitelisted?



##########
lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##########
@@ -1001,4 +1007,26 @@ public float recalculateScalarQuantizationOffset(
 
     return correction;
   }
+
+  @Override
+  public int filterWithDouble(int[] docBuffer, double[] scoreBuffer, double threshold, int upTo) {
+    int newUpto = 0;
+    int i = 0;
+    for (int bound = upTo - DOUBLE_SPECIES.length() + 1; i < bound; i += DOUBLE_SPECIES.length()) {
+      DoubleVector scoreVector = DoubleVector.fromArray(DOUBLE_SPECIES, scoreBuffer, i);
+      IntVector docVector = IntVector.fromArray(INT_FOR_DOUBLE_SPECIES, docBuffer, i);
+      VectorMask<Double> mask = scoreVector.compare(VectorOperators.GE, threshold);
+      docVector.compress(mask.cast(INT_FOR_DOUBLE_SPECIES)).intoArray(docBuffer, newUpto);
+      scoreVector.compress(mask).intoArray(scoreBuffer, newUpto);

Review Comment:
   nit: it would be nicer to compress the vectors in the same order as they were declared a few lines above



##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -376,4 +376,24 @@ public static float recalculateOffset(
     return IMPL.recalculateScalarQuantizationOffset(
         vector, oldAlpha, oldMinQuantile, scale, alpha, minQuantile, maxQuantile);
   }
+
+  /**
+   * filter both docBuffer and scoreBuffer with threshold, each docBuffer and scoreBuffer of the
+   * same index forms a pair, pairs with score less than threshold will be filtered out from the
+   * array.
+   *
+   * @param docBuffer doc buffer contains docs (or some other value forms a pair with scoreBuffer)
+   * @param scoreBuffer score buffer contains scores to be compared with threshold
+   * @param threshold minimal required double value to not be filtered out
+   * @param upTo where the filter should end
+   * @return how many pairs left after filter
+   */
+  public static int filterWithDouble(
+      int[] docBuffer, double[] scoreBuffer, double threshold, int upTo) {

Review Comment:
   nit: maybe rename threshold to `minScoreInclusive` to better convey that this is a minimum score (as opposed to a maximum) and that it is inclusive?
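
   For reference, the semantics the name should convey can be written as a scalar loop. This is only a sketch based on the Javadoc in the diff (pairs with a score below the threshold are dropped, so a score equal to the threshold is kept); the class name is hypothetical, and `filterByScore` and `minScoreInclusive` are candidate names rather than anything settled.

```java
import java.util.Arrays;

// Scalar sketch of the filtering contract; names are illustrative only.
public class FilterByScoreSketch {

  // Keeps (doc, score) pairs whose score is >= minScoreInclusive, compacting
  // them to the front of both buffers in their original order.
  // Returns the number of pairs kept.
  public static int filterByScore(
      int[] docBuffer, double[] scoreBuffer, double minScoreInclusive, int upTo) {
    int newUpto = 0;
    for (int i = 0; i < upTo; i++) {
      if (scoreBuffer[i] >= minScoreInclusive) {
        docBuffer[newUpto] = docBuffer[i];
        scoreBuffer[newUpto] = scoreBuffer[i];
        newUpto++;
      }
    }
    return newUpto;
  }

  public static void main(String[] args) {
    int[] docs = {1, 2, 3, 4};
    double[] scores = {0.5, 2.0, 1.0, 3.0};
    int kept = filterByScore(docs, scores, 1.0, docs.length);
    // The score equal to 1.0 is kept because the bound is inclusive.
    System.out.println(kept + " " + Arrays.toString(Arrays.copyOf(docs, kept)));
  }
}
```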



##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -376,4 +376,24 @@ public static float recalculateOffset(
     return IMPL.recalculateScalarQuantizationOffset(
         vector, oldAlpha, oldMinQuantile, scale, alpha, minQuantile, maxQuantile);
   }
+
+  /**
+   * filter both docBuffer and scoreBuffer with threshold, each docBuffer and scoreBuffer of the
+   * same index forms a pair, pairs with score less than threshold will be filtered out from the
+   * array.
+   *
+   * @param docBuffer doc buffer contains docs (or some other value forms a pair with scoreBuffer)
+   * @param scoreBuffer score buffer contains scores to be compared with threshold
+   * @param threshold minimal required double value to not be filtered out
+   * @param upTo where the filter should end
+   * @return how many pairs left after filter
+   */
+  public static int filterWithDouble(

Review Comment:
   I wonder if we can find a better name. `filterByScore` maybe?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

