Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]

via GitHub Mon, 24 Feb 2025 01:05:51 -0800


javanna commented on code in PR #13470:
URL: https://github.com/apache/lucene/pull/13470#discussion_r1967234360



##########
lucene/core/src/java/org/apache/lucene/search/TopDocs.java:
##########
@@ -350,4 +354,89 @@ private static TopDocs mergeAux(
       return new TopFieldDocs(totalHits, hits, sort.getSort());
     }
   }
+
+  private record ShardIndexAndDoc(int shardIndex, int doc) {}
+
+  /**
+   * Reciprocal Rank Fusion method.
+   *
+   * <p>This method combines different search results into a single ranked list by combining their
+   * ranks. This is especially well suited when combining hits computed via different methods, whose
+   * score distributions are hardly comparable.
+   *
+   * @param topN the top N results to be returned
+   * @param k a constant determines how much influence documents in individual rankings have on the
+   *     final result. A higher value gives lower rank documents more influence. k should be greater
+   *     than or equal to 1.
+   * @param hits a list of TopDocs to apply RRF on
+   * @return a TopDocs contains the top N ranked results.
+   */
+  public static TopDocs rrf(int topN, int k, TopDocs[] hits) {
+    if (topN < 1) {
+      throw new IllegalArgumentException("topN must be >= 1, got " + topN);
+    }
+    if (k < 1) {
+      throw new IllegalArgumentException("k must be >= 1, got " + k);
+    }
+
+    boolean shardIndexSet = false;
+    outer:
+    for (TopDocs topDocs : hits) {
+      for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
+        shardIndexSet = scoreDoc.shardIndex != -1;
+        break outer;

Review Comment:
   Is the purpose here to check only the first scoreDoc of every TopDocs
instance provided in the array? Should we try to rewrite this to be more
readable and not use a goto-style labeled break?
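
One possible shape for such a rewrite, as a sketch only: move the check into a small helper that returns as soon as it sees the first hit, so no label is needed. It keeps the behavior of the loop in the diff, which looks only at the first scoreDoc of the first non-empty TopDocs. The `ScoreDoc` and `TopDocs` records below are minimal stand-ins so the snippet compiles without Lucene, and `isShardIndexSet` is a hypothetical name that does not appear in the PR.

```java
public class ShardIndexCheck {
  // Minimal stand-ins for Lucene's ScoreDoc and TopDocs, here only so the sketch is self-contained.
  public record ScoreDoc(int doc, float score, int shardIndex) {}

  public record TopDocs(ScoreDoc[] scoreDocs) {}

  // Hypothetical helper: reports whether the first hit found across all inputs
  // has its shardIndex set (-1 means unset). Returns false when there are no hits at all.
  public static boolean isShardIndexSet(TopDocs[] hits) {
    for (TopDocs topDocs : hits) {
      if (topDocs.scoreDocs().length > 0) {
        // Same decision as the labeled-break version: the first hit seen decides.
        return topDocs.scoreDocs()[0].shardIndex() != -1;
      }
    }
    return false;
  }
}
```

With a helper like this, the call site would reduce to a single assignment such as `boolean shardIndexSet = isShardIndexSet(hits);`.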



##########
lucene/core/src/java/org/apache/lucene/search/TopDocs.java:
##########
@@ -350,4 +354,89 @@ private static TopDocs mergeAux(
       return new TopFieldDocs(totalHits, hits, sort.getSort());
     }
   }
+
+  private record ShardIndexAndDoc(int shardIndex, int doc) {}
+
+  /**
+   * Reciprocal Rank Fusion method.
+   *
+   * <p>This method combines different search results into a single ranked list by combining their
+   * ranks. This is especially well suited when combining hits computed via different methods, whose
+   * score distributions are hardly comparable.
+   *
+   * @param topN the top N results to be returned
+   * @param k a constant determines how much influence documents in individual rankings have on the
+   *     final result. A higher value gives lower rank documents more influence. k should be greater
+   *     than or equal to 1.
+   * @param hits a list of TopDocs to apply RRF on
+   * @return a TopDocs contains the top N ranked results.
+   */
+  public static TopDocs rrf(int topN, int k, TopDocs[] hits) {
+    if (topN < 1) {
+      throw new IllegalArgumentException("topN must be >= 1, got " + topN);
+    }
+    if (k < 1) {
+      throw new IllegalArgumentException("k must be >= 1, got " + k);
+    }
+
+    boolean shardIndexSet = false;
+    outer:
+    for (TopDocs topDocs : hits) {
+      for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
+        shardIndexSet = scoreDoc.shardIndex != -1;
+        break outer;
+      }
+    }
+    for (TopDocs topDocs : hits) {

Review Comment:
   Would it make sense to check shardIndex while we loop later, rather than
looping multiple times through all of the docs?
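
A rough sketch of what a single pass could look like, under two assumptions that the quoted hunk does not confirm: that the second loop exists to verify shardIndex is set consistently (on every hit or on none), and that each hit contributes 1 / (k + rank) with ranks starting at 1, which is the usual RRF formula. The `ScoreDoc`, `TopDocs` and `ShardIndexAndDoc` records are stand-ins so the snippet compiles without Lucene, and `rrfScores` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

public class SinglePassRrf {
  // Stand-ins for the Lucene types and for the PR's private record, so the sketch is self-contained.
  public record ScoreDoc(int doc, float score, int shardIndex) {}

  public record TopDocs(ScoreDoc[] scoreDocs) {}

  public record ShardIndexAndDoc(int shardIndex, int doc) {}

  // Hypothetical helper: accumulates RRF scores and validates shardIndex in the same loop.
  public static Map<ShardIndexAndDoc, Double> rrfScores(int k, TopDocs[] hits) {
    Boolean shardIndexSet = null; // unknown until the first hit is seen
    Map<ShardIndexAndDoc, Double> scores = new HashMap<>();
    for (TopDocs topDocs : hits) {
      int rank = 1;
      for (ScoreDoc scoreDoc : topDocs.scoreDocs()) {
        boolean thisSet = scoreDoc.shardIndex() != -1;
        if (shardIndexSet == null) {
          shardIndexSet = thisSet; // the first hit decides which mode is expected
        } else if (shardIndexSet != thisSet) {
          // Assumed validation: reject a mix of set and unset shard indices.
          throw new IllegalArgumentException("shardIndex must be set on all hits or on none");
        }
        ShardIndexAndDoc key = new ShardIndexAndDoc(scoreDoc.shardIndex(), scoreDoc.doc());
        scores.merge(key, 1.0 / (k + rank), Double::sum);
        rank++;
      }
    }
    return scores;
  }
}
```

The trade-off is that a bad input is only detected partway through the merge, whereas the separate validation loop fails before any scoring work is done.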



