jpountz commented on code in PR #13470:
URL: https://github.com/apache/lucene/pull/13470#discussion_r1632257926


##########
lucene/core/src/java/org/apache/lucene/search/TopDocs.java:
##########
@@ -350,4 +354,38 @@ private static TopDocs mergeAux(
       return new TopFieldDocs(totalHits, hits, sort.getSort());
     }
   }
+
+  /** Reciprocal Rank Fusion method. */
+  public static TopDocs rrf(int TopN, int k, TopDocs[] hits) {
+    Map<Integer, Float> rrfScore = new HashMap<>();
+    long minHits = Long.MAX_VALUE;
+    for (TopDocs topDoc : hits) {
+      minHits = Math.min(minHits, topDoc.totalHits.value);
+      Map<Integer, Float> scoreMap = new HashMap<>();
+      for (ScoreDoc scoreDoc : topDoc.scoreDocs) {
+        scoreMap.put(scoreDoc.doc, scoreDoc.score);
+      }
+
+      List<Map.Entry<Integer, Float>> scoreList = new 
ArrayList<>(scoreMap.entrySet());
+      scoreList.sort(Map.Entry.comparingByValue());
+
+      int rank = 1;
+      for (ScoreDoc scoreDoc : topDoc.scoreDocs) {
+        rrfScore.put(scoreDoc.doc, rrfScore.getOrDefault(scoreDoc.doc, 0.0f) + 
1.0f / (rank + k));
+        rank++;
+      }
+    }
+
+    List<Map.Entry<Integer, Float>> rrfScoreRank = new 
ArrayList<>(rrfScore.entrySet());
+    rrfScoreRank.sort(
+        Map.Entry.<Integer, Float>comparingByValue().reversed()); // Sort in 
descending order
+
+    ScoreDoc[] rrfScoreDocs = new ScoreDoc[Math.min(TopN, 
rrfScoreRank.size())];
+    for (int i = 0; i < rrfScoreDocs.length; i++) {
+      rrfScoreDocs[i] = new ScoreDoc(rrfScoreRank.get(i).getKey(), 
rrfScoreRank.get(i).getValue());

Review Comment:
   This is correct. When working with shards, hits should first be combined on 
a per-query basis using TopDocs#merge, which will set the shardIndex field. And 
then global top hits can be merged across queries with RRF.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to