hack4chang commented on code in PR #13470: URL: https://github.com/apache/lucene/pull/13470#discussion_r1632111769
##########
lucene/core/src/java/org/apache/lucene/search/TopDocs.java:
##########
@@ -350,4 +354,38 @@ private static TopDocs mergeAux(
       return new TopFieldDocs(totalHits, hits, sort.getSort());
     }
   }
+
+  /** Reciprocal Rank Fusion method. */
+  public static TopDocs rrf(int TopN, int k, TopDocs[] hits) {
+    Map<Integer, Float> rrfScore = new HashMap<>();
+    long minHits = Long.MAX_VALUE;
+    for (TopDocs topDoc : hits) {
+      minHits = Math.min(minHits, topDoc.totalHits.value);
+      Map<Integer, Float> scoreMap = new HashMap<>();
+      for (ScoreDoc scoreDoc : topDoc.scoreDocs) {
+        scoreMap.put(scoreDoc.doc, scoreDoc.score);
+      }
+
+      List<Map.Entry<Integer, Float>> scoreList = new ArrayList<>(scoreMap.entrySet());
+      scoreList.sort(Map.Entry.comparingByValue());
+
+      int rank = 1;
+      for (ScoreDoc scoreDoc : topDoc.scoreDocs) {
+        rrfScore.put(scoreDoc.doc, rrfScore.getOrDefault(scoreDoc.doc, 0.0f) + 1.0f / (rank + k));
+        rank++;
+      }
+    }
+
+    List<Map.Entry<Integer, Float>> rrfScoreRank = new ArrayList<>(rrfScore.entrySet());
+    rrfScoreRank.sort(
+        Map.Entry.<Integer, Float>comparingByValue().reversed()); // Sort in descending order
+
+    ScoreDoc[] rrfScoreDocs = new ScoreDoc[Math.min(TopN, rrfScoreRank.size())];
+    for (int i = 0; i < rrfScoreDocs.length; i++) {
+      rrfScoreDocs[i] = new ScoreDoc(rrfScoreRank.get(i).getKey(), rrfScoreRank.get(i).getValue());

Review Comment:
   This was a tricky part for us as well. As I understand it, RRF combines search results based on the rank each document holds in the different result lists. We are supposed to combine the ranks per individual document, but should a document ID that comes from different shards be treated as a different document?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
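To make the shard question concrete, here is a minimal sketch of what "treat the same doc ID from different shards as different documents" could look like. It is not the PR's code and does not use Lucene classes: `RrfSketch`, `DocKey`, and `Fused` are hypothetical names, and each input list is assumed to be already ordered by rank. The idea is to key the fused score on the pair (shard index, doc ID) instead of the doc ID alone; in Lucene the analogous pair would be `ScoreDoc.shardIndex` and `ScoreDoc.doc`. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfSketch {
  /** Hypothetical identity for a hit: a doc ID is only unique within its shard. */
  record DocKey(int shardIndex, int doc) {}

  /** Hypothetical fused result: a document identity plus its summed RRF score. */
  record Fused(DocKey key, double score) {}

  /**
   * Reciprocal Rank Fusion over ranked lists. Each inner list is assumed to be
   * sorted best-first, so position i (0-based) has rank i + 1.
   */
  static List<Fused> rrf(int topN, int k, List<List<DocKey>> rankedLists) {
    Map<DocKey, Double> scores = new HashMap<>();
    for (List<DocKey> ranked : rankedLists) {
      int rank = 1;
      for (DocKey key : ranked) {
        // Each list contributes 1 / (k + rank); contributions are summed per DocKey.
        scores.merge(key, 1.0 / (k + rank), Double::sum);
        rank++;
      }
    }

    List<Fused> fused = new ArrayList<>();
    scores.forEach((key, score) -> fused.add(new Fused(key, score)));

    // Highest score first; ties broken by shard index, then doc ID, for a stable order.
    fused.sort(
        (a, b) -> {
          int byScore = Double.compare(b.score(), a.score());
          if (byScore != 0) {
            return byScore;
          }
          int byShard = Integer.compare(a.key().shardIndex(), b.key().shardIndex());
          return byShard != 0 ? byShard : Integer.compare(a.key().doc(), b.key().doc());
        });

    return new ArrayList<>(fused.subList(0, Math.min(topN, fused.size())));
  }
}
```

With this keying, doc 5 on shard 0 and doc 5 on shard 1 accumulate separate scores, whereas a map keyed on the doc ID alone would merge them into one entry.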
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org