javanna commented on code in PR #13470: URL: https://github.com/apache/lucene/pull/13470#discussion_r1967234360
##########
lucene/core/src/java/org/apache/lucene/search/TopDocs.java:
##########
@@ -350,4 +354,89 @@ private static TopDocs mergeAux(
       return new TopFieldDocs(totalHits, hits, sort.getSort());
     }
   }
+
+  private record ShardIndexAndDoc(int shardIndex, int doc) {}
+
+  /**
+   * Reciprocal Rank Fusion method.
+   *
+   * <p>This method combines different search results into a single ranked list by combining their
+   * ranks. This is especially well suited when combining hits computed via different methods, whose
+   * score distributions are hardly comparable.
+   *
+   * @param topN the top N results to be returned
+   * @param k a constant determines how much influence documents in individual rankings have on the
+   *     final result. A higher value gives lower rank documents more influence. k should be greater
+   *     than or equal to 1.
+   * @param hits a list of TopDocs to apply RRF on
+   * @return a TopDocs contains the top N ranked results.
+   */
+  public static TopDocs rrf(int topN, int k, TopDocs[] hits) {
+    if (topN < 1) {
+      throw new IllegalArgumentException("topN must be >= 1, got " + topN);
+    }
+    if (k < 1) {
+      throw new IllegalArgumentException("k must be >= 1, got " + k);
+    }
+
+    boolean shardIndexSet = false;
+    outer:
+    for (TopDocs topDocs : hits) {
+      for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
+        shardIndexSet = scoreDoc.shardIndex != -1;
+        break outer;

Review Comment:
   Is the purpose here to only check the first scoreDoc of every TopDocs instance provided in the array? Should we try to rewrite this to be more readable and avoid the goto-style labeled break?

##########
lucene/core/src/java/org/apache/lucene/search/TopDocs.java:
##########
@@ -350,4 +354,89 @@ private static TopDocs mergeAux(
       return new TopFieldDocs(totalHits, hits, sort.getSort());
     }
   }
+
+  private record ShardIndexAndDoc(int shardIndex, int doc) {}
+
+  /**
+   * Reciprocal Rank Fusion method.
+   *
+   * <p>This method combines different search results into a single ranked list by combining their
+   * ranks. This is especially well suited when combining hits computed via different methods, whose
+   * score distributions are hardly comparable.
+   *
+   * @param topN the top N results to be returned
+   * @param k a constant determines how much influence documents in individual rankings have on the
+   *     final result. A higher value gives lower rank documents more influence. k should be greater
+   *     than or equal to 1.
+   * @param hits a list of TopDocs to apply RRF on
+   * @return a TopDocs contains the top N ranked results.
+   */
+  public static TopDocs rrf(int topN, int k, TopDocs[] hits) {
+    if (topN < 1) {
+      throw new IllegalArgumentException("topN must be >= 1, got " + topN);
+    }
+    if (k < 1) {
+      throw new IllegalArgumentException("k must be >= 1, got " + k);
+    }
+
+    boolean shardIndexSet = false;
+    outer:
+    for (TopDocs topDocs : hits) {
+      for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
+        shardIndexSet = scoreDoc.shardIndex != -1;
+        break outer;
+      }
+    }
+    for (TopDocs topDocs : hits) {

Review Comment:
   Would it make sense to check shardIndex while we loop later, rather than looping multiple times through all of the docs?
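
For illustration, here is a minimal sketch of how both review comments could be addressed together: the shardIndex mode is taken from the first hit encountered inside the main fusion loop, so no labeled break and no extra pass are needed. This is not the code from the PR. `RrfSketch`, `Hit`, and `Fused` are hypothetical stand-ins so the example runs without Lucene on the classpath; the reciprocal-rank contribution `1 / (k + rank)` with 1-based ranks, the tie-breaking order, and the exception on mixed set/unset shard indices are assumptions rather than behavior confirmed by the diff above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RrfSketch {
  /** Hypothetical stand-in for Lucene's ScoreDoc: only the fields used here. */
  public record Hit(int doc, int shardIndex) {}

  /** Fusion key, mirroring the ShardIndexAndDoc record in the diff. */
  public record ShardIndexAndDoc(int shardIndex, int doc) {}

  /** A fused entry: the document key and its summed reciprocal-rank score. */
  public record Fused(ShardIndexAndDoc key, double score) {}

  public static List<Fused> rrf(int topN, int k, List<List<Hit>> rankings) {
    if (topN < 1) {
      throw new IllegalArgumentException("topN must be >= 1, got " + topN);
    }
    if (k < 1) {
      throw new IllegalArgumentException("k must be >= 1, got " + k);
    }

    // null until the first hit is seen; afterwards every hit must agree with it.
    Boolean shardIndexSet = null;
    Map<ShardIndexAndDoc, Double> scores = new HashMap<>();

    for (List<Hit> ranking : rankings) {
      int rank = 1; // assumption: ranks are 1-based
      for (Hit hit : ranking) {
        boolean thisSet = hit.shardIndex() != -1;
        if (shardIndexSet == null) {
          shardIndexSet = thisSet;
        } else if (shardIndexSet.booleanValue() != thisSet) {
          // assumption: mixing set and unset shard indices is treated as an error
          throw new IllegalArgumentException("shardIndex must be set on all hits or on none");
        }
        // Each list contributes 1 / (k + rank) to the document's fused score.
        scores.merge(
            new ShardIndexAndDoc(hit.shardIndex(), hit.doc()), 1.0 / (k + rank), Double::sum);
        rank++;
      }
    }

    List<Fused> fused = new ArrayList<>();
    scores.forEach((key, score) -> fused.add(new Fused(key, score)));
    // Highest score first; ties broken by shardIndex, then doc, to stay deterministic.
    fused.sort((a, b) -> {
      int c = Double.compare(b.score(), a.score());
      if (c != 0) {
        return c;
      }
      c = Integer.compare(a.key().shardIndex(), b.key().shardIndex());
      return c != 0 ? c : Integer.compare(a.key().doc(), b.key().doc());
    });
    return new ArrayList<>(fused.subList(0, Math.min(topN, fused.size())));
  }
}
```

With `k = 1` and the rankings `[1, 2, 3]` and `[3, 1]`, doc 1 scores 1/2 + 1/3 and doc 3 scores 1/4 + 1/2, so doc 1 ranks ahead of doc 3. Folding the check into the main loop trades the cheap up-front peek for validation of every hit, which may or may not be the behavior wanted in `TopDocs.rrf`.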