hellosunil commented on issue #14769:
URL: https://github.com/apache/lucene/issues/14769#issuecomment-2963050158

   I agree that using the internal doc ID as a tie-breaker when sorting documents 
with identical scores within a single query result is reasonable. However, I'm 
concerned about a specific scenario in multi-query RRF fusion.
   
   Consider a case with **keyword search + vector search**. If all documents 
from the keyword search have identical scores, but the vector search provides 
different rankings, shouldn't the RRF scores reflect the vector search rankings 
rather than arbitrary positional rankings from the keyword search?
   
   **Example:**
   
   Let's say we have 4 documents (A, B, C, D) and two queries:
   
   **Query 1 (Keyword Search) - All tied but sorted by doc ID:**
   
   
   - Doc A: score = 0.16, position = 0 → rank = 1
   - Doc B: score = 0.16, position = 1 → rank = 2
   - Doc C: score = 0.16, position = 2 → rank = 3
   - Doc D: score = 0.16, position = 3 → rank = 4 *(pushed down due to doc ID)*
   
   **Query 2 (Vector Search):**
   
   
   - Doc D: score = 0.9, position = 0 → rank = 1 *(best match!)*
   - Doc A: score = 0.8, position = 1 → rank = 2
   - Doc B: score = 0.7, position = 2 → rank = 3
   - Doc C: score = 0.6, position = 3 → rank = 4
   
   **Current RRF Calculation (k=60):**
   
   ```other
   Document | Query1 Rank | Query2 Rank | RRF Score Calculation                 | Final Score
   ---------|-------------|-------------|---------------------------------------|------------
   Doc A    |      1      |      2      | 1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 | 0.0325
   Doc B    |      2      |      3      | 1/(60+2) + 1/(60+3) = 0.0161 + 0.0159 | 0.0320
   Doc C    |      3      |      4      | 1/(60+3) + 1/(60+4) = 0.0159 + 0.0156 | 0.0315
   Doc D    |      4      |      1      | 1/(60+4) + 1/(60+1) = 0.0156 + 0.0164 | 0.0320
   ```
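
The positional calculation in the table above can be reproduced with a short sketch. This is not Lucene's implementation: the function name `rrf_positional` and the input shape (one list of doc labels per query, already sorted best-first) are assumptions made for illustration, with k = 60 taken from the example.

```python
from collections import defaultdict


def rrf_positional(rankings, k=60):
    """Reciprocal rank fusion where rank = position + 1, so ties are ignored."""
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            # Each list contributes 1 / (k + rank) for the document.
            scores[doc] += 1.0 / (k + position + 1)
    return dict(scores)


# Hypothetical inputs mirroring the example above.
keyword = ["A", "B", "C", "D"]  # all tied at 0.16, ordered by doc ID
vector = ["D", "A", "B", "C"]   # ordered by descending vector score

fused = rrf_positional([keyword, vector])
for doc, score in sorted(fused.items(), key=lambda kv: kv[1], reverse=True):
    print(doc, round(score, 4))
```

With these inputs Doc A ends up first at about 0.0325, and Doc D lands near 0.0320 despite being the top vector match.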
   
   
   **Expected RRF Calculation (if tied scores got equal rank=1):**
   
   ```other
   Document | Query1 Rank | Query2 Rank | RRF Score Calculation                 | Final Score
   ---------|-------------|-------------|---------------------------------------|------------
   Doc D    |      1      |      1      | 1/(60+1) + 1/(60+1) = 0.0164 + 0.0164 | 0.0328 ← Should be #1!
   Doc A    |      1      |      2      | 1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 | 0.0325
   Doc B    |      1      |      3      | 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 | 0.0323
   Doc C    |      1      |      4      | 1/(60+1) + 1/(60+4) = 0.0164 + 0.0156 | 0.0320
   ```
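
A tie-aware variant could look roughly like the sketch below. The helper names `tied_ranks` and `rrf_tie_aware` are hypothetical, the input is assumed to be a list of `(doc, score)` pairs per query, and ties are detected by exact score equality. The sketch uses competition ranking (1, 1, 3, ...) for the rank that follows a tie. That is only one possible choice, and it does not matter in this example because every keyword hit is tied.

```python
from collections import defaultdict


def tied_ranks(results):
    """Map doc -> rank, giving documents with equal scores the same rank."""
    ordered = sorted(results, key=lambda pair: pair[1], reverse=True)
    ranks = {}
    prev_score, prev_rank = None, 0
    for position, (doc, score) in enumerate(ordered, start=1):
        if score != prev_score:
            # A new score value starts a new rank at the current position.
            prev_score, prev_rank = score, position
        ranks[doc] = prev_rank
    return ranks


def rrf_tie_aware(result_lists, k=60):
    """Reciprocal rank fusion using tie-aware ranks instead of positions."""
    scores = defaultdict(float)
    for results in result_lists:
        for doc, rank in tied_ranks(results).items():
            scores[doc] += 1.0 / (k + rank)
    return dict(scores)


# Hypothetical inputs mirroring the example above.
keyword = [("A", 0.16), ("B", 0.16), ("C", 0.16), ("D", 0.16)]
vector = [("D", 0.9), ("A", 0.8), ("B", 0.7), ("C", 0.6)]

fused = rrf_tie_aware([keyword, vector])
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)  # ['D', 'A', 'B', 'C']
```

Because all four keyword hits share rank 1, the fused order is decided entirely by the vector ranking, which matches the table above.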
   
   
   In this example, **Doc D should be ranked #1** since it's the best vector 
match while being tied in keyword search. However, the current implementation 
ranks Doc A highest due to arbitrary positional ranking, even though Doc D is 
clearly the better overall match.
   
   This demonstrates how positional ranking for tied scores can lead to 
suboptimal RRF results that don't properly reflect the true relevance across 
multiple query types.
   
   Am I missing something or misunderstanding anything here?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

