[I] SpanOrQuery uses IDFs of failed subqueries in score calculation. [lucene]

via GitHub Mon, 16 Sep 2024 08:02:14 -0700


tkarampAlpha opened a new issue, #13796:
URL: https://github.com/apache/lucene/issues/13796


   ### Description
   
   It seems that for SpanOrQuery, the IDF of terms belonging to subqueries that 
do not match a given document still affects that document's score.
   
   I have observed this on an index that contains 3 documents:
   
   ```
   doc1: 
       field: something
   doc2:
       field: nothing
   doc3: 
       field: anything
   ```
   
   And I issue the following query:
   
   ```spanOr([Contents:something, Contents:nothing])```
   
   If you check the score explanation, you will notice that the idf of both 
terms contributes to the score of each matching document, even though only one 
term matches each document.
   
   This is an example of the explanation of the first document's score:
   ```
   3.9616547 = weight(spanOr([Contents:something, Contents:nothing]) in 0) [AsBM25Similarity], result of:
     3.9616547 = score(freq=1.0), computed as boost * idf * tf from:
       51.0 = boost
       3.9616585 = idf, sum of:
         1.9808292 = idf for term nothing , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
           1 = docFreq
           3 = docCount
         1.9808292 = idf for term something , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
           1 = docFreq
           3 = docCount
       0.019607842 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
         1.0 = phraseFreq=1.0
         50.0 = k1, term saturation parameter
         0.0 = b, length normalization parameter
         1.0 = dl, length of field
         2.0 = avgdl, average length of field
   ```
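
To make the effect concrete, here is a minimal sketch that redoes the arithmetic from the explanation above in plain Java. It does not call any Lucene API: the class name `SpanOrIdfSketch` and its methods are hypothetical, and the formulas are copied from the explanation output, including the `+ 1` that this similarity adds to the idf. It compares the reported score (idf summed over both terms) with the score that would result if only the matching term's idf were used.

```java
// Hypothetical helper, for illustration only. It mirrors the formulas printed
// in the score explanation and is not part of Lucene.
final class SpanOrIdfSketch {

    // idf as printed: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1;
    }

    // tf as printed: freq / (freq + k1 * (1 - b + b * dl / avgdl))
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    // score as printed: boost * idf * tf
    static double score(double boost, double idf, double tf) {
        return boost * idf * tf;
    }

    public static void main(String[] args) {
        // Values taken from the explanation: docFreq=1, docCount=3 for both terms.
        double idfSomething = idf(1, 3);
        double idfNothing = idf(1, 3);
        double tf = tf(1.0, 50.0, 0.0, 1.0, 2.0);

        // What the explanation reports: idf summed over both subquery terms.
        double reported = score(51.0, idfSomething + idfNothing, tf);
        // What doc1 would get if only the matching term contributed its idf.
        double matchingOnly = score(51.0, idfSomething, tf);

        System.out.printf("reported=%.6f matchingOnly=%.6f%n", reported, matchingOnly);
    }
}
```

With these inputs the summed-idf score comes out at roughly 3.96, in line with the explanation, while the matching-term-only score is roughly 1.98, so the non-matching term doubles the score of doc1 in this example.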
   
   
   ### Version and environment details
   
   Lucene 9.7.0, used through Solr 9.3.0

