Adrien Grand created LUCENE-10121:
-------------------------------------

             Summary: WANDScorer could skip more
                 Key: LUCENE-10121
                 URL: https://issues.apache.org/jira/browse/LUCENE-10121
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand


I was looking at the NYC Taxis benchmark recently and was puzzled that the
query (cab_color:y OR cab_color:g) runs so slowly:
http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps.
This is supposed to be a best-case scenario for WAND: there are only two
possible scores for documents, so this query should return instantly in all
scenarios (dense, sparse, sparse and sorted).

After digging, I noticed that this is due to the scaling that we do in
WANDScorer to avoid floating-point rounding errors: documents can be considered
possible matches according to the scaled scores (which are rounded) even though
they cannot possibly match according to the actual scores. This is especially
visible when many blocks contain a document that has the maximum score across
the entire postings list, for instance with any field indexed with
indexOptions=DOCS or with constant-scoring queries.
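
To illustrate the effect described above, here is a minimal, self-contained
sketch. It is not the actual WANDScorer code: the class name, the method names,
the fixed shift of 16 bits, and the rounding directions (maximum scores rounded
up, the minimum competitive score rounded down) are assumptions made for the
example. It also assumes a scenario where the minimum competitive score sits
just above the block's maximum score, so the block cannot contain a competitive
document.

```java
public class ScaledScoreSketch {

    // Hypothetical scaling: multiply by 2^shift, then round UP so that a
    // scaled maximum score never underestimates the real one.
    static long scaleMaxScore(float maxScore, int shift) {
        return (long) Math.ceil(Math.scalb((double) maxScore, shift));
    }

    // Hypothetical scaling: multiply by 2^shift, then round DOWN so that a
    // scaled minimum competitive score never overestimates the real one.
    static long scaleMinScore(float minScore, int shift) {
        return (long) Math.floor(Math.scalb((double) minScore, shift));
    }

    // Comparison on the actual float scores.
    static boolean competitiveExact(float maxScore, float minCompetitiveScore) {
        return maxScore >= minCompetitiveScore;
    }

    // Comparison on the scaled, rounded scores.
    static boolean competitiveScaled(float maxScore, float minCompetitiveScore, int shift) {
        return scaleMaxScore(maxScore, shift) >= scaleMinScore(minCompetitiveScore, shift);
    }

    public static void main(String[] args) {
        int shift = 16;                 // assumed scaling precision
        float blockMaxScore = 1.0f;     // e.g. a constant score shared by all documents
        // Assumed scenario: hits must score strictly higher than blockMaxScore.
        float minCompetitiveScore = Math.nextUp(blockMaxScore);

        System.out.println("exact:  " + competitiveExact(blockMaxScore, minCompetitiveScore));
        System.out.println("scaled: " + competitiveScaled(blockMaxScore, minCompetitiveScore, shift));
    }
}
```

Under these assumptions the exact comparison rejects the block while the scaled
comparison still treats it as a possible match, which is the kind of missed
skipping opportunity this issue is about.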



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

