Adrien Grand created LUCENE-10121:
-------------------------------------
Summary: WANDScorer could skip more
Key: LUCENE-10121
URL: https://issues.apache.org/jira/browse/LUCENE-10121
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
I was looking at the NYC Taxis benchmark recently and got puzzled by the fact
that the query (cab_color:y OR cab_color:g) ran so slowly:
http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps.
This is supposed to be a best-case scenario for WAND: documents can only have
two possible scores, so this query should return instantly in all scenarios
(dense, sparse, sparse and sorted).
After digging, I noticed that this is due to the scaling that we do in
WANDScorer to avoid floating-point rounding errors: documents can be considered
possible matches according to the scaled (rounded) scores even though,
according to the actual scores, they cannot possibly be competitive. This is
especially visible when many blocks contain a document that has the maximum
score across the entire postings list, for instance any field indexed with
indexOptions=DOCS, or constant-scoring queries.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)