Adrien Grand created LUCENE-10121: ------------------------------------- Summary: WANDScorer could skip more Key: LUCENE-10121 URL: https://issues.apache.org/jira/browse/LUCENE-10121 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand
I was looking at the NYC Taxis benchmark recently and got puzzled by the fact that the query (cab_color:y OR cab_color:g) ran so slowly: http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps. This is supposed to be a best-case scenario for WAND: there are only two possible scores for documents, this query should return instantly in all scenarios (dense, sparse, sparse and sorted). After digging I noticed that this is due to the scaling that we due in WANDScorer to avoid floating-point rounding errors: documents can be considered as possible matches according to the scaled scores (which are rounded) while they cannot possibly match according to the actual scores. This is especially visible when many blocks contain a document that has the maximum score across the entire postings list, so any field indexed with indexOptions=DOCS or constant-scoring queries for instance. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org