[jira] [Updated] (LUCENE-10121) WANDScorer could skip more

Adrien Grand (Jira) Thu, 23 Sep 2021 11:19:07 -0700


     [ 
https://issues.apache.org/jira/browse/LUCENE-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Adrien Grand updated LUCENE-10121:
----------------------------------
    Description: 
I was looking at the NYC Taxis benchmark recently and got puzzled by the fact 
that the query (cab_color:y OR cab_color:g) ran so slowly: 
http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps.
 This is supposed to be a best-case scenario for WAND: there are only two 
possible scores for documents, so this query should return instantly in the 
sorted case.

After digging I noticed that this is due to the scaling that we do in 
WANDScorer to avoid floating-point rounding errors: documents can be considered 
as possible matches according to the scaled scores (which are rounded) while 
they cannot possibly match according to the actual scores. This is especially 
visible when many blocks contain a document that has the maximum score across 
the entire postings list, so any field indexed with indexOptions=DOCS or 
constant-scoring queries for instance.
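
To make the rounding effect concrete, here is a minimal sketch. It is not the 
actual WANDScorer code: the class name, the helper names and the fixed 2^16 
scaling factor are hypothetical, and it assumes that the collector asks for 
scores strictly greater than one it already holds (modelled with Math.nextUp). 
Upper bounds are rounded up and the threshold is rounded down, so a block 
whose best score is not competitive can still look competitive once scaled.

```java
public class ScaledScoreSketch {
    // Hypothetical fixed scaling factor; the real scorer's choice may differ.
    static final double SCALE = 1 << 16;

    // Upper bounds are rounded up so that scaling never underestimates them.
    static long scaleMaxScore(float maxScore) {
        return (long) Math.ceil(maxScore * SCALE);
    }

    // The threshold is rounded down so that scaling never overestimates it.
    static long scaleMinScore(float minScore) {
        return (long) Math.floor(minScore * SCALE);
    }

    // Decision taken on the actual float scores.
    static boolean competitiveExact(float maxScore, float minCompetitive) {
        return maxScore >= minCompetitive;
    }

    // Decision taken on the scaled, rounded scores.
    static boolean competitiveScaled(float maxScore, float minCompetitive) {
        return scaleMaxScore(maxScore) >= scaleMinScore(minCompetitive);
    }

    public static void main(String[] args) {
        // Best score any document in the block can reach.
        float blockMaxScore = 1.3f;
        // Assumption: the collector wants something strictly better than 1.3f.
        float minCompetitive = Math.nextUp(1.3f);

        System.out.println("exact:  " + competitiveExact(blockMaxScore, minCompetitive));
        System.out.println("scaled: " + competitiveScaled(blockMaxScore, minCompetitive));
    }
}
```

With these inputs the exact comparison rejects the block while the scaled 
comparison keeps it, which is the kind of missed skip described above. When 
every block has the same maximum score, this can happen on every block.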

  was:
I was looking at the NYC Taxis benchmark recently and got puzzled by the fact 
that the query (cab_color:y OR cab_color:g) ran so slowly: 
http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps.
 This is supposed to be a best-case scenario for WAND: there are only two 
possible scores for documents, this query should return instantly in all 
scenarios (dense, sparse, sparse and sorted).

After digging I noticed that this is due to the scaling that we do in 
WANDScorer to avoid floating-point rounding errors: documents can be considered 
as possible matches according to the scaled scores (which are rounded) while 
they cannot possibly match according to the actual scores. This is especially 
visible when many blocks contain a document that has the maximum score across 
the entire postings list, so any field indexed with indexOptions=DOCS or 
constant-scoring queries for instance.


> WANDScorer could skip more
> --------------------------
>
>                 Key: LUCENE-10121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10121
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> I was looking at the NYC Taxis benchmark recently and got puzzled by the fact 
> that the query (cab_color:y OR cab_color:g) ran so slowly: 
> http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps.
>  This is supposed to be a best-case scenario for WAND: there are only two 
> possible scores for documents, so this query should return instantly in the 
> sorted case.
> After digging I noticed that this is due to the scaling that we do in 
> WANDScorer to avoid floating-point rounding errors: documents can be 
> considered as possible matches according to the scaled scores (which are 
> rounded) while they cannot possibly match according to the actual scores. 
> This is especially visible when many blocks contain a document that has the 
> maximum score across the entire postings list, so any field indexed with 
> indexOptions=DOCS or constant-scoring queries for instance.



