jpountz opened a new pull request, #12589:
URL: https://github.com/apache/lucene/pull/12589

   The idea behind MAXSCORE is to run disjunctions as `+(essentialClause1 ... 
essentialClauseM) nonEssentialClause1 ... nonEssentialClauseN`, moving more and 
more clauses from the essential list to the non-essential list as the minimum 
competitive score increases. For instance, a query such as `the book of life`, 
which I found in the Tantivy benchmark, ends up running as `+book the of life` 
after some time, i.e. with one required clause and the other clauses optional. 
This is because matching `the`, `of` and `life` together, without `book`, 
cannot score high enough to yield a competitive match.
   
   Here are some statistics for that case:
    - min competitive score: 3.4781857
    - max_window_score(book): 2.8796153
    - max_window_score(life): 2.037863
    - max_window_score(the): 0.103848875
    - max_window_score(of): 0.19427927
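
   To make the partitioning concrete, here is a minimal sketch of how the statistics above lead to `+book the of life`. It is not Lucene's code: the class and method names are hypothetical, and it assumes the textbook MAXSCORE rule of sorting clauses by maximum score and keeping a clause non-essential as long as the running sum of maximum scores stays below the minimum competitive score.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only, not part of Lucene.
final class MaxScorePartition {

  /** Returns the terms that must remain essential; every other term is non-essential. */
  static List<String> essentialTerms(Map<String, Double> maxScores, double minCompetitiveScore) {
    List<Map.Entry<String, Double>> sorted = new ArrayList<>(maxScores.entrySet());
    sorted.sort(Map.Entry.comparingByValue()); // ascending by max score

    double prefixSum = 0;
    int firstEssential = 0;
    for (Map.Entry<String, Double> e : sorted) {
      // Stop as soon as the non-essential clauses alone could reach the threshold.
      if (prefixSum + e.getValue() >= minCompetitiveScore) {
        break;
      }
      prefixSum += e.getValue();
      firstEssential++;
    }

    List<String> essential = new ArrayList<>();
    for (int i = firstEssential; i < sorted.size(); i++) {
      essential.add(sorted.get(i).getKey());
    }
    return essential;
  }

  public static void main(String[] args) {
    Map<String, Double> maxScores = new LinkedHashMap<>();
    maxScores.put("the", 0.103848875);
    maxScores.put("book", 2.8796153);
    maxScores.put("of", 0.19427927);
    maxScores.put("life", 2.037863);
    System.out.println(essentialTerms(maxScores, 3.4781857)); // prints [book]
  }
}
```

   With these numbers, `the`, `of` and `life` sum to about 2.336, which is below 3.4781857, so only `book` stays essential.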
   
   Looking at these statistics, we could do better: a match can only be 
competitive if it matches both `book` and `life`, so this query could execute 
as `+book +life the of`, which may help evaluate fewer documents than 
`+book the of life`, especially if recursive graph bisection is enabled.
   
   This is what this PR tries to achieve: when there is a single essential 
clause and matching all clauses but the best non-essential clause cannot 
produce a competitive match, the scorer only evaluates documents that match 
the intersection of the essential clause and the best non-essential clause.
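
   The condition described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in this PR: the class and method names are hypothetical, scores are plain `double` values, and the rounding care that a real scorer needs is ignored.

```java
// Hypothetical helper for illustration only, not part of Lucene.
final class IntersectionCheck {

  /**
   * Returns true when a document that misses the best non-essential clause
   * cannot reach the minimum competitive score, so candidates can be limited
   * to the intersection of the essential clause and that clause.
   */
  static boolean requiresBestNonEssential(
      double essentialMaxScore, double[] nonEssentialMaxScores, double minCompetitiveScore) {
    if (nonEssentialMaxScores.length == 0) {
      return false; // nothing to intersect with
    }
    double sum = essentialMaxScore;
    double best = Double.NEGATIVE_INFINITY;
    for (double score : nonEssentialMaxScores) {
      sum += score;
      best = Math.max(best, score);
    }
    // Upper bound on the score of a match that skips the best non-essential clause.
    return sum - best < minCompetitiveScore;
  }

  public static void main(String[] args) {
    double book = 2.8796153;                               // single essential clause
    double[] others = {2.037863, 0.103848875, 0.19427927}; // life, the, of
    System.out.println(requiresBestNonEssential(book, others, 3.4781857)); // prints true
  }
}
```

   For `the book of life`, `book`, `the` and `of` together reach at most about 3.178, below the minimum competitive score of 3.4781857, so `life` is effectively required. With a single non-essential clause, the check reduces to comparing the essential clause's maximum score against the minimum competitive score.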
   
   It's worth noting that this optimization would kick in very frequently on 
2-clause disjunctions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

