[jira] [Updated] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores

Michele Palmia (Jira) Tue, 10 Mar 2020 05:15:25 -0700


     [ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michele Palmia updated LUCENE-9269:
-----------------------------------
    Description: 
If two blended queries are SHOULD clauses of a boolean query and are built so 
that
 * some of their terms are the same
 * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq used to score the overlapping terms is picked as follows:
 # if the overlapping terms are not boosted, the df of the term in the first 
blended query is used
 # if any of the overlapping terms is boosted, the df is picked at (what looks 
like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
a)
Blended(f:a f:b) Blended (f:a)
        df: 3             df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3      df:2

b)
Blended(f:a) Blended(f:a f:b)
        df: 2        df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
 df: 2     df:2

c)
Blended(f:a f:b^0.66) Blended (f:a^0.75)
        df: 3                  df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
 df:?       df:2
{code}
with ? either 2 or 3, depending on the run.
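
The arithmetic behind example c) can be sketched in plain Java, without Lucene on the classpath. This is only an illustrative model, not BlendedTermQuery's actual code: the class and method names are hypothetical, and it assumes that a blended query's df is the largest df among its terms (which matches the df values shown in the examples) and that boosts of clauses on the same term are summed during the rewrite (which matches the ^2.0 and ^1.75 boosts above). It lists the df values the overlapping term could end up with; it does not model which one a given run picks.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class: models the numbers in the examples, not Lucene internals.
public class BlendedDfSketch {

    // Assumption: the blended df of a query is the largest df among its terms.
    public static int blendedDf(Map<String, Integer> indexDf, Set<String> terms) {
        int max = 0;
        for (String term : terms) {
            max = Math.max(max, indexDf.get(term));
        }
        return max;
    }

    // Assumption: clauses on the same term are merged by summing their boosts.
    public static Map<String, Float> mergedBoosts(List<Map<String, Float>> queries) {
        Map<String, Float> merged = new LinkedHashMap<>();
        for (Map<String, Float> query : queries) {
            query.forEach((term, boost) -> merged.merge(term, boost, Float::sum));
        }
        return merged;
    }

    // Every blended df that the given term carries in some query containing it.
    public static Set<Integer> candidateDfs(Map<String, Integer> indexDf,
                                            List<Map<String, Float>> queries,
                                            String term) {
        Set<Integer> candidates = new TreeSet<>();
        for (Map<String, Float> query : queries) {
            if (query.containsKey(term)) {
                candidates.add(blendedDf(indexDf, query.keySet()));
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Index statistics from the report: f:a (df: 2), f:b (df: 3).
        Map<String, Integer> indexDf = Map.of("f:a", 2, "f:b", 3);

        // Example c): Blended(f:a f:b^0.66) Blended(f:a^0.75)
        Map<String, Float> first = new LinkedHashMap<>();
        first.put("f:a", 1.0f);
        first.put("f:b", 0.66f);
        Map<String, Float> second = Map.of("f:a", 0.75f);
        List<Map<String, Float>> queries = List.of(first, second);

        System.out.println(mergedBoosts(queries));                 // {f:a=1.75, f:b=0.66}
        System.out.println(candidateDfs(indexDf, queries, "f:a")); // [2, 3]
    }
}
```

Under these assumptions, f:a has two candidate df values after the merge (2 and 3), which is the ambiguity marked with ? in example c).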

 

  was:
If two blended queries are SHOULD clauses of a boolean query and are built so 
that
 * some of their terms are the same
 * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq used to score the overlapping terms is picked as follows:
 * if the overlapping terms are not boosted, the df of the term in the first 
blended query is used
 * if any of the overlapping terms is boosted, the df is picked at (what looks 
like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended (f:a)
        df: 3             df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3      df:2

Blended(f:a) Blended(f:a f:b)
        df: 2        df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
 df: 2     df:2

Blended(f:a f:b^0.66) Blended (f:a^0.75)
        df: 3                  df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
 df:?       df:2
{code}
with ? either 2 or 3, depending on the run.

 


> Blended queries with boolean rewrite can result in inconsistent scores
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9269
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9269
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.4
>            Reporter: Michele Palmia
>            Priority: Minor


