Michele Palmia created LUCENE-9269:
--------------------------------------
Summary: Blended queries with boolean rewrite can result in
inconsistent scores
Key: LUCENE-9269
URL: https://issues.apache.org/jira/browse/LUCENE-9269
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Affects Versions: 8.4
Reporter: Michele Palmia
If two blended queries are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
the docFreq used to score the overlapping terms is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first
blended query is used
* if any of the overlapping terms is boosted, the df appears to be picked at
random.
A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended(f:a)
df: 3            df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3     df: 2

2.
Blended(f:a) Blended(f:a f:b)
df: 2        df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2     df: 2

3.
Blended(f:a f:b^0.66) Blended(f:a^0.75)
df: 3                 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df: ?      df: 2
{code}
where ? is either 2 or 3, depending on the run.
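
To make the unboosted case easier to reason about, here is a toy model in plain Java. It is not Lucene code and makes no claim about where the behaviour originates. It only assumes that clauses for the same term get merged, that their boosts are summed, and that the df of the first clause seen is kept. The names OverlapDfSketch, Clause and merge are hypothetical, made up for this illustration. The boosted case (example 3) is deliberately not modelled, since its outcome varies between runs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OverlapDfSketch {

    /** Hypothetical stand-in for a term clause plus the df its blended query assigned to it. */
    static final class Clause {
        final String term;
        final float boost;
        final int df;

        Clause(String term, float boost, int df) {
            this.term = term;
            this.boost = boost;
            this.df = df;
        }
    }

    /**
     * Assumed merge rule: clauses with the same term collapse into one,
     * boosts are summed, and the df of the first clause seen is kept.
     */
    static Map<String, Clause> merge(List<Clause> clauses) {
        Map<String, Clause> merged = new LinkedHashMap<>();
        for (Clause c : clauses) {
            merged.merge(c.term, c,
                (first, next) -> new Clause(first.term, first.boost + next.boost, first.df));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Example 1: Blended(f:a f:b) with df 3, then Blended(f:a) with df 2.
        Clause a1 = merge(Arrays.asList(
            new Clause("f:a", 1f, 3), new Clause("f:b", 1f, 3),
            new Clause("f:a", 1f, 2))).get("f:a");
        System.out.println("f:a boost=" + a1.boost + " df=" + a1.df); // f:a boost=2.0 df=3

        // Example 2: same queries in the opposite order.
        Clause a2 = merge(Arrays.asList(
            new Clause("f:a", 1f, 2),
            new Clause("f:a", 1f, 3), new Clause("f:b", 1f, 3))).get("f:a");
        System.out.println("f:a boost=" + a2.boost + " df=" + a2.df); // f:a boost=2.0 df=2
    }
}
```

Under these assumptions the df used for the overlapping term f:a depends only on which blended query comes first, which matches the df reported for f:a in examples 1 and 2.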