[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michele Palmia updated LUCENE-9269:
-----------------------------------
    Description:
If two blended queries are should clauses of a boolean query and are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq used for scoring the overlapping terms is picked as follows:
# if the overlapping terms are not boosted, the df of the term in the first blended query is used
# if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
a)
Blended(f:a f:b) Blended (f:a)
     df: 3            df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
  df: 3    df:2

b)
Blended(f:a) Blended(f:a f:b)
   df: 2          df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
  df: 2    df:2

c)
Blended(f:a f:b^0.66) Blended (f:a^0.75)
       df: 3               df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
  df:?       df:2
{code}
with ? either 2 or 3, depending on the run.

  was:
If two blended queries are should clauses of a boolean query and are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq used for scoring the overlapping terms is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first blended query is used
* if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended (f:a)
     df: 3            df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
  df: 3    df:2

Blended(f:a) Blended(f:a f:b)
   df: 2          df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
  df: 2    df:2

Blended(f:a f:b^0.66) Blended (f:a^0.75)
       df: 3               df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
  df:?       df:2
{code}
with ? either 2 or 3, depending on the run.
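The boosts in the rewritten queries of the examples (^2.0 and ^1.75) are consistent with the overlapping clauses being merged by summing their boosts. The following is a minimal plain-Java sketch of that merge only, assuming each unboosted clause carries an implicit boost of 1.0. The class ShouldClauseMerge and its nested Clause type are hypothetical names made up for illustration and are not Lucene classes. The sketch deliberately does not model which docFreq ends up attached to the merged term, which is the behavior this issue is about.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not Lucene code: merges clauses on the same term
// by summing their boosts, as the rewritten queries in the examples suggest.
class ShouldClauseMerge {

    // Hypothetical stand-in for a boosted term clause such as (f:a)^0.75.
    static final class Clause {
        final String term;
        final double boost;

        Clause(String term, double boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    // Sums the boosts of clauses that share a term, keeping first-seen order.
    static Map<String, Double> merge(Clause... clauses) {
        Map<String, Double> merged = new LinkedHashMap<>();
        for (Clause c : clauses) {
            merged.merge(c.term, c.boost, Double::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Example a): Blended(f:a f:b) Blended(f:a), all boosts implicit (1.0).
        System.out.println(merge(
                new Clause("f:a", 1.0), new Clause("f:b", 1.0),
                new Clause("f:a", 1.0)));

        // Example c): Blended(f:a f:b^0.66) Blended(f:a^0.75).
        System.out.println(merge(
                new Clause("f:a", 1.0), new Clause("f:b", 0.66),
                new Clause("f:a", 0.75)));
    }
}
```

Because the merged clause keeps only one entry per term, any per-clause statistics (such as the df each blended query attached to f:a) have to be reduced to a single value at this point; the sketch leaves that choice out on purpose.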
> Blended queries with boolean rewrite can result in inconsistent scores
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9269
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9269
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.4
>            Reporter: Michele Palmia
>            Priority: Minor