[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064294#comment-17064294 ]
Michele Palmia commented on LUCENE-9269:
----------------------------------------

I have a few questions, please feel free to let me know if they're too dumb:
 - While testing a solution that adds {{perReaderTermState}} to the current {{TermQuery#equals}} implementation, I found a test that I believe is not doing any of what it was designed to do. Essentially, it was rewritten for an only tangentially related change, and it has been working as a no-op ever since. The test is [TestMultiTermQueryRewrites#checkBoosts|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/test/org/apache/lucene/search/TestMultiTermQueryRewrites.java#L215]; the problematic edit was [this|https://github.com/apache/lucene-solr/commit/30807709e663c35f6760084632407dc1bf76aff7#diff-581d1e68f090e657acc327fc90534c51], which is missing the essential {{initialSeekTerm}}. Should I fix it as part of my proposal for this issue, or open a new issue?
 - What's your opinion on comparing two TermQueries, only one of which has a {{perReaderTermState}}? I'd say they're different, but their Weights could ultimately end up using the exact same statistics.
 - Changing {{equals}} without changing {{toString}} means that errors like
{code:java}
expected:<foo:bar> but was:<foo:bar>
{code}
are possible. That seems to me less of an issue than adding df/ttf to the TermQuery representation. Is that so?
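To make the third question concrete, here is a minimal sketch of how an {{equals}} that looks at extra state, paired with a {{toString}} that does not, yields a confusing failure message. {{SketchTermQuery}}, its {{docFreq}} field (a stand-in for {{perReaderTermState}}) and {{failureMessage}} are hypothetical names made up for illustration; this is not Lucene's {{TermQuery}}, and the message format only imitates the one quoted above.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a term query; NOT Lucene's TermQuery.
final class SketchTermQuery {
    private final String field;
    private final String text;
    // Assumed stand-in for perReaderTermState: null when no statistics are attached.
    private final Integer docFreq;

    SketchTermQuery(String field, String text, Integer docFreq) {
        this.field = field;
        this.text = text;
        this.docFreq = docFreq;
    }

    // equals takes the attached statistics into account...
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SketchTermQuery)) return false;
        SketchTermQuery that = (SketchTermQuery) other;
        return field.equals(that.field)
                && text.equals(that.text)
                && Objects.equals(docFreq, that.docFreq);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, text, docFreq);
    }

    // ...but toString does not, so unequal queries can print identically.
    @Override
    public String toString() {
        return field + ":" + text;
    }
}

final class EqualsToStringDemo {
    // Builds a message shaped like the one quoted in the comment (illustrative format only).
    static String failureMessage(Object expected, Object actual) {
        return "expected:<" + expected + "> but was:<" + actual + ">";
    }

    public static void main(String[] args) {
        SketchTermQuery plain = new SketchTermQuery("foo", "bar", null);
        SketchTermQuery withStats = new SketchTermQuery("foo", "bar", 3);
        if (!plain.equals(withStats)) {
            System.out.println(failureMessage(plain, withStats));
        }
    }
}
```

In this sketch a query without statistics is never equal to one with statistics, which corresponds to the "I'd say they're different" option in the second question.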
> Blended queries with boolean rewrite can result in inconsistent scores
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9269
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9269
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.4
>            Reporter: Michele Palmia
>            Priority: Minor
>         Attachments: LUCENE-9269-test.patch
>
>
> If two blended queries are should clauses of a boolean query and are built so that
> * some of their terms are the same
> * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
> the docFreq for the overlapping terms used for scoring is picked as follows:
> # if the overlapping terms are not boosted, the df of the term in the first blended query is used
> # if any of the overlapping terms is boosted, the df is picked at (what looks like) random.
> A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
> {code:java}
> a)
> Blended(f:a f:b) Blended (f:a)
>      df: 3           df: 2
> gets rewritten to:
> (f:a)^2.0 (f:b)
>   df: 3    df:2
> b)
> Blended(f:a) Blended(f:a f:b)
>    df: 2         df: 3
> gets rewritten to:
> (f:a)^2.0 (f:b)
>   df: 2    df:2
> c)
> Blended(f:a f:b^0.66) Blended (f:a^0.75)
>        df: 3              df: 2
> gets rewritten to:
> (f:a)^1.75 (f:b)^0.66
>   df:?      df:2
> {code}
> with ? either 2 or 3, depending on the run.
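One way to picture the first rule in the quoted description (the df of the first blended query wins when nothing is boosted) is a map-based merge of equal clauses whose equality ignores the attached statistics. The sketch below is an assumption about the mechanism, not a copy of Lucene's rewrite code: {{SketchClause}}, {{DedupSketch}}, {{dedup}} and {{survivingDocFreq}} are hypothetical names, and only the overlapping term f:a from cases a) and b) is modelled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a rewritten term clause; NOT a Lucene class.
final class SketchClause {
    final String term;
    final int docFreq; // statistic carried along, deliberately ignored by equals/hashCode

    SketchClause(String term, int docFreq) {
        this.term = term;
        this.docFreq = docFreq;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SketchClause && term.equals(((SketchClause) other).term);
    }

    @Override
    public int hashCode() {
        return term.hashCode();
    }
}

final class DedupSketch {
    // Merges equal clauses by summing their boosts (every clause has boost 1.0 here).
    // Map.merge only updates the value of an existing entry, so the first-seen key,
    // and with it the first-seen docFreq, is the one that survives.
    static Map<SketchClause, Float> dedup(List<SketchClause> clauses) {
        Map<SketchClause, Float> merged = new LinkedHashMap<>();
        for (SketchClause clause : clauses) {
            merged.merge(clause, 1.0f, Float::sum);
        }
        return merged;
    }

    // Returns the docFreq attached to the surviving key for a term, or -1 if absent.
    static int survivingDocFreq(Map<SketchClause, Float> merged, String term) {
        for (SketchClause key : merged.keySet()) {
            if (key.term.equals(term)) {
                return key.docFreq;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Case a): f:a arrives first with df 3, then with df 2.
        Map<SketchClause, Float> caseA =
                dedup(Arrays.asList(new SketchClause("f:a", 3), new SketchClause("f:a", 2)));
        // Case b): the same two clauses in the opposite order.
        Map<SketchClause, Float> caseB =
                dedup(Arrays.asList(new SketchClause("f:a", 2), new SketchClause("f:a", 3)));
        System.out.println("a) df used for f:a = " + survivingDocFreq(caseA, "f:a"));
        System.out.println("b) df used for f:a = " + survivingDocFreq(caseB, "f:a"));
    }
}
```

Under this assumption the summed boost of 2.0 and the order-dependent df for f:a both fall out of the merge. The sketch does not attempt to model the boosted case c), where the description reports that the df varies between runs.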