[ https://issues.apache.org/jira/browse/LUCENE-10236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550604#comment-17550604 ]
Julie Tibshirani commented on LUCENE-10236: ------------------------------------------- I think this is a "bug" rather than "improvement" -- before we were not calculating the query score correctly. However I'd prefer to not backport as part of 8.11, because it's not super important, and it's nice to avoid changes in behavior in patch releases (the query scores could change). > CombinedFieldsQuery to use fieldAndWeights.values() when constructing > MultiNormsLeafSimScorer for scoring > --------------------------------------------------------------------------------------------------------- > > Key: LUCENE-10236 > URL: https://issues.apache.org/jira/browse/LUCENE-10236 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/sandbox > Reporter: Zach Chen > Assignee: Zach Chen > Priority: Minor > Fix For: 9.1 > > Time Spent: 7h > Remaining Estimate: 0h > > This is a spin-off issue from discussion in > [https://github.com/apache/lucene/pull/418#issuecomment-967790816], for a > quick fix in CombinedFieldsQuery scoring. > Currently CombinedFieldsQuery would use a constructed > [fields|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L420-L421] > object to create a MultiNormsLeafSimScorer for scoring, but the fields > object may contain duplicated field-weight pairs as it is [built from looping > over > fieldTerms|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L404-L414], > resulting into duplicated norms being added during scoring calculation in > MultiNormsLeafSimScorer. > E.g. for CombinedFieldsQuery with two fields and two values matching a > particular doc: > {code:java} > CombinedFieldQuery query = > new CombinedFieldQuery.Builder() > .addField("field1", (float) 1.0) > .addField("field2", (float) 1.0) > .addTerm(new BytesRef("foo")) > .addTerm(new BytesRef("zoo")) > .build(); {code} > I would imagine the scoring to be based on the following: > # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + > freq(field1:zoo) + freq(field2:zoo) > # Sum of norms on doc = norm(field1) + norm(field2) > but the current logic would use the following for scoring: > # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + > freq(field1:zoo) + freq(field2:zoo) > # Sum of norms on doc = norm(field1) + norm(field2) + norm(field1) + > norm(field2) > > In addition, this differs from how MultiNormsLeafSimScorer is constructed > from CombinedFieldsQuery explain function, which [uses > fieldAndWeights.values()|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L387-L389] > and does not contain duplicated field-weight pairs. -- This message was sent by Atlassian Jira (v8.20.7#820007) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org