Zach Chen created LUCENE-10236:
----------------------------------

    Summary: CombinedFieldsQuery to use fieldAndWeights.values() when constructing MultiNormsLeafSimScorer for scoring
    Key: LUCENE-10236
    URL: https://issues.apache.org/jira/browse/LUCENE-10236
    Project: Lucene - Core
    Issue Type: Improvement
    Components: modules/sandbox
    Reporter: Zach Chen
    Assignee: Zach Chen

This is a spin-off issue from the discussion in [https://github.com/apache/lucene/pull/418#issuecomment-967790816], for a quick fix in CombinedFieldQuery scoring.

Currently, CombinedFieldQuery uses a constructed [fields|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L420-L421] object to create a MultiNormsLeafSimScorer for scoring. Because that object is [built by looping over fieldTerms|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L404-L414], it may contain duplicated field-weight pairs, which results in duplicated norms being added during the scoring calculation in MultiNormsLeafSimScorer.

E.g. for a CombinedFieldQuery with two fields and two terms matching a particular doc:

{code:java}
CombinedFieldQuery query = new CombinedFieldQuery.Builder()
    .addField("field1", 1.0f)
    .addField("field2", 1.0f)
    .addTerm(new BytesRef("foo"))
    .addTerm(new BytesRef("zoo"))
    .build();
{code}

I would imagine the scoring to be based on the following:
# Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
# Sum of norms on doc = norm(field1) + norm(field2)

but the current logic would use the following for scoring:
# Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
# Sum of norms on doc = norm(field1) + norm(field2) + norm(field1) + norm(field2)

In addition, this differs from how MultiNormsLeafSimScorer is constructed in the CombinedFieldQuery explain function, which [uses fieldAndWeights.values()|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L387-L389] and therefore does not contain duplicated field-weight pairs.
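
The duplication described above can be illustrated with a minimal, Lucene-free sketch. Everything in it is hypothetical and only models the shape of the problem: the class DuplicateNormsSketch, the FieldAndWeight stand-in, the helper names, and the norm values 10 and 20 are assumptions made for illustration, not Lucene API or real index data. It compares summing norms over a list built once per (term, field) combination with summing over the map's values(), which holds each field-weight pair exactly once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free model of the issue; none of these names are Lucene API.
class DuplicateNormsSketch {

  // Stand-in for a field-weight pair.
  static final class FieldAndWeight {
    final String field;
    final float weight;

    FieldAndWeight(String field, float weight) {
      this.field = field;
      this.weight = weight;
    }
  }

  // Mimics building the list by looping over every (term, field) combination,
  // which yields one entry per term for each field.
  static List<FieldAndWeight> fieldsFromFieldTerms(
      Map<String, FieldAndWeight> fieldAndWeights, List<String> terms) {
    List<FieldAndWeight> fields = new ArrayList<>();
    for (String term : terms) {
      for (FieldAndWeight fw : fieldAndWeights.values()) {
        fields.add(fw); // the same pair is added again for each term
      }
    }
    return fields;
  }

  // Sums weight * norm over the given pairs; the norms are made-up per-field values.
  static float sumOfNorms(Iterable<FieldAndWeight> fields, Map<String, Float> norms) {
    float sum = 0f;
    for (FieldAndWeight fw : fields) {
      sum += fw.weight * norms.get(fw.field);
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<String, FieldAndWeight> fieldAndWeights = new LinkedHashMap<>();
    fieldAndWeights.put("field1", new FieldAndWeight("field1", 1.0f));
    fieldAndWeights.put("field2", new FieldAndWeight("field2", 1.0f));

    // Assumed norm values, chosen only to make the arithmetic easy to follow.
    Map<String, Float> norms = new LinkedHashMap<>();
    norms.put("field1", 10f);
    norms.put("field2", 20f);

    List<String> terms = Arrays.asList("foo", "zoo");

    float deduplicated = sumOfNorms(fieldAndWeights.values(), norms);
    float duplicated = sumOfNorms(fieldsFromFieldTerms(fieldAndWeights, terms), norms);

    System.out.println("sum of norms over fieldAndWeights.values(): " + deduplicated);
    System.out.println("sum of norms over per-term field list:      " + duplicated);
  }
}
```

With these assumed values the first sum is 30.0 and the second is 60.0: each field's norm is counted once per term, which matches the doubled "Sum of norms on doc" shown above.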