Inconsistent debugQuery score with multiplicative boost

Thomas Aglassinger Fri, 04 Jan 2019 00:12:06 -0800

Hi!

When debugging a query using multiplicative boost based on the product() 
function I noticed that the score computed in the explain section is correct 
while the score in the actual result is wrong.


As an example here’s a simple query that boosts a field name_text_de 
(containing German product names). The term “Netzteil” boosts to 200% and “Sony” 
boosts to 300%. A name that contains both terms would be boosted to 600%. If a 
term does not match, a default pseudo boost of 1 is used (multiplicative 
identity). The params of the responseHeader in the query result are:

"q":"{!boost b=$ymb}(+{!lucene v=$yq})",
"ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
"yq":"*:*",
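
The intended arithmetic of the boost described above can be sketched in a few lines of Python. This is only an illustration of the expected scores, not of how Solr evaluates the function; the helper name expected_boost and the crude tokenization (lowercasing, splitting on whitespace and hyphens) are assumptions made for the example.

```python
from math import prod  # requires Python 3.8+

# Boost factors taken from the ymb parameter above.
BOOSTS = {"netzteil": 2.0, "sony": 3.0}


def expected_boost(name, boosts=BOOSTS):
    """Return the product of the boosts of all terms found in name.

    A term that does not match contributes 1.0, mirroring the default
    value passed to query(..., 1). The tokenization is a rough stand-in
    for the real analyzer chain (an assumption for this sketch).
    """
    tokens = set(name.lower().replace("-", " ").split())
    return prod(boost if term in tokens else 1.0
                for term, boost in boosts.items())


print(expected_boost("Original Sony Vaio Netzteil"))
print(expected_boost("GS-Netzteil 20W schwarz"))
```

Under these assumptions the first name yields 2.0 * 3.0 and the second yields 2.0 * 1.0, which are the values the explain output reports further down.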

The parsed query of the ymb parameter translates to:

FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))

For a product that contains both terms, the score in the result and explain 
section correctly yields 6.0:

"name_text_de":"Original Sony Vaio Netzteil",
"score":6.0,

6.0 = product of:
  1.0 = boost
  6.0 = product of:
    1.0 = *:*
    6.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=3.0)

However, for a product with only “Netzteil” in the name, the result score 
is wrongly 1.0 while the explain score is correctly 2.0:

"name_text_de":"GS-Netzteil 20W schwarz",
"score":1.0,

2.0 = product of:
  1.0 = boost
  2.0 = product of:
    1.0 = *:*
    2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)

(Note: the filter chain splits words on hyphens, so the “GS-” in front of 
“Netzteil” should not be an issue.)

Here’s the complete filter chain for the text_de field type:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de" />
        <filter class="solr.ManagedStopFilterFactory" managed="de" />
        <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.GermanStemFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

Interestingly, if I simplify the query to boost only on “Netzteil”, the scores 
in both the result and the explain section are correctly 2.0.

I reproduced this with a local Solr 7.5.0 server (no sharding, no replicas) on 
macOS 10.14.1.
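
To check every returned document for this kind of mismatch instead of comparing by eye, the query can be sent with debugQuery=true and the result score compared against the first number of each explain entry. The following is a minimal sketch under several assumptions: the base URL and core name are placeholders, the unique key field is assumed to be called id, and the explain entries are assumed to be plain strings that start with the total score (the default, non-structured explain output).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core):
    """Build a /select URL for the boosted query with debug output enabled."""
    params = {
        "q": "{!boost b=$ymb}(+{!lucene v=$yq})",
        # Raw string: the backslashes are part of the Solr query syntax.
        "ymb": r'product(query({!v="name_text_de\:Netzteil\^=2.0"},1),'
               r'query({!v="name_text_de\:Sony\^=3.0"},1))',
        "yq": "*:*",
        "fl": "id,name_text_de,score",  # "id" as unique key is an assumption
        "debugQuery": "true",
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


def find_score_mismatches(response, id_field="id", tolerance=1e-6):
    """Return (id, result score, explain score) for documents that disagree."""
    explain = response["debug"]["explain"]
    mismatches = []
    for doc in response["response"]["docs"]:
        doc_id = str(doc[id_field])
        # Assumes a textual explain such as "\n2.0 = product of:\n  ...".
        explained = float(explain[doc_id].strip().split(" = ", 1)[0])
        if abs(doc["score"] - explained) > tolerance:
            mismatches.append((doc_id, doc["score"], explained))
    return mismatches


def fetch(url):
    """Fetch and decode a JSON response from Solr."""
    with urlopen(url) as reply:
        return json.load(reply)


if __name__ == "__main__":
    # Placeholder host and core name; adjust to the local installation.
    url = build_select_url("http://localhost:8983/solr", "products")
    for doc_id, result, explained in find_score_mismatches(fetch(url)):
        print(f"{doc_id}: result score {result}, explain score {explained}")
```

With the behavior described above, the “Netzteil”-only product would be listed with a result score of 1.0 and an explain score of 2.0, while the product containing both terms would not be listed.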

I found mention of a somewhat similar situation with BooleanQuery, which was 
considered a bug and fixed in 2016: 
https://issues.apache.org/jira/browse/LUCENE-7132

So my questions are:

1. Is there something wrong in my query that prevents the “Netzteil”-only 
product from getting a score of 2.0?
2. Shouldn’t the score in the result and the explain section always be the same?

Best regards,
Thomas
