Hi!

When debugging a query that uses a multiplicative boost based on the product()
function, I noticed that the score computed in the explain section is correct,
while the score in the actual result is wrong.

As an example, here’s a simple query that boosts on the field name_text_de
(containing German product names). The term “Netzteil” boosts to 200% and “Sony”
boosts to 300%, so a name that contains both terms is boosted to 600%. If a
term does not match, a default pseudo boost of 1 (the multiplicative identity)
is used. The params of the responseHeader in the query result are:

"q":"{!boost b=$ymb}(+{!lucene v=$yq})",
"ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
"yq":"*:*",
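To make the intended arithmetic explicit, here is a small Python sketch (plain Python, not Solr code) of what the boost should compute for a single document: each query() term contributes its constant score when it matches and the default 1 when it does not, and {!boost} multiplies the *:* score (1.0) by that product. The names BOOSTS, DEFAULT, expected_boost and expected_score are hypothetical, for illustration only.

```python
from __future__ import annotations

from math import prod

# Hypothetical table of the constant scores used in the ymb parameter above.
BOOSTS = {"netzteil": 2.0, "sony": 3.0}
# The "def" argument of query(): used when the term does not match.
DEFAULT = 1.0


def expected_boost(matched_terms: set[str]) -> float:
    """Product of each term's constant score, or DEFAULT for non-matching terms."""
    return prod(BOOSTS[t] if t in matched_terms else DEFAULT for t in BOOSTS)


def expected_score(matched_terms: set[str], base_score: float = 1.0) -> float:
    # {!boost} multiplies the main query score (*:* scores 1.0) by the boost value.
    return base_score * expected_boost(matched_terms)


print(expected_score({"netzteil", "sony"}))  # 6.0
print(expected_score({"netzteil"}))          # 2.0
print(expected_score(set()))                 # 1.0
```

The “Netzteil”-only case is exactly the one that should come out as 2.0.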

With the ymb parameter substituted, the parsed query is:

FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))

For a product that contains both terms, the score in both the result and the
explain section is correctly 6.0:

"name_text_de":"Original Sony Vaio Netzteil",
"score":6.0,

6.0 = product of:
  1.0 = boost
  6.0 = product of:
    1.0 = *:*
    6.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=3.0)

However, for a product with only “Netzteil” in the name, the result score is
wrongly 1.0, while the explain score is correctly 2.0:

"name_text_de":"GS-Netzteil 20W schwarz",
"score":1.0,

2.0 = product of:
  1.0 = boost
  2.0 = product of:
    1.0 = *:*
    2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)

(Note: the filter chain splits words on hyphens, so the “GS-” prefix in front of
“Netzteil” should not be an issue.)

Here’s the complete filter chain for the text_de field type:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de" />
        <filter class="solr.ManagedStopFilterFactory" managed="de" />
        <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.GermanStemFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
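As a rough sanity check of the hyphen note, the Python sketch below approximates only the first steps of this chain: whitespace tokenizing, the preserveOriginal/generateWordParts/catenateWords behavior of WordDelimiterGraphFilter, and lowercasing. It is a hypothetical simplification, not Lucene's implementation. It ignores synonyms, stop words, number and case-change splitting, ASCII folding, German stemming and token positions. (The parsed query above shows that the query-side term is netzteil.)

```python
import re


def approx_word_parts(token: str) -> list[str]:
    """Very rough stand-in for WordDelimiterGraphFilter with preserveOriginal=1,
    generateWordParts=1 and catenateWords=1 (assumption: only delimiter splitting)."""
    # Runs of letters/digits; underscores and punctuation act as delimiters.
    parts = re.findall(r"[^\W_]+", token)
    if parts == [token]:
        return [token]  # no delimiter inside: token passes through unchanged
    out = [token]  # preserveOriginal=1 keeps "GS-Netzteil"
    out.extend(parts)  # generateWordParts=1 emits "GS", "Netzteil"
    if len(parts) > 1:
        out.append("".join(parts))  # catenateWords=1 emits "GSNetzteil"
    return out


def approx_analyze(text: str) -> list[str]:
    tokens = []
    for tok in text.split():  # approximates WhitespaceTokenizer
        for t in approx_word_parts(tok):
            tokens.append(t.lower())  # approximates LowerCaseFilter
    return tokens


print(approx_analyze("GS-Netzteil 20W schwarz"))
```

Under these assumptions the document name yields a standalone netzteil token alongside gs-netzteil and gsnetzteil, which is consistent with the explain output matching the “Netzteil” query for that document.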

Interestingly, if I simplify the query to boost only on “Netzteil”, the scores in
both the result and the explain section are correctly 2.0.

I reproduced this with a local Solr 7.5.0 server (no sharding, no replica) on 
Mac OS X 10.14.1.
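For reproducing the comparison, here is a minimal Python sketch that builds the /select request with the params above and flags documents whose returned score differs from the explain value. Assumptions: the core URL and core name (products) are hypothetical, the uniqueKey field is assumed to be id, and the check relies on debug.explain.structured=true so that each explain entry carries a numeric "value". The example response is canned data mirroring the two documents above, with made-up ids; no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical core URL; adjust host, port and core name to your setup.
SELECT_URL = "http://localhost:8983/solr/products/select"


def build_request_url(base_url: str) -> str:
    params = {
        "q": "{!boost b=$ymb}(+{!lucene v=$yq})",
        # Same value as in the responseHeader (JSON escapes removed).
        "ymb": 'product(query({!v="name_text_de\\:Netzteil\\^=2.0"},1),'
               'query({!v="name_text_de\\:Sony\\^=3.0"},1))',
        "yq": "*:*",
        "fl": "id,name_text_de,score",
        "debugQuery": "true",
        "debug.explain.structured": "true",  # numeric "value" per explain entry
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def score_mismatches(response: dict, tol: float = 1e-6) -> list:
    """Return (id, result score, explain value) for every disagreeing document."""
    explain = response["debug"]["explain"]
    mismatches = []
    for doc in response["response"]["docs"]:
        explained = explain[str(doc["id"])]["value"]
        if abs(doc["score"] - explained) > tol:
            mismatches.append((doc["id"], doc["score"], explained))
    return mismatches


# Canned response shaped like the results described above (ids are made up).
example = {
    "response": {"docs": [
        {"id": "1", "name_text_de": "Original Sony Vaio Netzteil", "score": 6.0},
        {"id": "2", "name_text_de": "GS-Netzteil 20W schwarz", "score": 1.0},
    ]},
    "debug": {"explain": {"1": {"value": 6.0}, "2": {"value": 2.0}}},
}

print(build_request_url(SELECT_URL))
print(score_mismatches(example))  # [('2', 1.0, 2.0)]
```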

I found mention of a somewhat similar situation with BooleanQuery, which was 
considered a bug and fixed in 2016: 
https://issues.apache.org/jira/browse/LUCENE-7132

So my questions are:

1. Is there something wrong in my query that prevents the “Netzteil”-only
product from getting a score of 2.0?
2. Shouldn’t the score in the result and the explain section always be the same?

Best regards,
Thomas
