We're preparing to upgrade from Solr 6.4.2 to Solr 7.6.0 and have found an
inconsistency in scoring. It appears that term boosts in the query are not
applied in Solr 7.

The query sent to both versions is identical (unimportant parameters
removed):

<str name="q">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="defType">edismax</str>
<str name="qf">max_term</str>
<str name="q.op">AND</str>
<str name="fq">dictionary_id:"WKUS-TAL-DEPLURALIZATION-THESAURUS"</str>
<str name="rows">100</str>
<str name="wt">xml</str>
<str name="debugQuery">on</str>
</lst>

Three documents are returned in both cases. In Solr 6 they come back in
order of the boosts (three, two, one), because the boost accounts for the
entirety of the score. In Solr 7 they come back in arbitrary order, because
every score is 1.0.

Looking at the debug output and explains, in Solr 6 the boost is multiplied
into the rest of the score:

<lst name="debug">
<str name="rawquerystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="querystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((max_term:"aaaa one
zzzz"))^1.0 DisjunctionMaxQuery((max_term:"aaaa two zzzz"))^2.0
DisjunctionMaxQuery((max_term:"aaaa three zzzz"))^3.0))/no_coord</str>
<str name="parsedquery_toString">+(((max_term:"aaaa one zzzz"))^1.0
((max_term:"aaaa two zzzz"))^2.0 ((max_term:"aaaa three zzzz"))^3.0)</str>
<lst name="explain">
<str name="WKUS-TAL-DEPLURALIZATION-THESAURUS_three">
3.0 = sum of:
  3.0 = weight(max_term:"aaaa three zzzz" in 658) [WKSimilarity], result of:
    3.0 = score(doc=658,freq=1.0 = phraseFreq=1.0
), product of:
      3.0 = boost
      1.0 = idf(), for phrases, always set to 1
      1.0 = tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b)
[WKSimilarity] from:
        1.0 = phraseFreq=1.0
        1.2 = k1a
        1.2 = k1b
        0.0 = b (norms omitted for field)
</str>

But in Solr 7, the boost is not there at all:

<lst name="debug">
<str name="rawquerystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="querystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
<str name="parsedquery">+((+DisjunctionMaxQuery((max_term:"aaaa one
zzzz"))^1.0) (+DisjunctionMaxQuery((max_term:"aaaa two zzzz"))^2.0)
(+DisjunctionMaxQuery((max_term:"aaaa three zzzz"))^3.0))</str>
<str name="parsedquery_toString">+((+((max_term:"aaaa one zzzz"))^1.0)
(+((max_term:"aaaa two zzzz"))^2.0) (+((max_term:"aaaa three
zzzz"))^3.0))</str>
<lst name="explain">
<str name="WKUS-TAL-DEPLURALIZATION-THESAURUS_two">
1.0 = sum of:
  1.0 = weight(max_term:"aaaa two zzzz" in 436) [WKSimilarity], result of:
    1.0 = score(doc=436,freq=1.0 = phraseFreq=1.0
), product of:
      1.0 = idf(), for phrases, always set to 1
      1.0 = tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b)
[WKSimilarity] from:
        1.0 = phraseFreq=1.0
        1.2 = k1a
        1.2 = k1b
        0.0 = b (norms omitted for field)
</str>
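
To make the difference concrete, here is a minimal sketch that redoes the
arithmetic from the two explains. It only mirrors the factors as they are
printed above; WKSimilarity is our custom similarity, and the function names
below are made up for illustration.

```python
def tf_norm(freq, k1a, k1b):
    # tfNorm exactly as printed in the explain:
    # (freq * (k1a + 1)) / (freq + k1b)
    return (freq * (k1a + 1)) / (freq + k1b)


def phrase_score(freq, k1a, k1b, idf=1.0, boost=None):
    # Product of the factors listed in the explain. The boost is optional
    # because the Solr 7 explain has no boost line at all.
    score = idf * tf_norm(freq, k1a, k1b)
    if boost is not None:
        score *= boost
    return score


# Values copied from the explain output for each version.
solr6_three = phrase_score(freq=1.0, k1a=1.2, k1b=1.2, boost=3.0)
solr7_two = phrase_score(freq=1.0, k1a=1.2, k1b=1.2)
print(solr6_three, solr7_two)
```

With the boost factor present the product matches the 3.0 from Solr 6;
without it, the remaining factors give the 1.0 that Solr 7 reports for
every document.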

I also noticed a subtle difference in the parsedquery between the two
versions, though I'm not sure whether it is what causes the boost to drop
out in Solr 7:

Solr 6:  +(((max_term:"aaaa one zzzz"))^1.0 ((max_term:"aaaa two
zzzz"))^2.0 ((max_term:"aaaa three zzzz"))^3.0)
Solr 7:  +((+((max_term:"aaaa one zzzz"))^1.0) (+((max_term:"aaaa two
zzzz"))^2.0) (+((max_term:"aaaa three zzzz"))^3.0))

For our use case, I think we can work around it using a constant score
query, but it would be good to know whether this is a bug or expected
behavior, or whether we're missing something in the query that is needed to
get boosts working again.
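
For reference, this is roughly the workaround I have in mind, as an untested
sketch. It uses the `^=` constant-score syntax of the standard query parser
and assumes we can switch defType to lucene and name the field explicitly
instead of relying on qf. The host, the collection name, and the helper
function are placeholders, not our real setup.

```python
from urllib.parse import urlencode


def constant_score_query(field, boosts):
    # Build clauses such as field:"two"^=2 and join them with OR.
    # "^=" asks the standard query parser for a constant score per clause.
    clauses = ['{}:"{}"^={}'.format(field, term, score)
               for term, score in boosts]
    return " OR ".join(clauses)


params = {
    "q": constant_score_query(
        "max_term", [("one", 1), ("two", 2), ("three", 3)]),
    "defType": "lucene",  # assumption: standard parser instead of edismax
    "fq": 'dictionary_id:"WKUS-TAL-DEPLURALIZATION-THESAURUS"',
    "rows": 100,
    "wt": "xml",
    "debugQuery": "on",
}

# Hypothetical host and collection name.
url = "http://localhost:8983/solr/mycollection/select?" + urlencode(params)
print(url)
```

I have not yet checked how this interacts with our custom similarity or
with the field analysis on max_term, so treat it as a starting point only.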

Thanks!
