I also checked that the BM25Similarity class up to Solr 7.7 has a null check
for norms in the explainTFNorm method, but this check was removed from Solr 8
onwards. Does omitNorms work in Solr 8? Could someone share what the debug
output looks like with omitNorms="true"?
Here is my config:
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
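To make the question concrete, here is a minimal Java sketch of the pre-Solr 8 behaviour I mean. It is not the Lucene source; the class and method names are hypothetical, and it assumes a missing norm (omitNorms="true") can be modelled as a null field length. In that case length normalization is skipped, which is the same as b = 0 and gives the `(freq * (k1 + 1)) / (freq + k1)` formula shown in the Solr 6 debug output quoted below.

```java
// Hypothetical sketch, not the Lucene source: models the pre-Solr 8 null check
// where a field without norms skips length normalization (equivalent to b = 0).
class OmitNormsTfSketch {

    // Classic BM25 tfNorm. fieldLength == null stands in for "norms omitted" (assumption).
    static double tfNorm(double freq, double k1, double b, Double fieldLength, double avgFieldLength) {
        if (fieldLength == null) {
            // No norms: no length normalization, so tfNorm = (freq * (k1 + 1)) / (freq + k1)
            return (freq * (k1 + 1)) / (freq + k1);
        }
        // Norms present: scale k1 by the document length relative to the average length
        double lengthFactor = 1 - b + b * fieldLength / avgFieldLength;
        return (freq * (k1 + 1)) / (freq + k1 * lengthFactor);
    }

    public static void main(String[] args) {
        // Values taken from the debug output in this thread
        System.out.println(tfNorm(1.0, 1.2, 0.75, null, 2.45)); // norms omitted
        System.out.println(tfNorm(1.0, 1.2, 0.75, 1.0, 2.45));  // dl = 1, b applied
    }
}
```

With freq = 1, k1 = 1.2, b = 0.75 and avgdl = 2.45, the first call gives 1.0 and the second about 1.319.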

On Mon, Jan 18, 2021 at 1:51 PM Kerwin <kerwin...@gmail.com> wrote:

> Hi everybody,
>
> I am migrating from Solr 6.5.1 to Solr 8.6.1 and have run into a couple of
> issues that I need your help with. The ranking of search results differs
> significantly between Solr 6 and Solr 8, and I need to fix this before using
> Solr 8 in our live environment. I noticed a couple of changes up front that
> could explain some of the ranking differences.
>
> 1. omitNorms is not working as expected in Solr 8 with
> BM25SimilarityFactory.
> 2. With LegacyBM25SimilarityFactory, the boost value for 'qf' fields is not
> correct when using edismax.
>
> I tried the Solr examples with the following configuration and can
> replicate the difference on Solr 8.6.1.
>
> *Schema being used:*
> <field name="manu" type="text_general" indexed="true" stored="true"
> *omitNorms="true"*/>
>
> *Solr query:*
> http://localhost:8983/solr/solr/select?q=manu:Samsung&debug=true&wt=json&indent=on
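The same request can be built programmatically. This Java sketch assumes the host, port and core name ("solr") from the URL above; the class name is hypothetical, and it only builds and URL-encodes the request, it does not send it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the debug select URL used in this thread (host, port and core are assumptions
// copied from the example URL; adjust them for another setup).
class DebugQueryUrl {

    static String build(String baseUrl, String core, String query) {
        // URL-encode the query so characters such as ':' or spaces are sent safely
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/solr/" + core + "/select?q=" + q + "&debug=true&wt=json&indent=on";
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983", "solr", "manu:Samsung"));
    }
}
```

Solr decodes `manu%3ASamsung` back to `manu:Samsung`, so this is the same query as the hand-written URL.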
>
> *Solr 6 debug output (note: 0.0 = parameter b (norms omitted for field))*
>  "SP2514N":"
> 2.6390574 = weight(manu:samsung in 1) [SchemaSimilarity], result of:
>   2.6390574 = score(doc=1,freq=1.0 = termFreq=1.0
> ), product of:
>     2.6390574 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq + 0.5)) from:
>       1.0 = docFreq
>       20.0 = docCount
>     1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
>       1.0 = termFreq=1.0
>       1.2 = parameter k1
>       *0.0 = parameter b (norms omitted for field)*
> "}
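As a sanity check, the Solr 6 numbers can be recomputed from the inputs in the explain output. This Java sketch simply plugs those values into the formulas the output prints; the class name is hypothetical.

```java
// Recomputes the Solr 6 explain output above from its own inputs (norms omitted).
class Solr6ScoreCheck {

    // idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    static double idf(double docFreq, double docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score with norms omitted: idf * (freq * (k1 + 1)) / (freq + k1)
    static double score(double freq, double k1, double docFreq, double docCount) {
        return idf(docFreq, docCount) * (freq * (k1 + 1)) / (freq + k1);
    }

    public static void main(String[] args) {
        System.out.println(idf(1, 20));           // about 2.6390573, i.e. log(14)
        System.out.println(score(1, 1.2, 1, 20)); // about 2.6390573, since tfNorm = 1.0
    }
}
```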
>
> *Solr 8 debug output*
> "SP2514N":"
> 1.5827883 = weight(manu:samsung in 1) [SchemaSimilarity], result of:
>   1.5827883 = score(freq=1.0), computed as boost * idf * tf from:
>     2.6390574 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>       1 = n, number of documents containing term
>       20 = N, total number of documents with field
>     0.59975517 = tf, computed as freq / (freq + k1 * (1 - b + b * dl /
> avgdl)) from:
>       1.0 = freq, occurrences of term within document
>       1.2 = k1, term saturation parameter
>       0.75 = b, length normalization parameter
>       1.0 = dl, length of field
>       2.45 = avgdl, average length of field
> "}
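The Solr 8 numbers can be checked the same way. This Java sketch uses the formulas printed in the explain output; the class name is hypothetical, and boost is taken as 1 because the output does not list it and idf * tf alone reproduces the score.

```java
// Recomputes the Solr 8 explain output above, using the formulas it prints.
class Solr8ScoreCheck {

    // idf = log(1 + (N - n + 0.5) / (n + 0.5))
    static double idf(double n, double bigN) {
        return Math.log(1 + (bigN - n + 0.5) / (n + 0.5));
    }

    // tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)); there is no (k1 + 1) factor
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    public static void main(String[] args) {
        double tfValue = tf(1, 1.2, 0.75, 1.0, 2.45);
        double boost = 1.0; // assumption: not listed in the explain output
        System.out.println(tfValue);                      // about 0.5997552
        System.out.println(boost * idf(1, 20) * tfValue); // about 1.58279
    }
}
```

Compared with the Solr 6 output, tf here has no (k1 + 1) factor and includes the b, dl and avgdl terms.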
>
> As you can see above, length normalization is not applied in Solr 6, which is
> correct, but it is applied in Solr 8. I tried to replicate this with
> LegacyBM25SimilarityFactory as well and see the same issue there. Secondly,
> LegacyBM25SimilarityFactory behaves differently with the *'qf' boost*
> value for fields with the edismax parser, which I am also using.
>
> Request handler with edismax:
> <requestHandler name="/search" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <str name="wt">json</str>
>     <str name="indent">off</str>
>     <int name="rows">10</int>
>     <str name="defType">edismax</str>
>     <str name="qf">manu</str>
>     <str name="mm">100%</str>
>     <str name="lowercaseOperators">false</str>
>   </lst>
> </requestHandler>
>
> Debug output:
> "SP2514N":"
> 3.4821343 = weight(manu:samsung in 1) [LegacyBM25Similarity], result of:
>   3.4821343 = score(freq=1.0), computed as boost * idf * tf from:
>     *2.2 = boost*
>     2.6390574 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>       1 = n, number of documents containing term
>       20 = N, total number of documents with field
>     0.59975517 = tf, computed as freq / (freq + k1 * (1 - b + b * dl /
> avgdl)) from:
>       1.0 = freq, occurrences of term within document
>       1.2 = k1, term saturation parameter
>       0.75 = b, length normalization parameter
>       1.0 = dl, length of field
>       2.45 = avgdl, average length of field
> "}
>
> On checking the Solr source code, this value of 2.2 = boost is equal to
> 1 + k1 (1 + 1.2), as per the code below.
>
> return bm25Similarity.scorer(boost * (1 + bm25Similarity.getK1()),
> collectionStats, termStats);
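To see what that (1 + k1) factor does numerically, here is a Java sketch using the values from the debug output (hypothetical class name; qf=manu has no explicit boost, so the query boost is taken as 1). It multiplies the Solr 8 tf by (1 + k1) and compares it with the classic tfNorm form (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * dl / avgdl)). The Solr 6 formula quoted earlier is that same form with b = 0.

```java
// Compares boost * (1 + k1) * tf (as in the quoted scorer call) with the classic tfNorm form.
class LegacyBoostCheck {

    // Solr 8 style tf, as printed in the debug output
    static double newTf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    // Classic tfNorm with length normalization; b = 0 gives (freq * (k1 + 1)) / (freq + k1)
    static double classicTfNorm(double freq, double k1, double b, double dl, double avgdl) {
        return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    // Score with the boost scaled by (1 + k1), mirroring the source line above
    static double legacyScore(double boost, double idf, double freq, double k1, double b, double dl, double avgdl) {
        return boost * (1 + k1) * idf * newTf(freq, k1, b, dl, avgdl);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75, avgdl = 2.45;
        double idf = Math.log(1 + (20 - 1 + 0.5) / (1 + 0.5));
        System.out.println(legacyScore(1.0, idf, 1.0, k1, b, 1.0, avgdl)); // about 3.48213
        for (double dl : new double[] {1, 2, 5}) {
            double lhs = (1 + k1) * newTf(1.0, k1, b, dl, avgdl);
            double rhs = classicTfNorm(1.0, k1, b, dl, avgdl);
            System.out.println("dl=" + dl + ": " + lhs + " vs " + rhs); // equal up to rounding
        }
    }
}
```

Under these assumptions the 2.2 factor reproduces the classic (k1 + 1) scaling, and the remaining difference from the Solr 6 output is the b term.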
>
> Since LegacyBM25Similarity is supposed to keep the same scoring as the Solr 6
> BM25Similarity, and it is not working as expected, I cannot test the scoring
> changes. Kindly help me resolve the above two issues. I could be doing
> something wrong in the configuration, but I have read the Solr 7 and Solr 8
> migration notes and am not sure where I'm going wrong. Kindly advise.
>
