I checked further: up to Solr 7.7, the BM25Similarity class has a null check for norms in its explainTFNorm method, but that check was removed from Solr 8 onwards. Does omitNorms still work in Solr 8? Could someone send me what the debug output looks like with omitNorms="true"? Here is my config:

<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
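For reference, the numbers in the two debug outputs quoted below can be reproduced by hand. This is only a rough sketch of the arithmetic as printed in the explain text, not Lucene's actual code; the function names are made up for illustration, and the inputs (k1 = 1.2, b = 0.75, dl = 1.0, avgdl = 2.45, docFreq = 1, docCount = 20) are copied from the outputs.

```python
import math

def idf(doc_freq, doc_count):
    # idf as printed in both explain outputs:
    # log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_norm_solr6_no_norms(freq, k1):
    # Solr 6 explain with norms omitted (b shown as 0.0):
    # (freq * (k1 + 1)) / (freq + k1)
    return (freq * (k1 + 1)) / (freq + k1)

def tf_solr8(freq, k1, b, dl, avgdl):
    # Solr 8 explain: freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

k1 = 1.2
idf_value = idf(doc_freq=1, doc_count=20)

# Solr 6: norms omitted, so the tf factor collapses to 1.0 for freq = 1.
score_solr6 = idf_value * tf_norm_solr6_no_norms(1.0, k1)

# Solr 8: length normalization is still applied with b = 0.75.
score_solr8 = idf_value * tf_solr8(1.0, k1, b=0.75, dl=1.0, avgdl=2.45)

print("Solr 6 score:", score_solr6)
print("Solr 8 score:", score_solr8)
```

The two results should agree with the 2.6390574 and 1.5827883 values in the explain outputs to within float rounding, which suggests the whole gap comes from the tf term and not from idf.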
On Mon, Jan 18, 2021 at 1:51 PM Kerwin <kerwin...@gmail.com> wrote:

> Hi everybody,
>
> I am migrating from Solr 6.5.1 to Solr 8.6.1 and am having a couple of
> issues for which I need your help. There is a significant change in ranking
> between Solr 6 and 8 search results which I need to fix before using Solr 8
> in our live environment. I noticed a couple of changes upfront which could
> be some of the reasons for ranking changes.
>
> 1. omitNorms not working as expected in Solr 8 with
> BM25SimilarityFactory.
> 2. LegacyBM25SimilarityFactory 'qf' parameter boost value not correct when
> using Edismax.
>
> I tried the Solr examples with the following configuration and can
> replicate the difference on Solr 8.6.1.
>
> *Schema being used:*
> <field name="manu" type="text_general" indexed="true" stored="true"
> *omitNorms="true"*/>
>
> *Solr query:*
> http://localhost:8983/solr/solr/select?q=manu:Samsung&debug=true&wt=json&indent=on
>
> *Solr 6 debug output (Note, 0.0 = parameter b (norms omitted for field))*
> "SP2514N":"
> 2.6390574 = weight(manu:samsung in 1) [SchemaSimilarity], result of:
>   2.6390574 = score(doc=1,freq=1.0 = termFreq=1.0), product of:
>     2.6390574 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
>       1.0 = docFreq
>       20.0 = docCount
>     1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
>       1.0 = termFreq=1.0
>       1.2 = parameter k1
>       *0.0 = parameter b (norms omitted for field)*
> "}
>
> *Solr 8 debug output*
> "SP2514N":"
> 1.5827883 = weight(manu:samsung in 1) [SchemaSimilarity], result of:
>   1.5827883 = score(freq=1.0), computed as boost * idf * tf from:
>     2.6390574 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>       1 = n, number of documents containing term
>       20 = N, total number of documents with field
>     0.59975517 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>       1.0 = freq, occurrences of term within document
>       1.2 = k1, term saturation parameter
>       *0.75 = b, length normalization parameter
>       1.0 = dl, length of field
>       2.45 = avgdl, average length of field*
> "}
>
> As you can see above, length normalization is not used in Solr 6, which is
> correct, while it is being used in Solr 8. I tried to replicate this with
> LegacyBM25SimilarityFactory as well and see the same issue there. Secondly,
> LegacyBM25SimilarityFactory is behaving differently with the *'qf' boost*
> value for fields with the edismax parser, which I am also using.
>
> Request handler with Edismax:
> <requestHandler name="/search" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <str name="wt">json</str>
>     <str name="indent">off</str>
>     <int name="rows">10</int>
>     <str name="defType">edismax</str>
>     <str name="qf">manu</str>
>     <str name="mm">100%</str>
>     <str name="lowercaseOperators">false</str>
>   </lst>
> </requestHandler>
>
> Debug output:
> "SP2514N":"
> 3.4821343 = weight(manu:samsung in 1) [LegacyBM25Similarity], result of:
>   3.4821343 = score(freq=1.0), computed as boost * idf * tf from:
>     *2.2 = boost*
>     2.6390574 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>       1 = n, number of documents containing term
>       20 = N, total number of documents with field
>     0.59975517 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>       1.0 = freq, occurrences of term within document
>       1.2 = k1, term saturation parameter
>       0.75 = b, length normalization parameter
>       1.0 = dl, length of field
>       2.45 = avgdl, average length of field
> "}
>
> On checking the Solr source code, this value of 2.2 = boost is roughly
> equal to 1 + k1, as per the code below.
>
> return bm25Similarity.scorer(boost * (1 + bm25Similarity.getK1()),
>     collectionStats, termStats);
>
> Since LegacyBM25Similarity is supposed to keep the same scoring as the Solr 6
> BM25Similarity, and that is not working as expected, I cannot test the changes
> in scoring. Kindly help to resolve the above 2 issues. I could be doing
> something wrong with the configuration, but I read the Solr 7 and Solr 8
> migration notes, so I am not sure where I'm going wrong. Kindly advise.
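
A minimal sketch of the arithmetic behind the 2.2 boost in the LegacyBM25Similarity output, assuming the qf field carries no explicit boost (so the user boost is 1.0) and using the values printed in the explain text. `legacy_scorer_boost` is a hypothetical helper that only mirrors the quoted `boost * (1 + bm25Similarity.getK1())` expression, and the `b = 0` case is a what-if calculation, not something observed in the outputs above.

```python
import math

K1 = 1.2

def legacy_scorer_boost(user_boost, k1=K1):
    # Mirrors the quoted source line: boost * (1 + bm25Similarity.getK1())
    return user_boost * (1 + k1)

# idf for n = 1, N = 20, as printed in the explain output.
idf_value = math.log(1 + (20 - 1 + 0.5) / (1 + 0.5))

# tf as printed: freq / (freq + k1 * (1 - b + b * dl / avgdl))
tf_with_norms = 1.0 / (1.0 + K1 * (1 - 0.75 + 0.75 * 1.0 / 2.45))

# Hypothetical: the same tf if b were treated as 0 (norms omitted).
tf_b_zero = 1.0 / (1.0 + K1)

boost = legacy_scorer_boost(1.0)
legacy_score = boost * idf_value * tf_with_norms
legacy_score_b_zero = boost * idf_value * tf_b_zero

print("boost:", boost)
print("legacy score with length norm:", legacy_score)
print("legacy score if b were 0:", legacy_score_b_zero)
```

Under these assumptions the (1 + k1) factor accounts for the 2.2 exactly, and with b treated as 0 the legacy score would fall back to the idf value that Solr 6 reported, so the remaining difference appears to come from length normalization rather than from the boost.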