Hi everybody, I am migrating from Solr 6.5.1 to Solr 8.6.1 and have run into a couple of issues that I need your help with. There is a significant change in the ranking of search results between Solr 6 and Solr 8, which I need to fix before using Solr 8 in our live environment. I noticed two differences upfront that could be part of the reason for the ranking changes.
1. omitNorms is not working as expected in Solr 8 with BM25SimilarityFactory.
2. With LegacyBM25SimilarityFactory, the boost value applied to a 'qf' field is not correct when using Edismax.

I tried the Solr examples with the following configuration and can replicate the difference on Solr 8.6.1.

*Schema being used (note omitNorms="true"):*

    <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>

*Solr query:*

    http://localhost:8983/solr/solr/select?q=manu:Samsung&debug=true&wt=json&indent=on

*Solr 6 debug output (note, 0.0 = parameter b (norms omitted for field)):*

    "SP2514N":"
    2.6390574 = weight(manu:samsung in 1) [SchemaSimilarity], result of:
      2.6390574 = score(doc=1,freq=1.0 = termFreq=1.0 ), product of:
        2.6390574 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
          1.0 = docFreq
          20.0 = docCount
        1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
          1.0 = termFreq=1.0
          1.2 = parameter k1
          *0.0 = parameter b (norms omitted for field)*
    "}

*Solr 8 debug output:*

    "SP2514N":"
    1.5827883 = weight(manu:samsung in 1) [SchemaSimilarity], result of:
      1.5827883 = score(freq=1.0), computed as boost * idf * tf from:
        2.6390574 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          1 = n, number of documents containing term
          20 = N, total number of documents with field
        0.59975517 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          1.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          *0.75 = b, length normalization parameter*
          *1.0 = dl, length of field*
          *2.45 = avgdl, average length of field*
    "}

As you can see above, length normalization is not used in Solr 6, which is correct, while it is being used in Solr 8 (b = 0.75, with dl and avgdl applied) even though the field has omitNorms="true". I tried to replicate this with LegacyBM25SimilarityFactory as well and see the same issue there.

Secondly, LegacyBM25SimilarityFactory behaves differently with the *'qf' boost* value for fields when used with the edismax parser, which I am also using.
*Request handler with Edismax:*

    <requestHandler name="/search" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <str name="wt">json</str>
        <str name="indent">off</str>
        <int name="rows">10</int>
        <str name="defType">edismax</str>
        <str name="qf">manu</str>
        <str name="mm">100%</str>
        <str name="lowercaseOperators">false</str>
      </lst>
    </requestHandler>

*Debug output:*

    "SP2514N":"
    3.4821343 = weight(manu:samsung in 1) [LegacyBM25Similarity], result of:
      3.4821343 = score(freq=1.0), computed as boost * idf * tf from:
        *2.2 = boost*
        2.6390574 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          1 = n, number of documents containing term
          20 = N, total number of documents with field
        0.59975517 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          1.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          1.0 = dl, length of field
          2.45 = avgdl, average length of field
    "}

On checking the source code, this value of 2.2 = boost equals boost * (1 + k1), i.e. 1 * (1 + 1.2), as per the code below:

    return bm25Similarity.scorer(boost * (1 + bm25Similarity.getK1()), collectionStats, termStats);

LegacyBM25Similarity is supposed to keep the same scoring as the Solr 6 BM25Similarity, but since it is not behaving as expected, I cannot test the changes in scoring.

Kindly help to resolve the above 2 issues. I could be doing something wrong with the configuration, but I have read the Solr 7 and Solr 8 migration notes and am not sure where I'm going wrong. Kindly advise.
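For anyone who wants to check the arithmetic behind the explain outputs above, here is a minimal Python sketch. It is not Solr or Lucene code; it only re-evaluates the formulas printed in the debug output with the values shown there, and the function names (`idf`, `tf_solr8`, `tf_norm_solr6`) are made up for illustration. The last value is purely an observation about the formulas: if b were 0 for this field, boost * idf * tf with boost = 1 + k1 would come out equal to the Solr 6 score.

```python
import math

def idf(doc_freq, doc_count):
    # idf as printed in both explain outputs:
    # log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_solr8(freq, k1, b, dl, avgdl):
    # tf as printed in the Solr 8 explain output
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

def tf_norm_solr6(freq, k1):
    # tfNorm as printed in the Solr 6 explain output (b = 0, norms omitted)
    return (freq * (k1 + 1)) / (freq + k1)

K1 = 1.2                 # k1 from the explain outputs
IDF = idf(1, 20)         # docFreq = 1, docCount = 20

# Solr 6, norms omitted: explain shows 2.6390574
solr6_score = IDF * tf_norm_solr6(1.0, K1)

# Solr 8 with BM25Similarity: explain shows 1.5827883
solr8_tf = tf_solr8(1.0, K1, b=0.75, dl=1.0, avgdl=2.45)
solr8_score = IDF * solr8_tf

# LegacyBM25Similarity multiplies the boost by (1 + k1): explain shows boost 2.2
legacy_boost = 1.0 * (1 + K1)
legacy_score = legacy_boost * IDF * solr8_tf   # explain shows 3.4821343

# Hypothetical case: same legacy formula, but with b = 0 (norms really omitted)
legacy_score_b0 = legacy_boost * IDF * tf_solr8(1.0, K1, b=0.0, dl=1.0, avgdl=2.45)

print(round(solr6_score, 5), round(solr8_score, 5))
print(round(legacy_score, 5), round(legacy_score_b0, 5))
```

In other words, the 2.2 boost on its own is consistent with the older tfNorm formula, which has (k1 + 1) in the numerator; the remaining difference in the numbers above comes from b = 0.75 being applied to a field with omitNorms="true".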