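The default similarity changed from TF-IDF (ClassicSimilarity) to BM25 in Solr 6.0, so different scores, result order, and explain output between 5.4 and 6.1 are expected.

On Fri, Aug 19, 2016 at 3:00 PM John Bickerstaff <j...@johnbickerstaff.com> wrote: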
> Bump!
>
> TL;DR Question: Are scores (and debug output) *expected* to be different
> between 5.4 and 6.1?
>
> On Thu, Aug 18, 2016 at 2:44 PM, John Bickerstaff <j...@johnbickerstaff.com>
> wrote:
>
> > Hi all,
> >
> > TL:DR -
> > Is it expected that the /select endpoint would produce different
> > scores/result order between versions 5.4 and 6.1?
> >
> > (I'm aware that it's certainly possible I've done something different to
> > these environments, although at this point I can't see any difference in
> > configs etc... and I used a very simple search against /select to test
> > this)
> >
> > ====== Detail ==========
> >
> > I'm currently seeing different scoring and different result order when I
> > compare Solr results in the Admin console for a 5.4 and 6.1 environment.
> >
> > I'm using the /select endpoint to try to avoid any difference in
> > configuration. To the best of my knowledge (and reading) I haven't ever
> > modified the xml for that endpoint.
> >
> > As I was looking into it, I saw that the debug output looks quite
> > different in 6.1...
> >
> > Any advice, including "You must have broken it yourself, that's
> > impossible" is much appreciated.
> >
> > Here's debug from the "old" 5.4 SolrCloud environment. The id's are a
> > pain to read, but not only am I getting different scores, I'm getting
> > different docs (or docs in a clearly different order)
> >
> > "debug": {
> >   "rawquerystring": "chiari",
> >   "querystring": "chiari",
> >   "parsedquery": "text:chiari",
> >   "parsedquery_toString": "text:chiari",
> >   "explain": {
> >     "d9644f86-5fe2-4a9f-8517-545e2cde0b64": "\n4.3581347 = weight(text:chiari in 26783) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 26783, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=26783)\n",
> >     "1347f707-6fdd-4864-b9dd-6d3e7cc32bf5": "\n4.3581347 = weight(text:chiari in 26792) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 26792, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=26792)\n",
> >     "d01c32ad-e29d-4b65-9930-f8a6844a2613": "\n4.3581347 = weight(text:chiari in 27028) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27028, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27028)\n",
> >     "0c5a4be7-1162-4b1a-ab83-4b48a690fc3a": "\n4.3581347 = weight(text:chiari in 27029) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27029, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27029)\n",
> >     "e1cb441d-9d60-482d-956b-3fbc964a17c1": "\n4.3581347 = weight(text:chiari in 27042) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27042, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27042)\n",
> >     "f87951f1-e163-4f17-a628-904b9df0c609": "\n4.3581347 = weight(text:chiari in 27043) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27043, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27043)\n",
> >     "caaa7ca1-34cb-44a8-8dd9-12c909db8c2d": "\n4.3581347 = weight(text:chiari in 27044) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27044, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27044)\n",
> >     "ada7a87e-725a-4533-b72e-3817af4c7179": "\n4.3581347 = weight(text:chiari in 27055) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27055, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27055)\n",
> >     "ac6d47fd-9a59-47d6-8cfb-11b34c7ded54": "\n4.3581347 = weight(text:chiari in 27056) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27056, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27056)\n",
> >     "4aaa7697-b26a-4bea-ba4e-70d18ea649f0": "\n4.3581347 = weight(text:chiari in 62240) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 62240, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=62240)\n"
> >   },
> >   "QParser": "LuceneQParser",
> >   "timing": { "time": 2, "prepare": { "time": 0, "query": { "time": 0 },
> >
> > ... and here's the same from the Solr Cloud 6.0 environment
> >
> > "debug":{
> >   "rawquerystring":"chiari",
> >   "querystring":"chiari",
> >   "parsedquery":"text:chiari",
> >   "parsedquery_toString":"text:chiari",
> >   "explain":{
> >     "85249c23-ef68-4276-9ef7-48c290033993":"\n9.735645 = weight(text:chiari in 106960) [], result of:\n 9.735645 = score(doc=106960,freq=50.0 = termFreq=50.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 2.0289173 = tfNorm, computed from:\n 50.0 = termFreq=50.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4096.0 = fieldLength\n",
> >     "495b660d-8e8f-4b75-a523-106440468818":"\n9.655164 = weight(text:chiari in 106215) [], result of:\n 9.655164 = score(doc=106215,freq=58.0 = termFreq=58.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 2.0121448 = tfNorm, computed from:\n 58.0 = termFreq=58.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 5349.8774 = fieldLength\n",
> >     "841df60a-b83e-4e74-9ad5-463971d5220a":"\n9.613188 = weight(text:chiari in 106214) [], result of:\n 9.613188 = score(doc=106214,freq=74.0 = termFreq=74.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 2.003397 = tfNorm, computed from:\n 74.0 = termFreq=74.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 7281.778 = fieldLength\n",
> >     "0a8ab59f-95e3-4fca-adea-5a62d97b4369":"\n9.594478 = weight(text:chiari in 106440) [], result of:\n 9.594478 = score(doc=106440,freq=54.0 = termFreq=54.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9994978 = tfNorm, computed from:\n 54.0 = termFreq=54.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 5349.8774 = fieldLength\n",
> >     "15595a34-88c4-42e0-a6b2-9ee8eafdd9e8":"\n9.502294 = weight(text:chiari in 106958) [], result of:\n 9.502294 = score(doc=106958,freq=38.0 = termFreq=38.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9802866 = tfNorm, computed from:\n 38.0 = termFreq=38.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4096.0 = fieldLength\n",
> >     "0acd1f88-395c-434d-9cba-919e7073080c":"\n9.449741 = weight(text:chiari in 106439) [], result of:\n 9.449741 = score(doc=106439,freq=62.0 = termFreq=62.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9693346 = tfNorm, computed from:\n 62.0 = termFreq=62.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 7281.778 = fieldLength\n",
> >     "66516297-cf1d-4ee8-847b-a5193420491a":"\n9.284438 = weight(text:chiari in 108786) [], result of:\n 9.284438 = score(doc=108786,freq=53.0 = termFreq=53.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9348853 = tfNorm, computed from:\n 53.0 = termFreq=53.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 7281.778 = fieldLength\n",
> >     "0c5a4be7-1162-4b1a-ab83-4b48a690fc3a":"\n9.164393 = weight(text:chiari in 6100) [], result of:\n 9.164393 = score(doc=6100,freq=2.0 = termFreq=2.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9098678 = tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4.0 = fieldLength\n",
> >     "e1cb441d-9d60-482d-956b-3fbc964a17c1":"\n9.164393 = weight(text:chiari in 6113) [], result of:\n 9.164393 = score(doc=6113,freq=2.0 = termFreq=2.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9098678 = tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4.0 = fieldLength\n",
> >     "f87951f1-e163-4f17-a628-904b9df0c609":"\n9.164393 = weight(text:chiari in 6114) [], result of:\n 9.164393 = score(doc=6114,freq=2.0 = termFreq=2.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9098678 = tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4.0 = fieldLength\n"
> >   },
> >   "QParser":"LuceneQParser",
> >   "timing":{ "time":1.0,
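
For anyone who wants to check the numbers, here is a minimal Python sketch that recomputes one score from each of the two explain outputs quoted above. The formulas follow what the explain trees themselves report (tf * idf * fieldNorm for ClassicSimilarity, idf * tfNorm with k1 and b for BM25). The function names are my own, and the inputs are copied from the debug output, so treat this as a sanity check and not as Lucene's actual code.

```python
import math


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity (TF-IDF): idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, field_norm):
    # Single-term fieldWeight as shown in the 5.4 explain:
    # tf * idf * fieldNorm, where tf = sqrt(termFreq)
    return math.sqrt(freq) * classic_idf(doc_freq, max_docs) * field_norm


def bm25_idf(doc_freq, doc_count):
    # BM25: idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # Saturating term-frequency factor, normalized by how long the field is
    # relative to the average field length
    length_factor = 1.0 - b + b * field_length / avg_field_length
    return freq * (k1 + 1.0) / (freq + k1 * length_factor)


def bm25_score(freq, doc_freq, doc_count, field_length, avg_field_length):
    # Single-term score as shown in the 6.x explain: idf * tfNorm
    return bm25_idf(doc_freq, doc_count) * bm25_tf_norm(
        freq, field_length, avg_field_length
    )


# 5.4 entry for doc 27029: explain reports 4.3581347
old = classic_score(freq=1.0, doc_freq=281, max_docs=110738, field_norm=0.625)

# 6.x entry for doc 6100: explain reports 9.164393
new = bm25_score(
    freq=2.0, doc_freq=281, doc_count=34151,
    field_length=4.0, avg_field_length=941.3421,
)

print(f"ClassicSimilarity: {old:.4f}")
print(f"BM25:              {new:.4f}")
```

The two formulas weight term frequency and field length quite differently, which is enough on its own to reorder results. Note also that id 0c5a4be7-1162-4b1a-ab83-4b48a690fc3a shows termFreq=1.0 in the 5.4 output and termFreq=2.0 in the 6.x output, so the similarity change may not be the only difference between the two environments.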