Thanks for proving that this bug exists, Ahmet! I couldn't for the life of me find a reproducible case.
-Rick

----------------------------------------
> Date: Wed, 23 Mar 2016 18:31:56 +0000
> From: iori...@yahoo.com
> To: solr-user@lucene.apache.org; abenede...@apache.org;
> r...@ricksullivan.net; r...@cebglobal.com
> Subject: Re: Explain score is different from score
>
> Hi Rajesh,
>
> This is truly a Lucene-level bug. I attached a failing Lucene test case with
> the data you provided.
>
> Please see: https://issues.apache.org/jira/browse/SOLR-8884 and
> reproduce with: ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData
> -Dtests.seed=FEABDC2CA354130E
>
> The ticket needs to be moved to Lucene.
>
> Thanks Rajesh for the data!
>
> P.S. I didn't finish the Solr test case after the Lucene failure. Currently it
> just prints. It can be clearly seen that the scores are different.
>
> <doc>
> <int name="id">7614</int>
> <str name="title_ws">Microsoft Office 365</str>
> <float name="score">0.41190073</float>
> <str name="[explain]">1.2357022 = product of ... </str>
> </doc>
> <doc>
> <int name="id">7615</int>
> <str name="title_ws">Microsoft Office 365 1.0</str>
> <float name="score">0.41190073</float>
> <str name="[explain]">1.2357022 = product of ... </str>
> </doc>
>
> Ahmet
>
>
> On Wednesday, March 23, 2016 9:26 AM, "G, Rajesh" <r...@cebglobal.com> wrote:
>
> Hi Ahmet,
>
> I reproduced this issue again so thought the attached files would help.
> Attached file has config[schema,solrconfig,data-import...] , data I have
> indexed and the result [debug.xml]
> From the debug.xml I can see the difference in fl=score vs explain score.
> maxScore is not assigned the real max[hope this is related to score diff
> issue]
>
> Thanks
> Rajesh
>
>
> Corporate Executive Board India Private Limited. Registration No:
> U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building
> No.10 DLF Cyber City, Gurgaon, Haryana-122002, India.
>
> This e-mail and/or its attachments are intended only for the use of the
> addressee(s) and may contain confidential and legally privileged information
> belonging to CEB and/or its subsidiaries, including CEB subsidiaries that
> offer SHL Talent Measurement products and services. If you have received this
> e-mail in error, please notify the sender and immediately, destroy all copies
> of this email and its attachments. The publication, copying, in whole or in
> part, or use or dissemination in any other way of this e-mail and attachments
> by anyone other than the intended person(s) is prohibited.
>
> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Tuesday, March 22, 2016 9:35 PM
> To: solr-user@lucene.apache.org; abenede...@apache.org;
> r...@ricksullivan.net; G, Rajesh <r...@cebglobal.com>
> Subject: Re: Explain score is different from score
>
> Hi all,
>
> Maybe it is better to move the discussion into a Jira ticket.
> I created SOLR-8884 for this.
>
> aHmet
>
> On Tuesday, March 22, 2016 1:59 PM, Alessandro Benedetti
> <abenede...@apache.org> wrote:
>
> I got this problem re-ranking.
> But in my short experience I was not able to reproduce or fix the bug.
> Can I ask which query parser you used and all the components involved in the
> query?
>
> Cheers
>
> On Mon, Mar 21, 2016 at 8:40 PM, Rick Sullivan <r...@ricksullivan.net>
> wrote:
>
>> I haven't checked this thread since Friday, but here are my responses
>> to the questions that have come up.
>>
>> 1. How is ranking affected?
>>
>> Some documents have their scores divided by an integer value in the
>> response documents.
>>
>> 2. Do you see the proper ranking in the explain section?
>>
>> Yes, the explain section always seems to have consistent values and
>> proper rankings.
>>
>> 3. What about the results?
>>
>> No, these are ranked according to the sometimes incorrect score.
>>
>> 4. What version of Solr are you using?
>>
>> I've produced the problem on SolrCloud 5.5.0 (2 shards on 2 nodes on
>> the same machine), Solr 5.5.0 (no sharding), and Solr 5.4.1 (no sharding).
>> I've also had trouble reproducing the problem on test data.
>>
>> Thanks,
>> -Rick
>>
>> ----------------------------------------
>>> Date: Mon, 21 Mar 2016 14:14:44 +0000
>>> From: iori...@yahoo.com.INVALID
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Explain score is different from score
>>>
>>> Hi Alessandro,
>>>
>>> The OP has a different ranking: fl=score and explain's score would
>>> retrieve different orders.
>>> I wrote test cases using ClassicSimilarity, but they won't reproduce it.
>>> This is really weird. I wonder what is triggering this.
>>>
>>> aHmet
>>>
>>>
>>> On Monday, March 21, 2016 2:08 PM, Alessandro Benedetti
>>> <abenede...@apache.org> wrote:
>>>
>>> I would like to add a question: how is the ranking affected?
>>> Do you see the proper ranking in the explain section?
>>> And what about the results? Are they ranked according to the correct
>>> score, or are they ranked by the wrong score?
>>> I got a similar issue, which I am not able to reproduce yet, but it
>>> was really really weird (in my case I also got the ranking messed
>>> up).
>>>
>>> Cheers
>>>
>>>
>>> On Mon, Mar 21, 2016 at 7:30 AM, G, Rajesh <r...@cebglobal.com> wrote:
>>>
>>>> Hi Ahmet,
>>>>
>>>> I am using Solr 5.5.0. I am running a single instance with a single
>>>> core. No shards.
>>>>
>>>> I have added <similarity class="solr.BM25SimilarityFactory"/> to my
>>>> schema as suggested by Rick Sullivan. Now the scores are the same
>>>> between explain and the score field.
>>>>
>>>> But instead of previous results "Lync - Microsoft Office 365" and
>>>> "Microsoft Office 365" I am getting
>>>>
>>>> {
>>>> "title":"Office 365",
>>>> "score":7.471676
>>>> },
>>>> {
>>>> "title":"Office 365",
>>>> "score":7.471676
>>>> },
>>>>
>>>> If I try NGram title:(Microsoft Ofice 365)
>>>>
>>>> The scores are the same for the top 10 results even though they
>>>> differ by a minimum of 3 characters. I have attached my schema.xml in
>>>> case it helps.
>>>>
>>>> <doc>
>>>> <str name="title">Lync - Microsoft Office 365</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 1.0</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 14.0</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 14.3</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 14.4</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 14.5(Mac)</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 15.0</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 16.0</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 4.0</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Office 365 E4</str>
>>>> <float name="score">52.056263</float></doc>
>>>> <doc>
>>>> <str name="title">Microsoft Mail Protection Reports for Office 365 15.0</str>
>>>> <float name="score">50.215454</float></doc>
>>>>
>>>> Thanks
>>>> Rajesh
>>>>
>>>> -----Original Message-----
>>>> From: Ahmet Arslan [mailto:iori...@yahoo.com]
>>>> Sent: Sunday, March 20, 2016 2:10 AM
>>>> To: solr-user@lucene.apache.org; G, Rajesh <r...@cebglobal.com>;
>>>> r...@ricksullivan.net
>>>> Subject: Re: Explain score is different from score
>>>>
>>>> Hi Rick and Rajesh,
>>>>
>>>> I wasn't able to reproduce this with either Lucene or Solr.
>>>> What version of Solr is this?
>>>> Are you using a sharded request?
>>>>
>>>> @BeforeClass
>>>> public static void beforeClass() throws Exception {
>>>>   initCore("solrconfig.xml", "schema.xml");
>>>>
>>>>   assertU(adoc("id", "1722669", "title", "Lync - Microsoft Office 365"));
>>>>   assertU(adoc("id", "2043876", "title", "Microsoft Office 365"));
>>>>
>>>>   assertU(commit());
>>>> }
>>>>
>>>> /**
>>>>  * Checks whether fl=score equals to Explain's score
>>>>  */
>>>> @Test
>>>> public void testExplain() throws Exception {
>>>>   SolrQueryRequest req = req(CommonParams.DEBUG_QUERY, "true", "indent", "true",
>>>>       "q", "title:(Microsoft Ofice 365)", CommonParams.FL, "id,title,score");
>>>>   String response = h.query(req);
>>>>   System.out.println(response);
>>>> }
>>>>
>>>> @Test
>>>> public void testExplain() throws Exception {
>>>>
>>>>   Analyzer analyzer = new WhitespaceAnalyzer();
>>>>
>>>>   Directory directory = new RAMDirectory();
>>>>
>>>>   IndexWriterConfig config = new IndexWriterConfig(analyzer);
>>>>   config.setSimilarity(new ClassicSimilarity());
>>>>   IndexWriter iwriter = new IndexWriter(directory, config);
>>>>
>>>>   Document doc = new Document();
>>>>   doc.add(new Field("id", "1722669", TextField.TYPE_STORED));
>>>>   doc.add(new Field("title", "Lync - Microsoft Office 365", TextField.TYPE_STORED));
>>>>   iwriter.addDocument(doc);
>>>>
>>>>   doc = new Document();
>>>>   doc.add(new Field("id", "2043876", TextField.TYPE_STORED));
>>>>   doc.add(new Field("title", "Microsoft Office 365", TextField.TYPE_STORED));
>>>>   iwriter.addDocument(doc);
>>>>
>>>>   iwriter.close();
>>>>
>>>>   // Now search the index:
>>>>   DirectoryReader reader = DirectoryReader.open(directory);
>>>>   IndexSearcher searcher = new IndexSearcher(reader);
>>>>   searcher.setSimilarity(new ClassicSimilarity());
>>>>
>>>>   QueryParser parser = new QueryParser("title", analyzer);
>>>>   Query query = parser.parse("Microsoft Ofice 365");
>>>>   ScoreDoc[] hits = searcher.search(query, 10).scoreDocs;
>>>>
>>>>   Assert.assertEquals(2, hits.length);
>>>>
>>>>   // Iterate through the results:
>>>>   for (int i = 0; i < hits.length; i++) {
>>>>
>>>>     Document hitDoc = searcher.doc(hits[i].doc);
>>>>     Explanation explanation = searcher.explain(query, hits[i].doc);
>>>>
>>>>     Assert.assertEquals("score from explain should equal to ScoreDoc.score!",
>>>>         hits[i].score, explanation.getValue(), 0.0);
>>>>
>>>>   }
>>>>
>>>>   reader.close();
>>>>   directory.close();
>>>>
>>>> }
>>>>
>>>>
>>>> On Saturday, March 19, 2016 7:54 AM, "G, Rajesh" <r...@cebglobal.com> wrote:
>>>> I don’t use boosts at index time or query time.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Rick Sullivan [mailto:r...@ricksullivan.net]
>>>> Sent: Friday, March 18, 2016 10:18 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: RE: Explain score is different from score
>>>>
>>>> I'm not. I only have query boosts.
>>>>
>>>> ----------------------------------------
>>>>> Date: Fri, 18 Mar 2016 16:42:36 +0000
>>>>> From: iori...@yahoo.com.INVALID
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Explain score is different from score
>>>>>
>>>>> Hi Rick,
>>>>>
>>>>> This could be a bug I think. Do you guys use index time boosts?
>>>>>
>>>>> Ahmet
>>>>>
>>>>>
>>>>> On Friday, March 18, 2016 6:15 PM, Rick Sullivan <r...@ricksullivan.net> wrote:
>>>>> Yes it seems to be something similar, but the normalization isn't
>>>>> applied to all retrieved documents, which messes with the document
>>>>> rankings.
>>>>>
>>>>> Some documents have the exact values from the 'explain' response,
>>>>> while others are normalized.
>>>>>
>>>>> -Rick
>>>>>
>>>>>
>>>>> ----------------------------------------
>>>>>> Date: Fri, 18 Mar 2016 16:06:19 +0000
>>>>>> From: iori...@yahoo.com.INVALID
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: Explain score is different from score
>>>>>>
>>>>>> Hi Rajesh,
>>>>>>
>>>>>> I suspect it is due to the queryNorm(q). But it is weird that
>>>>>> relative order is different in your example.
>>>>>>
>>>>>>
>>>>>> "queryNorm(q) is a normalizing factor used to make scores between
>>>>>> queries comparable. This factor does not affect document ranking
>>>>>> (since all ranked documents are multiplied by the same factor),
>>>>>> but rather just attempts to make scores from different queries
>>>>>> (or even different indexes) comparable." [1]
>>>>>>
>>>>>> [1]
>>>>>> https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>>>>>>
>>>>>> Ahmet
>>>>>>
>>>>>>
>>>>>> On Friday, March 18, 2016 4:24 PM, Rick Sullivan <r...@ricksullivan.net> wrote:
>>>>>> Hi Rajesh,
>>>>>>
>>>>>> I've been seeing the same problem you have. My debug scores seem
>>>>>> to be what I expect, but the actual scores applied by Solr are
>>>>>> sometimes divided by an integer.
>>>>>>
>>>>>> I raised the same question in this email distribution about a
>>>>>> week ago, but haven't yet found a solution. There's also a
>>>>>> StackOverflow question I created here:
>>>>>> http://stackoverflow.com/questions/35921106/how-and-why-do-solr-explain-values-differ-from-the-solr-score
>>>>>>
>>>>>> Can you verify whether all of your affected scores are
>>>>>> (1/N)*debug_score? I think N seems to be the number of OR elements in
>>>>>> the query. For example, your case below has
>>>>>>
>>>>>> debug_score/score
>>>>>> = 1.2517526/0.41725087
>>>>>> = 3
>>>>>>
>>>>>> Thanks,
>>>>>> -Rick
>>>>>>
>>>>>>
>>>>>> ----------------------------------------
>>>>>>> From: r...@cebglobal.com
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: RE: Explain score is different from score
>>>>>>> Date: Fri, 18 Mar 2016 13:29:14 +0000
>>>>>>>
>>>>>>> Can someone help?
>>>>>>>
>>>>>>>
>>>>>>> From: G, Rajesh
>>>>>>> Sent: Friday, March 18, 2016 12:56 PM
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: Explain score is different from score
>>>>>>>
>>>>>>> There is a mismatch between the score displayed in debug and the
>>>>>>> score field. Please refer to the attached XML.
>>>>>>>
>>>>>>> When I search for title_ws:(Microsoft Ofice 365), if the results
>>>>>>> were ordered by explain score, we would have the expected
>>>>>>> result: “Microsoft Office 365”, then “Lync - Microsoft Office 365”.
>>>>>>>
>>>>>>> <result name="response" numFound="13617" start="0" maxScore="1.0952835">
>>>>>>> <doc>
>>>>>>> <str name="title">Lync - Microsoft Office 365</str>
>>>>>>> <str name="title_ws">Lync - Microsoft Office 365</str>
>>>>>>> <int name="id">1722669</int>
>>>>>>> <float name="score">1.0952835</float></doc>
>>>>>>> Score from explain 1.0952835
>>>>>>> <doc>
>>>>>>> <str name="title">Microsoft Office 365</str>
>>>>>>> <str name="title_ws">Microsoft Office 365</str>
>>>>>>> <int name="id">2043876</int>
>>>>>>> <float name="score">0.41725087</float></doc>
>>>>>>> Score from explain 1.2517526
>>>>>>> </result>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Rajesh
>>>>
>>>
>>>
>>> --
>>> --------------------------
>>>
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>>
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>>
>>> William Blake - Songs of Experience -1794 England
>
>>
>>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
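
For anyone who wants to check the 1/N observation against the numbers posted in this thread, here is a minimal sketch in Python. It only does the arithmetic on the (score, explain) pairs quoted above; it does not talk to Solr or Lucene. The names `pairs` and `explain_ratio` are made up for illustration, and rounding to 4 decimal places is an assumption to absorb float noise in the printed scores.

```python
# (fl=score value, explain value) pairs copied from the messages in this thread.
# Keys are the document ids as posted; nothing here comes from a live index.
pairs = {
    "1722669": (1.0952835, 1.0952835),   # "Lync - Microsoft Office 365"
    "2043876": (0.41725087, 1.2517526),  # "Microsoft Office 365"
    "7614": (0.41190073, 1.2357022),     # from the failing test output
    "7615": (0.41190073, 1.2357022),
}


def explain_ratio(score, explain):
    """Return explain/score, i.e. the N in score = (1/N) * explain."""
    return explain / score


# Round to 4 places (an assumption) so printed-score truncation does not matter.
ratios = {doc_id: round(explain_ratio(s, e), 4) for doc_id, (s, e) in pairs.items()}

for doc_id, ratio in ratios.items():
    print(doc_id, ratio)
```

On these numbers the affected documents come out at a ratio of 3, which matches the three terms in title_ws:(Microsoft Ofice 365), while document 1722669 is unaffected (ratio 1). That is consistent with Rick's guess about N, though four data points are not proof.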