Hi all,

Maybe it is better to move the discussion into a JIRA ticket. I created SOLR-8884 for this.

aHmet

On Tuesday, March 22, 2016 1:59 PM, Alessandro Benedetti <abenede...@apache.org> wrote:

I got this problem re-ranking, but in my short experience I was not able to reproduce or fix the bug.
Can I ask which query parser was used and all the components involved in the query?

Cheers

On Mon, Mar 21, 2016 at 8:40 PM, Rick Sullivan <r...@ricksullivan.net> wrote:

> I haven't checked this thread since Friday, but here are my responses to
> the questions that have come up.
>
> 1. How is ranking affected?
>
> Some documents have their scores divided by an integer value in the
> response documents.
>
> 2. Do you see the proper ranking in the explain section?
>
> Yes, the explain section always seems to have consistent values and proper
> rankings.
>
> 3. What about the results?
>
> No, these are ranked according to the sometimes incorrect score.
>
> 4. What version of Solr are you using?
>
> I've produced the problem on SolrCloud 5.5.0 (2 shards on 2 nodes on the
> same machine), Solr 5.5.0 (no sharding), and Solr 5.4.1 (no sharding).
> I've also had trouble reproducing the problem on test data.
>
> Thanks,
> -Rick
>
> ----------------------------------------
> > Date: Mon, 21 Mar 2016 14:14:44 +0000
> > From: iori...@yahoo.com.INVALID
> > To: solr-user@lucene.apache.org
> > Subject: Re: Explain score is different from score
> >
> > Hi Alessandro,
> >
> > The OP has a different ranking: fl=score and explain's score would
> > retrieve different orders.
> > I wrote test cases using ClassicSimilarity, but the problem won't reproduce.
> > This is really weird. I wonder what is triggering this.
> >
> > aHmet
> >
> > On Monday, March 21, 2016 2:08 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
> >
> > I would like to add a question: how is the ranking affected?
> > Do you see the proper ranking in the explain section?
> > And what about the results? Are they ranked according to the correct score,
> > or are they ranked by the wrong score?
> > I got a similar issue, which I am not able to reproduce yet, but it was
> > really really weird (in my case the ranking was also messed up).
> >
> > Cheers
> >
> > On Mon, Mar 21, 2016 at 7:30 AM, G, Rajesh <r...@cebglobal.com> wrote:
> >
> >> Hi Ahmet,
> >>
> >> I am using Solr 5.5.0. I am running a single instance with a single core.
> >> No shards.
> >>
> >> I have added <similarity class="solr.BM25SimilarityFactory"/> to my schema
> >> as suggested by Rick Sullivan. Now the scores are the same between explain
> >> and the score field.
> >>
> >> But instead of the previous results "Lync - Microsoft Office 365" and
> >> "Microsoft Office 365" I am getting
> >>
> >> {
> >> "title":"Office 365",
> >> "score":7.471676
> >> },
> >> {
> >> "title":"Office 365",
> >> "score":7.471676
> >> },
> >>
> >> If I try NGram title:(Microsoft Ofice 365)
> >>
> >> The scores are the same for the top 10 results even though they differ by
> >> a minimum of 3 characters.
> >> I have attached my schema.xml so it can help
> >>
> >> <doc>
> >> <str name="title">Lync - Microsoft Office 365</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 1.0</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 14.0</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 14.3</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 14.4</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 14.5(Mac)</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 15.0</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 16.0</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 4.0</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Office 365 E4</str>
> >> <float name="score">52.056263</float></doc>
> >> <doc>
> >> <str name="title">Microsoft Mail Protection Reports for Office 365 15.0</str>
> >> <float name="score">50.215454</float></doc>
> >>
> >> Thanks
> >> Rajesh
> >>
> >> Corporate Executive Board India Private Limited. Registration No:
> >> U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building
> >> No.10 DLF Cyber City, Gurgaon, Haryana-122002, India.
> >>
> >> This e-mail and/or its attachments are intended only for the use of the
> >> addressee(s) and may contain confidential and legally privileged
> >> information belonging to CEB and/or its subsidiaries, including CEB
> >> subsidiaries that offer SHL Talent Measurement products and services. If
> >> you have received this e-mail in error, please notify the sender and
> >> immediately, destroy all copies of this email and its attachments. The
> >> publication, copying, in whole or in part, or use or dissemination in any
> >> other way of this e-mail and attachments by anyone other than the intended
> >> person(s) is prohibited.
> >>
> >> -----Original Message-----
> >> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> >> Sent: Sunday, March 20, 2016 2:10 AM
> >> To: solr-user@lucene.apache.org; G, Rajesh <r...@cebglobal.com>;
> >> r...@ricksullivan.net
> >> Subject: Re: Explain score is different from score
> >>
> >> Hi Rick and Rajesh,
> >>
> >> I wasn't able to reproduce this with either Lucene or Solr.
> >> What version of Solr is this?
> >> Are you using a sharded request?
> >>
> >> @BeforeClass
> >> public static void beforeClass() throws Exception {
> >>   initCore("solrconfig.xml", "schema.xml");
> >>
> >>   assertU(adoc("id", "1722669", "title", "Lync - Microsoft Office 365"));
> >>   assertU(adoc("id", "2043876", "title", "Microsoft Office 365"));
> >>
> >>   assertU(commit());
> >>
> >> }
> >>
> >> /**
> >>  * Checks whether fl=score equals to Explain's score
> >>  */
> >> @Test
> >> public void testExplain() throws Exception {
> >>   SolrQueryRequest req = req(CommonParams.DEBUG_QUERY, "true", "indent", "true",
> >>       "q", "title:(Microsoft Ofice 365)", CommonParams.FL, "id,title,score");
> >>   String response = h.query(req);
> >>   System.out.println(response);
> >> }
> >>
> >> @Test
> >> public void testExplain() throws Exception {
> >>
> >>   Analyzer analyzer = new WhitespaceAnalyzer();
> >>
> >>   Directory directory = new RAMDirectory();
> >>
> >>   IndexWriterConfig config = new IndexWriterConfig(analyzer);
> >>   config.setSimilarity(new ClassicSimilarity());
> >>   IndexWriter iwriter = new IndexWriter(directory, config);
> >>
> >>   Document doc = new Document();
> >>   doc.add(new Field("id", "1722669", TextField.TYPE_STORED));
> >>   doc.add(new Field("title", "Lync - Microsoft Office 365", TextField.TYPE_STORED));
> >>   iwriter.addDocument(doc);
> >>
> >>   doc = new Document();
> >>   doc.add(new Field("id", "2043876", TextField.TYPE_STORED));
> >>   doc.add(new Field("title", "Microsoft Office 365", TextField.TYPE_STORED));
> >>   iwriter.addDocument(doc);
> >>
> >>   iwriter.close();
> >>
> >>   // Now search the index:
> >>   DirectoryReader reader = DirectoryReader.open(directory);
> >>   IndexSearcher searcher = new IndexSearcher(reader);
> >>   searcher.setSimilarity(new ClassicSimilarity());
> >>
> >>   QueryParser parser = new QueryParser("title", analyzer);
> >>   Query query = parser.parse("Microsoft Ofice 365");
> >>   ScoreDoc[] hits = searcher.search(query, 10).scoreDocs;
> >>
> >>   Assert.assertEquals(2, hits.length);
> >>
> >>   // Iterate through the results:
> >>   for (int i = 0; i < hits.length; i++) {
> >>
> >>     Document hitDoc = searcher.doc(hits[i].doc);
> >>     Explanation explanation = searcher.explain(query, hits[i].doc);
> >>
> >>     Assert.assertEquals("score from explain should equal to ScoreDoc.score!",
> >>         hits[i].score, explanation.getValue(), 0.0);
> >>
> >>   }
> >>
> >>   reader.close();
> >>   directory.close();
> >>
> >> }
> >>
> >> On Saturday, March 19, 2016 7:54 AM, "G, Rajesh" <r...@cebglobal.com> wrote:
> >> I don’t use boosts at index time or query time.
> >>
> >> -----Original Message-----
> >> From: Rick Sullivan [mailto:r...@ricksullivan.net]
> >> Sent: Friday, March 18, 2016 10:18 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: RE: Explain score is different from score
> >>
> >> I'm not. I only have query boosts.
> >>
> >> ----------------------------------------
> >>> Date: Fri, 18 Mar 2016 16:42:36 +0000
> >>> From: iori...@yahoo.com.INVALID
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Explain score is different from score
> >>>
> >>> Hi Rick,
> >>>
> >>> This could be a bug I think.
> >>> Do you guys use index time boosts?
> >>>
> >>> Ahmet
> >>>
> >>> On Friday, March 18, 2016 6:15 PM, Rick Sullivan <r...@ricksullivan.net> wrote:
> >>> Yes it seems to be something similar, but the normalization isn't
> >>> applied to all retrieved documents, which messes with the document rankings.
> >>>
> >>> Some documents have the exact values from the 'explain' response, while
> >>> others are normalized.
> >>>
> >>> -Rick
> >>>
> >>> ----------------------------------------
> >>>> Date: Fri, 18 Mar 2016 16:06:19 +0000
> >>>> From: iori...@yahoo.com.INVALID
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: Explain score is different from score
> >>>>
> >>>> Hi Rajesh,
> >>>>
> >>>> I suspect it is due to the queryNorm(q). But it is weird that relative
> >>>> order is different in your example.
> >>>>
> >>>> "queryNorm(q) is a normalizing factor used to make scores between
> >>>> queries comparable. This factor does not affect document ranking
> >>>> (since all ranked documents are multiplied by the same factor), but
> >>>> rather just attempts to make scores from different queries (or even
> >>>> different indexes) comparable." [1]
> >>>>
> >>>> [1]
> >>>> https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> >>>>
> >>>> Ahmet
> >>>>
> >>>> On Friday, March 18, 2016 4:24 PM, Rick Sullivan <r...@ricksullivan.net> wrote:
> >>>> Hi Rajesh,
> >>>>
> >>>> I've been seeing the same problem you have. My debug scores seem to be
> >>>> what I expect, but the actual scores applied by Solr are sometimes divided
> >>>> by an integer.
> >>>>
> >>>> I raised the same question in this email distribution about a week ago,
> >>>> but haven't yet found a solution.
> >>>> There's also a StackOverflow question I created here:
> >>>> http://stackoverflow.com/questions/35921106/how-and-why-do-solr-explain-values-differ-from-the-solr-score
> >>>>
> >>>> Can you verify whether all of your affected scores are (1/N)*score? I
> >>>> think N seems to be the number of OR elements in the query. For
> >>>> example, your case below has
> >>>>
> >>>> debug_score/score
> >>>> = 1.2517526/0.41725087
> >>>> = 3
> >>>>
> >>>> Thanks,
> >>>> -Rick
> >>>>
> >>>> ----------------------------------------
> >>>>> From: r...@cebglobal.com
> >>>>> To: solr-user@lucene.apache.org
> >>>>> Subject: RE: Explain score is different from score
> >>>>> Date: Fri, 18 Mar 2016 13:29:14 +0000
> >>>>>
> >>>>> Can someone help?
> >>>>>
> >>>>> From: G, Rajesh
> >>>>> Sent: Friday, March 18, 2016 12:56 PM
> >>>>> To: solr-user@lucene.apache.org
> >>>>> Subject: Explain score is different from score
> >>>>>
> >>>>> Mismatch in score displayed in debug and score field. Please refer to
> >>>>> the attached xml.
> >>>>>
> >>>>> When I search for title_ws:(Microsoft Ofice 365), if the results were
> >>>>> displayed in explain-score order we would have the expected result:
> >>>>> “Microsoft Office 365”, then “Lync - Microsoft Office 365”.
> >>>>>
> >>>>> <result name="response" numFound="13617" start="0" maxScore="1.0952835">
> >>>>> <doc>
> >>>>> <str name="title">Lync - Microsoft Office 365</str>
> >>>>> <str name="title_ws">Lync - Microsoft Office 365</str>
> >>>>> <int name="id">1722669</int>
> >>>>> <float name="score">1.0952835</float></doc>
> >>>>> Score from explain 1.0952835
> >>>>> <doc>
> >>>>> <str name="title">Microsoft Office 365</str>
> >>>>> <str name="title_ws">Microsoft Office 365</str>
> >>>>> <int name="id">2043876</int>
> >>>>> <float name="score">0.41725087</float></doc>
> >>>>> Score from explain 1.2517526
> >>>>> </result>
> >>>>>
> >>>>> Thanks
> >>>>> Rajesh
> >>
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
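
Rick's question earlier in the thread (whether an affected score is the explain score divided by an integer N) can be checked mechanically. The following is only a rough sketch in Java, the language of the test cases above. The class name ScoreRatioCheck, the method integerDivisor, and the 1e-4 tolerance are made up for illustration and are not part of Solr or Lucene. The two score pairs are the ones Rajesh reported.

```java
// Hypothetical helper, not a Solr/Lucene API: tests whether
// explainScore / score is (close to) a whole number N >= 1.
final class ScoreRatioCheck {

    /**
     * Returns N if explainScore / score is within the given tolerance of an
     * integer N >= 1, otherwise -1. The tolerance is an assumed value chosen
     * to absorb float rounding in the printed scores.
     */
    static int integerDivisor(double explainScore, double score, double tolerance) {
        if (score <= 0.0) {
            return -1;
        }
        double ratio = explainScore / score;
        long n = Math.round(ratio);
        if (n < 1) {
            return -1;
        }
        return Math.abs(ratio - n) <= tolerance ? (int) n : -1;
    }

    public static void main(String[] args) {
        // id 1722669: score field and explain score agree
        System.out.println(integerDivisor(1.0952835, 1.0952835, 1e-4));  // prints 1
        // id 2043876: score field is the explain score divided by 3
        System.out.println(integerDivisor(1.2517526, 0.41725087, 1e-4)); // prints 3
    }
}
```

A result of 1 means the two scores agree. A larger N matches the "divided by an integer" pattern described above, and -1 means the mismatch does not fit that pattern.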