> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Sunday, March 13, 2011 6:25 PM
> To: solr-user@lucene.apache.org; andy.ne...@gmail.com
> Subject: Re: Results driving me nuts!
>
> --- On Sun, 3/13/11, Andy Newby <andy.ne...@gmail.com> wrote:
>
> > From: Andy Newby <andy.ne...@gmail.com>
> > Subject: Results driving me nuts!
> > To: solr-user@lucene.apache.org
> > Date: Sunday, March 13, 2011, 10:38 PM
> > Hi,
> >
> > Ok, I'm really, really trying to get my head around this,
> > but I just can't :/
> >
> > Here are 2 example records, both using the query "st patricks" to
> > search on (matches for the keywords are in **stars** like so, to make
> > a point of what SHOULD be matching):
> >
> > keywords: animations mini alphabets **st** **patricks** animated 1
> > clover animations mini alphabets **st** **patricks**
> > description: animated 1 clover
> >
> > "124966":"
> > 209.23984 = (MATCH) product of:
> >   418.47968 = (MATCH) sum of:
> >     418.47968 = (MATCH) sum of:
> >       212.91336 = (MATCH) weight(keywords:st in 5697), product of:
> >         0.41379675 = queryWeight(keywords:st), product of:
> >           7.5798326 = idf(docFreq=233, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         514.5361 = (MATCH) fieldWeight(keywords:st in 5697), product of:
> >           1.4142135 = tf(termFreq(keywords:st)=2)
> >           7.5798326 = idf(docFreq=233, maxDocs=168578)
> >           48.0 = fieldNorm(field=keywords, doc=5697)
> >       205.56633 = (MATCH) weight(keywords:patricks in 5697), product of:
> >         0.4065946 = queryWeight(keywords:patricks), product of:
> >           7.447905 = idf(docFreq=266, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         505.58057 = (MATCH) fieldWeight(keywords:patricks in 5697), product of:
> >           1.4142135 = tf(termFreq(keywords:patricks)=2)
> >           7.447905 = idf(docFreq=266, maxDocs=168578)
> >           48.0 = fieldNorm(field=keywords, doc=5697)
> >   0.5 = coord(1/2)
> >
> > The other one:
> >
> > desc: a black and white mug of beer with a three leaf clover in it
> > keywords: saint **patricks** day green irish beer spel132_bw clip
> > art holidays **st** **patricks** day handle drink celebrate clip art
> > holidays **st** **patricks** day
> >
> > 5 matches
> >
> > "145351":"
> > 193.61652 = (MATCH) product of:
> >   387.23303 = (MATCH) sum of:
> >     387.23303 = (MATCH) sum of:
> >       177.4278 = (MATCH) weight(keywords:st in 25380), product of:
> >         0.41379675 = queryWeight(keywords:st), product of:
> >           7.5798326 = idf(docFreq=233, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         428.78006 = (MATCH) fieldWeight(keywords:st in 25380), product of:
> >           1.4142135 = tf(termFreq(keywords:st)=2)
> >           7.5798326 = idf(docFreq=233, maxDocs=168578)
> >           40.0 = fieldNorm(field=keywords, doc=25380)
> >       209.80525 = (MATCH) weight(keywords:patricks in 25380), product of:
> >         0.4065946 = queryWeight(keywords:patricks), product of:
> >           7.447905 = idf(docFreq=266, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         516.006 = (MATCH) fieldWeight(keywords:patricks in 25380), product of:
> >           1.7320508 = tf(termFreq(keywords:patricks)=3)
> >           7.447905 = idf(docFreq=266, maxDocs=168578)
> >           40.0 = fieldNorm(field=keywords, doc=25380)
> >   0.5 = coord(1/2)
> >
> > Now the thing that's getting me is that the record which has 5
> > matches for "st patricks" gets such a different score!
> >
> > 209.23984
> > 193.61652
> >
> > (these should be the other way around)
> >
> > Can anyone try and explain what's going on with this?
> >
> > BTW, the queries are matched based on a normal "whitespace"
> > index, nothing special.
> >
> > The actual query being used is as follows:
> >
> > (keywords:"st" AND keywords:"patricks") OR (description:"st" AND
> > description:"patricks")
> >
> > TIA - I'm hoping someone can save my sanity ;)
>
> Their fieldNorm values are different. Norm consists of index-time boost
> and length normalization.
>
> http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm
>
> I can see that the one with 5 matches is longer than the other. Shorter
> documents are favored in Solr/Lucene by the length normalization factor.
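The fieldNorm explanation can be checked by re-multiplying the numbers from the two explain outputs. Below is a minimal Python sketch, assuming the classic DefaultSimilarity formulas that the explain tree itself spells out (tf = sqrt(termFreq), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, score = coord * sum of term weights). It does not talk to Solr; the names `Doc`, `term_weight` and `score` are illustrative only, not part of any Solr or Lucene API.

```python
import math
from dataclasses import dataclass

# Numbers copied from the explain output quoted above.
QUERY_NORM = 0.05459181
IDF = {"st": 7.5798326, "patricks": 7.447905}
COORD = 1 / 2  # coord(1/2): only the keywords half of the OR matched


@dataclass
class Doc:
    doc_id: str
    field_norm: float  # fieldNorm(field=keywords, doc=...)
    term_freqs: dict   # termFreq(keywords:<term>) for each matching term


def term_weight(term: str, freq: int, field_norm: float) -> float:
    """weight(keywords:<term>) = queryWeight * fieldWeight."""
    idf = IDF[term]
    tf = math.sqrt(freq)                  # tf(freq) = sqrt(freq)
    query_weight = idf * QUERY_NORM       # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


def score(doc: Doc) -> float:
    """Sum the matching term weights, then apply the coord factor."""
    total = sum(term_weight(t, f, doc.field_norm) for t, f in doc.term_freqs.items())
    return COORD * total


docs = [
    Doc("124966", field_norm=48.0, term_freqs={"st": 2, "patricks": 2}),
    Doc("145351", field_norm=40.0, term_freqs={"st": 2, "patricks": 3}),
]

if __name__ == "__main__":
    for d in docs:
        print(d.doc_id, round(score(d), 2))
```

Run as a script, this prints about 209.24 for 124966 and 193.62 for 145351, matching the explain output: both documents share the same idf and queryNorm, so the gap comes from fieldNorm (48.0 vs 40.0) and tf.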
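Also, the term frequency for "patricks" differs between the two documents: for the 1st doc termFreq(keywords:patricks)=2, and for the 2nd doc termFreq(keywords:patricks)=3. But since tf is the square root of the frequency (1.4142135 vs 1.7320508 in the explain output), that gain on one term doesn't make up for the fieldNorm dropping from 48.0 to 40.0 on both terms.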
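The square-root damping is easy to see numerically. This sketch makes the same DefaultSimilarity assumptions and reuses the explain numbers; `score` here is a hypothetical helper. It holds everything in doc 145351 fixed except the "patricks" frequency and looks for the smallest frequency at which it would overtake doc 124966. It is an idealized what-if: in a real index, adding words would also lengthen the field and lower its fieldNorm further.

```python
import math

# Numbers copied from the explain output quoted above.
QUERY_NORM = 0.05459181
IDF_ST, IDF_PATRICKS = 7.5798326, 7.447905
COORD = 0.5  # coord(1/2)


def score(field_norm: float, tf_st: int, tf_patricks: int) -> float:
    """Classic Lucene score for the keywords:st AND keywords:patricks match."""
    def weight(idf: float, freq: int) -> float:
        # (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)
        return (idf * QUERY_NORM) * (math.sqrt(freq) * idf * field_norm)

    return COORD * (weight(IDF_ST, tf_st) + weight(IDF_PATRICKS, tf_patricks))


doc1 = score(field_norm=48.0, tf_st=2, tf_patricks=2)  # doc 124966

# tf grows only as sqrt(freq): going from 2 to 3 occurrences is a ~22% gain
# on one term, while fieldNorm 48.0 -> 40.0 scales *both* terms by ~0.83.
tf_gain = math.sqrt(3) / math.sqrt(2)
norm_ratio = 40.0 / 48.0

# Smallest "patricks" frequency at which doc 145351 (fieldNorm 40.0) would win.
needed = next(f for f in range(1, 100) if score(40.0, 2, f) > doc1)

if __name__ == "__main__":
    print(f"tf gain {tf_gain:.3f}, norm ratio {norm_ratio:.3f}")
    print("patricks occurrences needed:", needed)
```

With these numbers `needed` comes out as 4: the second document would need four occurrences of "patricks" rather than three before its extra matches outweighed its lower fieldNorm.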