> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Sunday, March 13, 2011 6:25 PM
> To: solr-user@lucene.apache.org; andy.ne...@gmail.com
> Subject: Re: Results driving me nuts!
> 
> 
> --- On Sun, 3/13/11, Andy Newby <andy.ne...@gmail.com> wrote:
> 
> > From: Andy Newby <andy.ne...@gmail.com>
> > Subject: Results driving me nuts!
> > To: solr-user@lucene.apache.org
> > Date: Sunday, March 13, 2011, 10:38 PM
> > Hi,
> >
> > Ok, I'm really really trying to get my head around this,
> > but I just can't :/
> >
> > Here are 2 example records, both using the query "st
> > patricks" to
> > search on (matches for the keywords are in **stars** like
> > so, to make
> > a point of what SHOULD be matching);
> >
> > keywords: animations mini alphabets **st** **patricks**
> > animated 1
> > clover  animations mini alphabets **st** **patricks**
> > description: animated 1 clover
> >
> > "124966":"
> > 209.23984 = (MATCH) product of:
> >   418.47968 = (MATCH) sum of:
> >     418.47968 = (MATCH) sum of:
> >       212.91336 = (MATCH) weight(keywords:st
> > in 5697), product of:
> >         0.41379675 =
> > queryWeight(keywords:st), product of:
> >           7.5798326 =
> > idf(docFreq=233, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         514.5361 = (MATCH)
> > fieldWeight(keywords:st in 5697), product of:
> >           1.4142135 =
> > tf(termFreq(keywords:st)=2)
> >           7.5798326 =
> > idf(docFreq=233, maxDocs=168578)
> >           48.0 =
> > fieldNorm(field=keywords, doc=5697)
> >       205.56633 = (MATCH)
> > weight(keywords:patricks in 5697), product of:
> >         0.4065946 =
> > queryWeight(keywords:patricks), product of:
> >           7.447905 =
> > idf(docFreq=266, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         505.58057 = (MATCH)
> > fieldWeight(keywords:patricks in 5697), product of:
> >           1.4142135 =
> > tf(termFreq(keywords:patricks)=2)
> >           7.447905 =
> > idf(docFreq=266, maxDocs=168578)
> >           48.0 =
> > fieldNorm(field=keywords, doc=5697)
> >   0.5 = coord(1/2)
> >
> > The other one:
> >
> > desc: a black and white mug of beer with a three leaf
> > clover in it
> > keywords: saint **patricks** day green irish
> > beer   spel132_bw clip
> > art holidays **st** **patricks** day
> > handle drink celebrate clip art holidays **st**
> > **patricks** day
> >
> > 5 matches
> >
> > "145351":"
> > 193.61652 = (MATCH) product of:
> >   387.23303 = (MATCH) sum of:
> >     387.23303 = (MATCH) sum of:
> >       177.4278 = (MATCH) weight(keywords:st
> > in 25380), product of:
> >         0.41379675 =
> > queryWeight(keywords:st), product of:
> >           7.5798326 =
> > idf(docFreq=233, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         428.78006 = (MATCH)
> > fieldWeight(keywords:st in 25380), product of:
> >           1.4142135 =
> > tf(termFreq(keywords:st)=2)
> >           7.5798326 =
> > idf(docFreq=233, maxDocs=168578)
> >           40.0 =
> > fieldNorm(field=keywords, doc=25380)
> >       209.80525 = (MATCH)
> > weight(keywords:patricks in 25380), product of:
> >         0.4065946 =
> > queryWeight(keywords:patricks), product of:
> >           7.447905 =
> > idf(docFreq=266, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         516.006 = (MATCH)
> > fieldWeight(keywords:patricks in 25380), product of:
> >           1.7320508 =
> > tf(termFreq(keywords:patricks)=3)
> >           7.447905 =
> > idf(docFreq=266, maxDocs=168578)
> >           40.0 =
> > fieldNorm(field=keywords, doc=25380)
> >   0.5 = coord(1/2)
> >
> >
> > Now the thing that's getting me is that the record which has 5
> > occurrences of
> > "st patricks" is so different in terms of the score it
> > gives!
> >
> > 209.23984
> > 193.61652
> >
> > (these should be the other way around)
> >
> > Can anyone try and explain what's going on with this?
> >
> > BTW, the queries are matched based on a normal "white
> > space" index,
> > nothing special.
> >
> > The actual query being used, is as follows:
> >
> > (keywords:"st" AND keywords:"patricks") OR
> > (description:"st" AND
> > description:"patricks")
> >
> > TIA - I'm hoping someone can save my sanity ;)
> 
> Their fieldNorm values are different. Norm consists of index time boost
> and length normalization.
> 
> http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm
> 
> I can see that the one with 5 matches is longer than the other. Shorter
> documents are favored in Solr/Lucene by the length normalization factor.
> 
> 
> 

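Also, the term frequency for patricks is different in each document.

For the 1st doc termFreq(keywords:patricks)=2 and for the 2nd doc
termFreq(keywords:patricks)=3, so tf is 1.4142135 versus 1.7320508.
That gain is not enough to offset the lower fieldNorm (40.0 versus
48.0), which applies to both terms, so the 2nd doc still scores lower.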
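
To see how the numbers in the explain output combine, here is a rough
sketch in Python. It is not Lucene code, and the function and variable
names are made up for illustration. It recomputes both scores from the
idf, fieldNorm, queryNorm and coord values quoted above, and it assumes
tf is the square root of termFreq, which matches the 1.4142135 and
1.7320508 values shown in the output.

```python
import math

# Values copied from the explain output quoted in this thread.
QUERY_NORM = 0.05459181
IDF = {"st": 7.5798326, "patricks": 7.447905}


def term_weight(term, term_freq, field_norm):
    """weight = queryWeight * fieldWeight for one term in one document."""
    idf = IDF[term]
    query_weight = idf * QUERY_NORM                         # idf * queryNorm
    field_weight = math.sqrt(term_freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def score(term_freqs, field_norm, coord=0.5):
    """Sum the per-term weights, then apply coord (1/2 in the output above)."""
    total = sum(term_weight(t, f, field_norm) for t, f in term_freqs.items())
    return coord * total


doc_124966 = score({"st": 2, "patricks": 2}, field_norm=48.0)
doc_145351 = score({"st": 2, "patricks": 3}, field_norm=40.0)

# Hypothetical: the second document re-scored with the first one's fieldNorm.
doc_145351_same_norm = score({"st": 2, "patricks": 3}, field_norm=48.0)

print("124966:", round(doc_124966, 3))
print("145351:", round(doc_145351, 3))
print("145351 with fieldNorm=48:", round(doc_145351_same_norm, 3))
```

The last value is only a what-if: with equal fieldNorm values the extra
patricks match would put the second document ahead, which suggests the
norm difference alone accounts for the ordering you are seeing.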



