Markus,

The calculation is correct.

Look at your output.

Result = queryWeight(text:gb) * fieldWeight(text:gb in 1)

Result = (idf(docFreq=6, numDocs=26) * queryNorm) *
(tf(termFreq(text:gb)=2) * idf(docFreq=6, numDocs=26) *
fieldNorm(field=text, doc=1))

You should notice that idf(docFreq=6, numDocs=26) appears twice: once in queryWeight and once in fieldWeight.

This is just how weight() is calculated.




> > 0.18314168 = (MATCH) sum of:
> >   0.18314168 = (MATCH) weight(text:gb in 1), product of:
> >     0.35845062 = queryWeight(text:gb), product of:
> >       2.3121865 = idf(docFreq=6, numDocs=26)
> >       0.15502669 = queryNorm
> >    
> >     0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
> >       1.4142135 = tf(termFreq(text:gb)=2)
> >       2.3121865 = idf(docFreq=6, numDocs=26)
> >       0.15625 = fieldNorm(field=text, doc=1)
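
The numbers in the explain output above can be re-derived by hand. The following is a minimal sketch in Python, not Lucene code. It assumes the classic default similarity (tf = sqrt(termFreq), idf = 1 + ln(numDocs / (docFreq + 1))) and copies queryNorm and fieldNorm straight from the output rather than computing them. The function and variable names are made up for illustration.

```python
import math

def idf(doc_freq, num_docs):
    # Assumed default-similarity idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def tf(term_freq):
    # Assumed default-similarity tf: square root of the term frequency
    return math.sqrt(term_freq)

# Copied from the explain output, not recomputed here
QUERY_NORM = 0.15502669
FIELD_NORM = 0.15625

idf_gb = idf(doc_freq=6, num_docs=26)

# idf enters once on the query side ...
query_weight = idf_gb * QUERY_NORM
# ... and once more on the document (field) side
field_weight = tf(2) * idf_gb * FIELD_NORM

score = query_weight * field_weight

print(f"idf         = {idf_gb:.7f}")
print(f"queryWeight = {query_weight:.8f}")
print(f"fieldWeight = {field_weight:.7f}")
print(f"score       = {score:.8f}")
```

The printed values should agree with the explain output up to float rounding, which shows that the final score is tf x idf^2 x queryNorm x fieldNorm.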





On 10/5/11 11:42 AM, "Markus Jelsma" <markus.jel...@openindex.io> wrote:

>Hi,
>
>I don't see 2.3121865 * 2 anywhere in your debug output or something that
>looks like that.
>
>
>> Hi Markus,
>> 
>> The idf calculation itself is correct.
>> What I am trying to understand here is why the idf value is multiplied
>> twice in the final score calculation. Essentially, tf x idf^2 is used
>> instead of tf x idf.
>> I'd like to understand the rationale behind that.
>> 
>> On Wed, Oct 5, 2011 at 9:43 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>> > In Lucene's default similarity idf = 1 + ln(numDocs / (df + 1)).
>> > 1 + ln(26 / 7) =~ 2.3121865
>> > 
>> > I don't see a problem.
>> > 
>> > > Hi,
>> > > 
>> > > 
>> > > When I examine the score calculation of DisMax in Solr, it looks to
>> > > me that DisMax is using tf x idf^2 instead of tf x idf.
>> > > Does anyone have insight into why tf x idf is not used here?
>> > > 
>> > > Here is the score contribution from one field:
>> > > 
>> > > score(q,c) =  queryWeight x fieldWeight
>> > > 
>> > >                = tf x idf x idf x queryNorm x fieldNorm
>> > > 
>> > > Here is the example that I used to derive the formula above. Clearly,
>> > > idf is multiplied twice in the score calculation.
>> > >
>> > > http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score
>> > > 
>> > >     <str name="6H500F0">
>> > > 
>> > > 0.18314168 = (MATCH) sum of:
>> > >   0.18314168 = (MATCH) weight(text:gb in 1), product of:
>> > >     0.35845062 = queryWeight(text:gb), product of:
>> > >       2.3121865 = idf(docFreq=6, numDocs=26)
>> > >       0.15502669 = queryNorm
>> > >     
>> > >     0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
>> > >       1.4142135 = tf(termFreq(text:gb)=2)
>> > >       2.3121865 = idf(docFreq=6, numDocs=26)
>> > >       0.15625 = fieldNorm(field=text, doc=1)
>> > > 
>> > > </str>
>> > > 
>> > > 
>> > > Thanks!

