Re: score calculation

Aloke Ghoshal Thu, 13 Dec 2012 03:45:38 -0800

Hi Tom,

This is great. It should make it into the documentation.


Regards,
Aloke

On Thu, Dec 13, 2012 at 1:23 PM, Burgmans, Tom <tom.burgm...@wolterskluwer.com> wrote:

> I am also working on getting this clear. Here are my notes so far (partly
> copied from elsewhere, partly written myself):
>
>
>
>     queryWeight = the impact of the query against the field
>         implementation: boost(query)*idf*queryNorm
>
>
>     boost(query) = boost of the field at query-time
>         Implication: hits in fields with higher boost get a higher score
>         Rationale: a term in field A could be more relevant than the same term in field B
>
>
>     idf = inverse document frequency = measure of how rare the term is across the index for this field
>         implementation: log(numDocs/(docFreq+1))+1
>         Implication: the more documents a term occurs in, the lower its score
>         Rationale: common terms are less important than uncommon ones
>     numDocs = the total number of documents in the index, not including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index).
>     docFreq = the number of documents in the index that contain the term in this field. This is a constant for a given term and field (the same value for every document that contains it).
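The idf formula can be checked with a minimal Python sketch. It assumes the natural logarithm (Java's Math.log, which Lucene's DefaultSimilarity uses); the function and parameter names are illustrative, not Lucene's API.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: log(numDocs / (docFreq + 1)) + 1, natural log."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0


# In the same 1000-document index, a rare term outweighs a common one.
rare = idf(doc_freq=1, num_docs=1000)
common = idf(doc_freq=500, num_docs=1000)
print(f"rare term: {rare:.4f}, common term: {common:.4f}")
```

The +1 added to docFreq avoids a division by zero for a term that no document contains.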
>
>
>     queryNorm = normalization factor that makes scores from different queries comparable
>         implementation: 1/sqrt(sumOfSquaredWeights)
>         Implication: doesn't affect the ranking of the results of a query
>         Rationale: queryNorm is not related to the relevance of the document, but rather tries to make scores between different queries comparable. It has the same value for all results of the query.
>     sumOfSquaredWeights = for a query of plain terms, the sum over the query terms of (boost(query)*idf)^2
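A sketch of queryNorm and queryWeight for a flat query of plain term clauses, assuming sumOfSquaredWeights is the sum of (boost(query)*idf)^2 over the query terms; nested or boosted boolean queries add further factors that are ignored here. The boosts and idf values in the example are made up.

```python
import math


def query_norm(sum_of_squared_weights: float) -> float:
    """queryNorm = 1 / sqrt(sumOfSquaredWeights)."""
    return 1.0 / math.sqrt(sum_of_squared_weights)


def query_weights(terms: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return queryWeight = boost(query) * idf * queryNorm for each term.

    `terms` maps each query term to a (boost, idf) pair.
    """
    sum_sq = sum((boost * idf) ** 2 for boost, idf in terms.values())
    norm = query_norm(sum_sq)
    return {term: boost * idf * norm for term, (boost, idf) in terms.items()}


# Hypothetical two-term query: "solr" boosted 2x (idf 3.0), "score" unboosted (idf 1.5).
weights = query_weights({"solr": (2.0, 3.0), "score": (1.0, 1.5)})
print(weights)
```

Every weight is divided by the same factor, so the squared query weights always sum to 1. This is why queryNorm shifts absolute scores between queries but leaves the ranking within one query unchanged.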
>
>
>     fieldWeight = the score of a term matching the field
>         implementation: tf*idf*fieldNorm
>
>
>     tf = term frequency in a field = measure of how often a term appears in the field
>         implementation: sqrt(freq)
>         Implication: the more often a term occurs in a field, the greater its score
>         Rationale: fields that contain more occurrences of a term are generally more relevant
>     freq = termFreq = number of times the term occurs in the field for this document
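tf and freq in a short sketch. It assumes the field has already been analyzed into a list of tokens; term_freq is a hypothetical helper, not a Lucene call.

```python
import math


def term_freq(field_tokens: list[str], term: str) -> int:
    """freq: number of times `term` occurs among the field's (already analyzed) tokens."""
    return field_tokens.count(term)


def tf(freq: int) -> float:
    """Term frequency factor: sqrt(freq)."""
    return math.sqrt(freq)


tokens = ["solr", "scoring", "in", "solr", "and", "lucene", "solr", "solr"]
freq = term_freq(tokens, "solr")
print(freq, tf(freq))
```

The square root gives diminishing returns: four occurrences count only twice as much as one.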
>
>
>     fieldNorm = impact of a hit in this field
>         implementation: lengthNorm*boost(index)
>     lengthNorm = measure of the importance of a term according to the total number of terms in the field
>         implementation: 1/sqrt(numTerms)
>         Implication: a term matched in a field with fewer terms gets a higher score
>         Rationale: a term in a field with fewer terms is more important than one in a field with more
>     numTerms = number of terms in the field
>     boost(index) = boost of the field at index-time
>         Implication: hits in fields with higher boost get a higher score
>         Rationale: a term in field A could be more relevant than the same term in field B
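lengthNorm, fieldNorm and the resulting fieldWeight, as a sketch that follows the formulas above literally. The field sizes and the idf value in the example are made up for illustration.

```python
import math


def length_norm(num_terms: int) -> float:
    """lengthNorm = 1 / sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)


def field_norm(num_terms: int, index_boost: float = 1.0) -> float:
    """fieldNorm = lengthNorm * boost(index)."""
    return length_norm(num_terms) * index_boost


def field_weight(freq: int, idf: float, num_terms: int, index_boost: float = 1.0) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * field_norm(num_terms, index_boost)


# The same single hit scores higher in a 4-term title than in a 100-term body.
title = field_weight(freq=1, idf=2.0, num_terms=4)
body = field_weight(freq=1, idf=2.0, num_terms=100)
print(title, body)
```

In Lucene itself the norm is encoded into a single byte at index time, so the fieldNorm reported by EXPLAIN is a coarser, rounded value than this exact calculation.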
>
>
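>     maxDocs = the number of documents in the index, including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index)
>         Implication: in Lucene's default similarity this is the document count actually passed to the idf formula, which is why EXPLAIN shows idf(docFreq=..., maxDocs=...); deleted but unpurged documents can therefore still influence idf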
>
>
>     coord = fraction of the query terms that were found in the document (omitted from EXPLAIN when equal to 1)
>         implementation: overlap/maxOverlap
>         Implication: a document that contains more of the query terms gets a higher score
>         Rationale: documents that match the most optional terms score highest
>     overlap = the number of query terms matched in the document
>     maxOverlap = the total number of terms in the query
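coord for a query of optional terms, as a sketch that treats the query and the document as sets of terms; overlap_of is a hypothetical helper.

```python
def coord(overlap: int, max_overlap: int) -> float:
    """coord = overlap / maxOverlap."""
    return overlap / max_overlap


def overlap_of(query_terms: set[str], doc_terms: set[str]) -> int:
    """Hypothetical helper: number of query terms that occur in the document."""
    return len(query_terms & doc_terms)


query = {"apache", "solr", "scoring"}
doc = {"apache", "solr", "search", "engine"}
c = coord(overlap_of(query, doc), len(query))
print(c)
```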
>
>
>     FunctionQuery = can be any kind of custom ranking function, whose outcome is added to or multiplied with the default relevance score.
>         Implication: various
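The two ways of combining a function with the relevance score can be illustrated with a hypothetical recency function. Its decay shape a / (m*x + b) mirrors Solr's recip() function, but recency, combine and their parameters are illustrative assumptions, not Solr's API.

```python
def recency(age_days: float, m: float = 0.1, a: float = 1.0, b: float = 1.0) -> float:
    """Hypothetical decay a / (m * age_days + b): 1.0 for a brand-new document, falling with age."""
    return a / (m * age_days + b)


def combine(relevance: float, func_value: float, mode: str) -> float:
    """Combine the default relevance score with a function value."""
    if mode == "add":
        return relevance + func_value
    if mode == "multiply":
        return relevance * func_value
    raise ValueError(f"unknown mode: {mode}")


relevance = 2.0
boost = recency(age_days=10)  # 1 / (0.1 * 10 + 1) = 0.5
print(combine(relevance, boost, "add"), combine(relevance, boost, "multiply"))
```

Adding shifts each document by its own function value independently of how well it matched, whereas multiplying scales the relevance score, so the function's influence grows with the strength of the text match.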
>
>
> Look at the EXPLAIN information (for example, by adding debugQuery=true to the request) to see how the final score is calculated.
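To tie the factors together for the original question, here is a sketch for a flat query of optional term clauses that follows the formulas in these notes: score = coord * sum over matched terms of (queryWeight * fieldWeight). This mirrors the per-term product structure EXPLAIN prints, but it is a simplification: it ignores nested queries, FunctionQuery and the single-byte rounding of fieldNorm, and it uses numDocs in idf as the notes do. TermMatch and score are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class TermMatch:
    """Per-term inputs for one document (hypothetical structure, not a Lucene class)."""
    freq: int                 # occurrences of the term in the field (0 = no match)
    doc_freq: int             # number of documents containing the term in this field
    num_terms: int            # number of terms in the field for this document
    query_boost: float = 1.0  # boost(query)
    index_boost: float = 1.0  # boost(index)


def score(matches: list[TermMatch], num_docs: int) -> float:
    """score = coord * sum over matched terms of (queryWeight * fieldWeight)."""
    if not matches:
        return 0.0
    idfs = [math.log(num_docs / (m.doc_freq + 1)) + 1.0 for m in matches]
    # queryNorm is a query-level factor, so it covers all query terms, matched or not.
    sum_sq = sum((m.query_boost * i) ** 2 for m, i in zip(matches, idfs))
    query_norm = 1.0 / math.sqrt(sum_sq)

    total, overlap = 0.0, 0
    for m, i in zip(matches, idfs):
        if m.freq == 0:
            continue  # an optional term that did not match adds nothing
        overlap += 1
        query_weight = m.query_boost * i * query_norm
        field_norm = m.index_boost / math.sqrt(m.num_terms)
        field_weight = math.sqrt(m.freq) * i * field_norm
        total += query_weight * field_weight
    return (overlap / len(matches)) * total  # coord * sum


# Hypothetical two-term query against a 100-document index.
doc_a = [TermMatch(freq=2, doc_freq=4, num_terms=10), TermMatch(freq=1, doc_freq=20, num_terms=10)]
doc_b = [TermMatch(freq=2, doc_freq=4, num_terms=10), TermMatch(freq=0, doc_freq=20, num_terms=10)]
print(score(doc_a, 100), score(doc_b, 100))
```

doc_b misses the second term, so it loses that term's contribution and is also halved by coord.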
>
> Tom
>
>
> -----Original Message-----
> From: Sangeetha [mailto:sangeetha...@gmail.com]
> Sent: Thursday 13 December 2012 08:33
> To: solr-user@lucene.apache.org
> Subject: score calculation
>
>
> I want to know how the score is calculated.
>
> What are fieldWeight, fieldNorm, queryWeight and queryNorm? And what is the
> formula to get the final score using fieldWeight, fieldNorm, queryWeight,
> queryNorm, idf and tf?
>
> Can anyone explain or provide some links?
>
> Thanks,
> Sangeetha
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/score-calculation-tp4026669.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
