Factor 1: idf
  If you do a search on "blue whales" you are probably much more
interested in whales than you are in things that are blue.  The idf
factor takes this term rarity into account.  In your case, color:blue
appears in over 9000 documents, but productNameSearch:blue only
appears in 120 documents (and thus its idf factor is much higher).
One option is to simply boost searches on your color field higher.
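
To make the idf gap concrete, here is a minimal sketch that assumes the
classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)). The function
name is illustrative, not a Lucene API. It lands within about 0.004 of the
values in your debug output; my guess is that the small gap comes from idf
being computed against a document count that still includes deleted docs,
but treat that as an assumption.

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene-style idf (assumed formula): rarer terms score higher.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

NUM_DOCS = 136731                 # numDocs from the debug output
name_idf = idf(120, NUM_DOCS)     # productNameSearch:blue
color_idf = idf(9309, NUM_DOCS)   # color:blue

# The rare term gets more than twice the idf of the common one.
print(round(name_idf, 2), round(color_idf, 2))
```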

Factor 2: length normalization
  0.625 = fieldNorm(field=productNameSearch, doc=8142)
The second document probably has its match in a longer field, which is a
less specific match and thus gets penalized (fieldNorm 0.625 instead of
1.0). Because the penalty applies to the most important field (as measured
by idf), it is enough to make the second doc lose.
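
A rough sketch of that arithmetic, using only numbers copied from the
debug output (the helper name is mine): on the boosted clause alone, the
norm costs the second doc more than its extra color match earns it in the
first clause (0.63594794 versus 0.26248413, a gain of about 0.37).

```python
def field_weight(tf, idf, field_norm):
    # fieldWeight as printed in the explain output: tf * idf * fieldNorm
    return tf * idf * field_norm

IDF_NAME = 8.033478           # idf(docFreq=120) from the debug output
QUERY_WEIGHT_X8 = 0.2613903   # queryWeight(productNameSearch:blue^8.0)

fw_doc1 = field_weight(1.0, IDF_NAME, 1.0)     # doc 112779
fw_doc2 = field_weight(1.0, IDF_NAME, 0.625)   # doc 8142, longer field

score1 = QUERY_WEIGHT_X8 * fw_doc1   # boosted clause, first doc
score2 = QUERY_WEIGHT_X8 * fw_doc2   # boosted clause, second doc
penalty = score1 - score2            # what the norm costs the second doc here

color_gain = 0.63594794 - 0.26248413  # second doc's edge in the first clause
print(score1, score2, penalty, color_gain)
```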

Factor 3: no coord factor
  The top-level boolean query in generated dismax queries has no coord
factor.  A coord factor would generally cause matches in more fields to
be boosted beyond just adding their scores together.  Maybe we should
have an option for this.
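
For illustration, a small sketch of how the scores above are combined
today, plus what a coord multiplier would look like. The function names
and the use_coord flag are hypothetical, not existing Solr options; the
coord shown is the usual Lucene matched/total clause ratio, and the 1.0
score in the last line is a made-up value.

```python
def dismax(scores, tie=0.5):
    # "max plus 0.5 times others": best field wins, the rest add a fraction.
    best = max(scores)
    return best + tie * (sum(scores) - best)

def top_level(clause_scores, total_clauses, use_coord=False):
    # Today: plain sum of the matching clauses.
    # With coord (hypothetical option): scale by matched / total clauses.
    total = sum(clause_scores)
    if use_coord:
        total *= len(clause_scores) / total_clauses
    return total

# Clause scores copied from the debug output.
doc1 = top_level([dismax([0.26248413]), dismax([2.099873])], 2)
doc2 = top_level([dismax([0.16405259, 0.55392164]), dismax([1.3124207])], 2)
print(doc1, doc2)

# Hypothetical doc matching only one of two clauses with score 1.0.
half = top_level([1.0], 2, use_coord=True)
```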

-Yonik
http://www.lucidimagination.com



On Wed, Sep 9, 2009 at 6:00 PM, Jeff Newburn <jnewb...@zappos.com> wrote:
> I have done a search on the word "blue" in our index.  The debugQuery shows
> some extremely strange methods of scoring.  Somehow product 1 gets a higher
> score with only 1 match on the word blue when product 2 gets a lower score
> with the same field match AND an additional field match.  Can someone please
> help me understand why such an obviously more relevant product is given a
> lower score.
>
>  <str name="954058">
> 2.3623571 = (MATCH) sum of:
>  0.26248413 = (MATCH) max plus 0.5 times others of:
>    0.26248413 = (MATCH) weight(productNameSearch:blue in 112779), product
> of:
>      0.032673787 = queryWeight(productNameSearch:blue), product of:
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.0040672035 = queryNorm
>      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779),
> product of:
>        1.0 = tf(termFreq(productNameSearch:blue)=1)
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        1.0 = fieldNorm(field=productNameSearch, doc=112779)
>  2.099873 = (MATCH) max plus 0.5 times others of:
>    2.099873 = (MATCH) weight(productNameSearch:blue^8.0 in 112779), product
> of:
>      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
>        8.0 = boost
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.0040672035 = queryNorm
>      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779),
> product of:
>        1.0 = tf(termFreq(productNameSearch:blue)=1)
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        1.0 = fieldNorm(field=productNameSearch, doc=112779)
> </str>
>  <str name="402943">
> 1.9483687 = (MATCH) sum of:
>  0.63594794 = (MATCH) max plus 0.5 times others of:
>    0.16405259 = (MATCH) weight(productNameSearch:blue in 8142), product of:
>      0.032673787 = queryWeight(productNameSearch:blue), product of:
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.0040672035 = queryNorm
>      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142),
> product of:
>        1.0 = tf(termFreq(productNameSearch:blue)=1)
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.625 = fieldNorm(field=productNameSearch, doc=8142)
>    0.55392164 = (MATCH) weight(color:blue^10.0 in 8142), product of:
>      0.15009704 = queryWeight(color:blue^10.0), product of:
>        10.0 = boost
>        3.6904235 = idf(docFreq=9309, numDocs=136731)
>        0.0040672035 = queryNorm
>      3.6904235 = (MATCH) fieldWeight(color:blue in 8142), product of:
>        1.0 = tf(termFreq(color:blue)=1)
>        3.6904235 = idf(docFreq=9309, numDocs=136731)
>        1.0 = fieldNorm(field=color, doc=8142)
>  1.3124207 = (MATCH) max plus 0.5 times others of:
>    1.3124207 = (MATCH) weight(productNameSearch:blue^8.0 in 8142), product
> of:
>      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
>        8.0 = boost
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.0040672035 = queryNorm
>      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142),
> product of:
>        1.0 = tf(termFreq(productNameSearch:blue)=1)
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.625 = fieldNorm(field=productNameSearch, doc=8142)
> </str>
>
> --
> Jeff Newburn
> Software Engineer, Zappos.com
> jnewb...@zappos.com - 702-943-7562
>
>
