Hi all,

I've got a puzzling issue. During testing I noticed a document at the bottom
of the results where it should not be. I query with DisMax on the title and
content fields, with a boost on title via qf. Out of 30 results, only two
documents also have the term in the title.
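For reference, the DisMax combination behind the "max of" line in the debug output can be sketched like this. Each document scores the maximum of its per-field clause scores, plus tie times the sum of the other clauses. A plain "max of" line suggests a tie of 0. This is a minimal Python sketch. The clause scores and the tie value are illustrative, not taken from my index.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field clause scores the way a disjunction-max query does:
    the best clause wins, and the remaining clauses add tie * their sum."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# Hypothetical clause scores for one document matching in both title and content.
title_score, content_score = 1.2, 0.4
print(dismax_score([title_score, content_score]))           # tie=0: only the max counts
print(dismax_score([title_score, content_score], tie=0.1))  # others add 0.1 * their sum
```

With tie=0, a document whose title lacks the term gets no benefit from the title boost in qf, because the title clause simply does not match.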

Using debugQuery and fl=*,score I quickly noticed a large negative maxScore for
the complete result set. I also saw that part of the result set scores exactly
zero, because one factor in the score's product (fieldNorm) is 0.
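A request of the kind described above might look like the following sketch, built with Python's standard library. The host, port and path, and the title boost of 2.0, are placeholders rather than my actual settings. defType=dismax, qf, fl and debugQuery are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Placeholder Solr endpoint; adjust host, port and path to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_debug_query(term, title_boost=2.0):
    """Build a DisMax query over title and content that also returns
    scores (fl=*,score) and the per-document explain output (debugQuery)."""
    params = [
        ("q", term),
        ("defType", "dismax"),
        # Hypothetical boost: title weighted higher than content.
        ("qf", f"title^{title_boost} content"),
        ("fl", "*,score"),
        ("debugQuery", "on"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)

print(build_debug_query("kunstgrasveld"))
```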

Below is the debug output for a result with score = 0:

0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
    0.0 = (MATCH) weight(content:kunstgrasveld in 7), product of:
      0.75658196 = queryWeight(content:kunstgrasveld), product of:
        6.6516633 = idf(docFreq=33, maxDocs=9682)
        0.113743275 = queryNorm
      0.0 = (MATCH) fieldWeight(content:kunstgrasveld in 7), product of:
        2.236068 = tf(termFreq(content:kunstgrasveld)=5)
        6.6516633 = idf(docFreq=33, maxDocs=9682)
        0.0 = fieldNorm(field=content, doc=7)
    0.0 = (MATCH) fieldWeight(title:kunstgrasveld in 7), product of:
      1.0 = tf(termFreq(title:kunstgrasveld)=1)
      8.791729 = idf(docFreq=3, maxDocs=9682)
      0.0 = fieldNorm(field=title, doc=7)
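The arithmetic in this explain tree can be reproduced with what I understand to be Lucene's DefaultSimilarity formulas: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). I'm assuming Solr 1.4.1 uses these here. The sketch shows that once fieldNorm is 0.0, every product above it collapses to 0.0, whatever the tf and idf values are. queryNorm is copied from the output rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    """DefaultSimilarity idf (assumed): 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    """DefaultSimilarity tf (assumed): square root of the raw term frequency."""
    return math.sqrt(freq)

MAX_DOCS = 9682
QUERY_NORM = 0.113743275  # copied from the debug output

# Doc 7, content clause: queryWeight * fieldWeight
content_idf = idf(33, MAX_DOCS)                    # compare with 6.6516633
content_query_weight = content_idf * QUERY_NORM    # compare with 0.75658196
content_field_weight = tf(5) * content_idf * 0.0   # fieldNorm = 0.0
content_score = content_query_weight * content_field_weight

# Doc 7, title clause, printed as a bare fieldWeight: tf * idf * fieldNorm
title_score = tf(1) * idf(3, MAX_DOCS) * 0.0       # fieldNorm = 0.0

# "max of" (DisMax without tie) inside a one-clause "sum of"
doc7_score = sum([max(content_score, title_score)])
print(content_idf, content_query_weight, doc7_score)
```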

And one with a negative score:

3.0716116E-4 = (MATCH) sum of:
  3.0716116E-4 = (MATCH) max of:
    3.0716116E-4 = (MATCH) weight(content:kunstgrasveld in 1462), product of:
      0.75658196 = queryWeight(content:kunstgrasveld), product of:
        6.6516633 = idf(docFreq=33, maxDocs=9682)
        0.113743275 = queryNorm
      4.059853E-4 = (MATCH) fieldWeight(content:kunstgrasveld in 1462), product of:
        1.0 = tf(termFreq(content:kunstgrasveld)=1)
        6.6516633 = idf(docFreq=33, maxDocs=9682)
        6.1035156E-5 = fieldNorm(field=content, doc=1462)
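The same arithmetic for doc 1462 shows that the tiny fieldNorm drives its score almost entirely. With tf = 1, the clause score is just queryWeight × idf × fieldNorm. The fieldNorm 6.1035156E-5 is 2^-14, about 0.000061. All inputs below are copied from the explain output.

```python
# Values copied from the explain output for doc 1462.
query_weight = 0.75658196   # queryWeight = idf * queryNorm
tf_value = 1.0              # tf(termFreq=1)
idf_value = 6.6516633       # idf(docFreq=33, maxDocs=9682)
field_norm = 6.1035156e-5   # fieldNorm(field=content, doc=1462), roughly 2**-14

field_weight = tf_value * idf_value * field_norm
score = query_weight * field_weight

print(f"fieldWeight = {field_weight:.7g}")  # compare with 4.059853E-4
print(f"score       = {score:.8g}")         # compare with 3.0716116E-4
# For contrast: the same clause if fieldNorm were 1.0
print(f"score if fieldNorm were 1.0 = {query_weight * idf_value:.6g}")
```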

There are no funky issues with term analysis for the text fieldType; in fact,
the term passes through unchanged. I don't set omitNorms, and I store term
vectors, etc.

Because fieldNorm = documentBoost * fieldBoost / sqrt(numTermsForField), I
suspect my input from Nutch is messed up. With normal positive boosts a
fieldNorm can never be <= 0, and boosts should not be zero or negative
(correct me if I'm wrong). But since I can't yet figure out what boosts Nutch
sends me, I thought I'd drop by this mailing list first.
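To check whether a fieldNorm of exactly 0.0 can come from a normal positive boost, the sketch below computes the raw norm and then encodes it into a single byte. As far as I know, Lucene 2.9 (used by Solr 1.4.1) does the same. The encoder is a Python port, written from memory, of SmallFloat.floatToByte315 and byte315ToFloat. Verify it against the Lucene source before relying on it. If the port is right, zero and negative products encode to 0, while tiny positive ones round up to the smallest non-zero byte. The boosts and the 100-term field length in the loop are hypothetical.

```python
import math
import struct

def length_norm(num_terms):
    """DefaultSimilarity.lengthNorm (assumed): 1 / sqrt(terms in the field)."""
    return 1.0 / math.sqrt(num_terms)

def raw_norm(doc_boost, field_boost, num_terms):
    """Norm before encoding: document boost * field boost * lengthNorm."""
    return doc_boost * field_boost * length_norm(num_terms)

def _float_bits(f):
    # Signed 32-bit IEEE 754 bits, like Java's Float.floatToRawIntBits.
    return struct.unpack(">i", struct.pack(">f", f))[0]

def _bits_float(bits):
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def encode_norm(f):
    """Port (from memory) of SmallFloat.floatToByte315: 3 mantissa bits, zero exponent 15."""
    fzero = (63 - 15) << 3
    bits = _float_bits(f)
    small = bits >> 21          # arithmetic shift, as in Java
    if small <= fzero:
        # Zero and negative values map to 0; tiny positives round up to 1.
        return 0 if bits <= 0 else 1
    if small >= fzero + 0x100:
        return 255              # overflow: largest representable value
    return small - fzero

def decode_norm(b):
    """Port (from memory) of SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return _bits_float(bits)

for doc_boost in (1.0, 1e-3, 0.0, -0.5):   # hypothetical document boosts
    norm = raw_norm(doc_boost, 1.0, 100)   # hypothetical 100-term field
    print(doc_boost, norm, decode_norm(encode_norm(norm)))
```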

Quite a few query terms return zero or negative scores, while many others
behave as I expect. I also find it hard to understand why docs with a negative
score rank higher in the result set than docs with a zero score. Sorting
defaults to score desc, but that is perhaps a separate issue.
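Java prints small floats in scientific notation. The E-4 suffix means ×10^-4, so 3.0716116E-4 is 0.00030716116. It can help to parse the printed score strings and sort them the way a score-descending sort would before reasoning about the ordering. This is a minimal sketch that uses the two explain values above as sample input.

```python
# Score strings as printed in the explain output (Java float formatting).
printed_scores = {"doc 7": "0.0", "doc 1462": "3.0716116E-4"}

scores = {doc: float(text) for doc, text in printed_scores.items()}
for doc, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    sign = "negative" if value < 0 else "non-negative"
    print(f"{doc}: {value:.10f} ({sign})")
```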

Anyway, the test runs on a Solr 1.4.1 instance with Java 6 under the hood. 
Help or directions are appreciated =)

Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
