Question about fieldNorm

Brendan Grainger Wed, 11 Jun 2008 10:10:35 -0700

Hi,

I've just changed the stemming algorithm slightly and am running a few tests against the old stemmer versus the new stemmer. I did a query for 'hanger' and using the old stemmer I get the following scoring for a document with the title: Converter Hanger Assembly Replacement


6.4242806 = (MATCH) sum of:
  2.5697122 = (MATCH) max of:
    0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
      0.1963516 = queryWeight(markup_t:hanger), product of:
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.02993451 = queryNorm
      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
        2.0 = boost
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.02993451 = queryNorm
      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.5 = fieldNorm(field=title_t, doc=3454)
  3.8545685 = (MATCH) max of:
    0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
      0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
        0.5 = boost
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.02993451 = queryNorm
      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
      0.8320503 = queryWeight(title_t:hanger^3.0), product of:
        3.0 = boost
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.02993451 = queryNorm
      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
  2.248498 = (MATCH) max of:
    0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
      0.19635157 = queryWeight(markup_t:hanger), product of:
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.029934512 = queryNorm
      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
        2.0 = boost
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.029934512 = queryNorm
      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.4375 = fieldNorm(field=title_t, doc=3454)
  3.372747 = (MATCH) max of:
    0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
      0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
        0.5 = boost
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.029934512 = queryNorm
      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
      0.83205026 = queryWeight(title_t:hanger^3.0), product of:
        3.0 = boost
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.029934512 = queryNorm
      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.4375 = fieldNorm(field=title_t, doc=3454)
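
To make the arithmetic in the two explanations easier to follow, here is a small Python sketch that recomputes the totals from the leaf values printed above. It assumes the structure the explain output itself shows: each clause weight is queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), and the total is the sum of the two "max of" groups, both of which are won by the title_t clause. The function names are made up for illustration.

```python
import math

def field_weight(tf, idf, field_norm):
    # fieldWeight in the explain output: tf * idf * fieldNorm
    return tf * idf * field_norm

def clause_weight(boost, idf, query_norm, tf, field_norm):
    # weight = queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    return query_weight * field_weight(tf, idf, field_norm)

def total_score(idf, query_norm, title_norm):
    # Sum of the two winning title_t clauses (boosts 2.0 and 3.0), tf = 1.0
    return (clause_weight(2.0, idf, query_norm, 1.0, title_norm)
            + clause_weight(3.0, idf, query_norm, 1.0, title_norm))

old_total = total_score(9.265229, 0.02993451, 0.5)
new_total = total_score(9.265228, 0.029934512, 0.4375)

print(old_total, new_total, new_total / old_total)
```

The ratio of the two totals comes out at roughly 0.875, which is 0.4375 / 0.5, so the whole score difference is accounted for by the title_t fieldNorm.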

The thing that is perplexing is that the fieldNorm for the title_t field is different in each of the explanations, i.e. the fieldNorm using the old stemmer is: 0.5 = fieldNorm(field=title_t, doc=3454). For the new stemmer it is: 0.4375 = fieldNorm(field=title_t, doc=3454). I ran the title through both stemmers and get the same number of tokens produced. I do no index-time boosting on the title_t field. I am using DefaultSimilarity in both instances. So I figured the calculated fieldNorm would be:


field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5
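
As a rough cross-check of the formula above, here is a minimal Python sketch of how the norm might end up in the index. It assumes DefaultSimilarity's lengthNorm is 1/sqrt(numTokens) and that the norm is stored in a single byte using the lossy encoding implemented by Lucene's SmallFloat.floatToByte315. The Python functions are hypothetical ports written for illustration, not Lucene API.

```python
import math
import struct

def float_to_byte315(f):
    # Assumed port of SmallFloat.floatToByte315: squeeze a float32 into one byte
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21
    zero = (63 - 15) << 3
    if small <= zero:
        return 0 if bits <= 0 else 1
    if small >= zero + 0x100:
        return 255
    return small - zero

def byte315_to_float(b):
    # Inverse mapping: expand the stored byte back into a float32 value
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def stored_field_norm(num_tokens, field_boost=1.0):
    # field boost * lengthNorm, then round-tripped through the one-byte encoding
    norm = field_boost / math.sqrt(num_tokens)
    return byte315_to_float(float_to_byte315(norm))

for n in (3, 4, 5):
    print(n, stored_field_norm(n))
# 3 0.5
# 4 0.5
# 5 0.4375
```

Under these assumptions, 3 or 4 tokens both decode to 0.5, while 0.4375 is what 5 tokens would decode to. So it may be worth counting the tokens that actually reach the index for title_t (including any emitted at the same position), rather than only the visible stemmed terms.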

I wouldn't have thought that changing the stemmer would have any impact on the fieldNorm in this case. Any insight? Please kick me over to the Lucene list if you feel this isn't appropriate here.


Regards
Brendan
