Re: Question about fieldNorm

Brendan Grainger Wed, 11 Jun 2008 15:51:14 -0700

Thanks so much, that explains it.

Brendan


On Jun 11, 2008, at 4:00 PM, Yonik Seeley wrote:

Field norms have limited precision (they are encoded as an 8-bit float), so
you are probably seeing rounding.

-Yonik
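
For readers following the thread, the rounding described above can be sketched in a few lines. This is a Python transcription, written from memory, of the bit manipulation in Lucene's SmallFloat.floatToByte315 and byte315ToFloat, which norm encoding is assumed to use in this Lucene version. The Python function names are illustrative and are not part of any Lucene API.

```python
import math
import struct

# Offset between a float32 exponent and the 8-bit format's exponent,
# already shifted into position (assumed parameters: 3 low bits, zero exponent 15).
_FZERO = (63 - 15) << 3


def float_to_byte315(f: float) -> int:
    """Encode a float into one byte (0..255), truncating precision."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits
    small = bits >> (24 - 3)  # drop the low 21 bits of the float32 pattern
    if small <= _FZERO:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> 1
    if small >= _FZERO + 0x100:
        return 255  # overflow maps to the largest code
    return small - _FZERO


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(norm: float) -> float:
    """Round-trip a norm through the 8-bit encoding."""
    return byte315_to_float(float_to_byte315(norm))


if __name__ == "__main__":
    print(stored_norm(1 / math.sqrt(4)))  # norm for a 4-term field
    print(stored_norm(1 / math.sqrt(5)))  # norm for a 5-term field
```

Under these assumptions, 1/sqrt(4) = 0.5 is exactly representable and survives unchanged, while 1/sqrt(5), about 0.4472136, is truncated to 0.4375, the value reported for the new stemmer.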

On Wed, Jun 11, 2008 at 2:13 PM, Brendan Grainger
<[EMAIL PROTECTED]> wrote:

Hi Yonik,

I just realized that the stemmer does make a difference because of
synonyms. So on indexing using the new stemmer, "converter hanger assembly
replacement" gets expanded to "converter hanger assembly assemble
replacement", so there are 5 terms, which gets a length norm of 0.4472136
instead of 0.5. Still unsure how it gets 0.4375 as the result for the
field norm, though, unless I have a boost of 0.9783 somewhere there.

Brendan
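
The arithmetic in this message can be checked with a few lines of Python. The sketch assumes the length norm of 1/sqrt(number of terms) used elsewhere in the thread and no index-time boost; length_norm is a hypothetical helper name.

```python
import math


def length_norm(num_terms: int) -> float:
    """Length norm assumed in the thread: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


four_terms = length_norm(4)  # original title: 4 terms
five_terms = length_norm(5)  # title after synonym expansion adds "assemble"

# Ratio between the reported fieldNorm and the computed 5-term norm;
# this is the apparent "boost" of roughly 0.9783 mentioned in the message.
apparent_boost = 0.4375 / five_terms

print(four_terms, five_terms, apparent_boost)
```

A factor this close to 1, with no boost configured, is consistent with the stored value having been rounded rather than boosted, as the reply at the top of the thread notes.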


On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:

That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
<[EMAIL PROTECTED]> wrote:

I've just changed the stemming algorithm slightly and am running a few
tests against the old stemmer versus the new stemmer. I did a query for
'hanger', and using the old stemmer I get the following scoring for a
document with the title: Converter Hanger Assembly Replacement

6.4242806 = (MATCH) sum of:
  2.5697122 = (MATCH) max of:
    0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
      0.1963516 = queryWeight(markup_t:hanger), product of:
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.02993451 = queryNorm
      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
        2.0 = boost
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.02993451 = queryNorm
      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.5 = fieldNorm(field=title_t, doc=3454)
  3.8545685 = (MATCH) max of:
    0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
      0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
        0.5 = boost
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.02993451 = queryNorm
      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
      0.8320503 = queryWeight(title_t:hanger^3.0), product of:
        3.0 = boost
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.02993451 = queryNorm
      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.5 = fieldNorm(field=title_t, doc=3454)
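
As a sanity check on how the numbers in the explain output above combine, the title_t clause with boost 2.0 can be recomputed by hand. This is only a sketch: the variable names are illustrative, and tf = sqrt(termFreq) is inferred from the reported tf of 1.7320508 for termFreq=3.

```python
import math

# Values copied from the title_t clause (boost 2.0) of the old-stemmer output.
boost = 2.0
idf = 9.265229
query_norm = 0.02993451
term_freq = 1
field_norm = 0.5

query_weight = boost * idf * query_norm     # explain reports 0.5547002
tf = math.sqrt(term_freq)                   # explain reports 1.0
field_weight = tf * idf * field_norm        # explain reports 4.6326146
clause_score = query_weight * field_weight  # explain reports 2.5697122

print(query_weight, field_weight, clause_score)
```

Because fieldNorm enters the clause score as a plain multiplier, changing it from 0.5 to 0.4375 scales the title_t contribution by the same factor.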

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
  2.248498 = (MATCH) max of:
    0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
      0.19635157 = queryWeight(markup_t:hanger), product of:
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.029934512 = queryNorm
      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
        2.0 = boost
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.029934512 = queryNorm
      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.4375 = fieldNorm(field=title_t, doc=3454)
  3.372747 = (MATCH) max of:
    0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
      0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
        0.5 = boost
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.029934512 = queryNorm
      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
      0.83205026 = queryWeight(title_t:hanger^3.0), product of:
        3.0 = boost
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.029934512 = queryNorm
      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.4375 = fieldNorm(field=title_t, doc=3454)

The thing that is perplexing is that the fieldNorm for the title_t field
is different in each of the explanations, i.e. the fieldNorm using the old
stemmer is 0.5 = fieldNorm(field=title_t, doc=3454). For the new stemmer it
is 0.4375 = fieldNorm(field=title_t, doc=3454). I ran the title through both
stemmers and get the same number of tokens produced. I do no index-time
boosting on the title_t field. I am using DefaultSimilarity in both
instances. So I figured the calculated fieldNorm would be:

field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5

I wouldn't have thought that changing the stemmer would have any impact on
the fieldNorm in this case. Any insight? Please kick me over to the lucene
list if you feel this isn't appropriate here.

Regards
Brendan
