On Sat, Oct 31, 2009 at 8:48 AM, Paul Tomblin <ptomb...@xcski.com> wrote:
> Am I right in thinking that a document whose sortable field is only
> two sentences long and contains the search term once will score higher
> than one whose field is 50 sentences long and contains the search
> term 4 times?

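Yep.  Assuming 15 tokens per sentence, doc1 will have
lengthNorm = 1/(2*15)**.5, or 0.18, with tf = 1**.5, or 1.
doc2 will have
lengthNorm = 1/(50*15)**.5, or 0.04, with tf = 4**.5, or 2.
Multiplying tf by lengthNorm gives about 0.18 for doc1 vs. 0.07 for
doc2, so the shorter document scores higher.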
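
To check the arithmetic, here is a small standalone Java sketch. It is
not Lucene code: the class and method names are made up for
illustration, and it only covers the tf and lengthNorm factors,
assuming the remaining scoring factors (idf, boosts, and so on) are the
same for both documents.

```java
// Standalone sketch of the numbers above; hypothetical names, no Lucene dependency.
class LengthNormSketch {
    // lengthNorm = 1 / sqrt(number of tokens in the field)
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    // tf = sqrt(number of times the term occurs in the field)
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // The part of the score that differs between the two documents.
    static double product(int numTokens, int freq) {
        return tf(freq) * lengthNorm(numTokens);
    }

    public static void main(String[] args) {
        int tokensPerSentence = 15; // same assumption as in the reply

        // doc1: 2 sentences, 1 hit; doc2: 50 sentences, 4 hits
        System.out.println("doc1 tf*lengthNorm = " + product(2 * tokensPerSentence, 1));
        System.out.println("doc2 tf*lengthNorm = " + product(50 * tokensPerSentence, 4));

        // Without length normalization only tf is left, so the order flips.
        System.out.println("doc1 tf only = " + tf(1));
        System.out.println("doc2 tf only = " + tf(4));
    }
}
```

With norms in play doc1 comes out ahead; with tf alone, doc2 does.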

Or if you don't want length normalization at all, simply use
omitNorms=true in the schema for this field.

>  Is there a way to change it to score higher based only on
> number of hits?

Yes, simply use omitNorms=true in the schema.xml for this field.

If you still want a lengthNorm, you could change the balance by
creating a custom Similarity and overriding either lengthNorm() or
tf().
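
A rough sketch of what "changing the balance" could look like. This is
standalone Java so it runs without Lucene on the classpath; in a real
custom Similarity the same formulas would go into the overridden
lengthNorm() and tf() methods, whose exact signatures depend on your
Lucene version. The class name, the method tfTimesNorm(), and the
exponent values are assumptions chosen for illustration.

```java
// Standalone sketch: a length norm with a tunable exponent. Hypothetical names.
class TunableNormSketch {
    // exponent = 0.5 reproduces 1/sqrt(numTerms); smaller values
    // penalize long fields less.
    static float lengthNorm(int numTerms, double exponent) {
        return (float) (1.0 / Math.pow(numTerms, exponent));
    }

    // Same square-root tf as in the example above.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // The part of the score that differs between the two documents.
    static float tfTimesNorm(float freq, int numTerms, double exponent) {
        return tf(freq) * lengthNorm(numTerms, exponent);
    }

    public static void main(String[] args) {
        int doc1Terms = 2 * 15;   // 2 sentences, 1 hit
        int doc2Terms = 50 * 15;  // 50 sentences, 4 hits

        for (double exponent : new double[] {0.5, 0.1}) {
            float doc1 = tfTimesNorm(1f, doc1Terms, exponent);
            float doc2 = tfTimesNorm(4f, doc2Terms, exponent);
            System.out.println("exponent=" + exponent
                    + " doc1=" + doc1 + " doc2=" + doc2
                    + " winner=" + (doc1 > doc2 ? "doc1" : "doc2"));
        }
    }
}
```

With the exponent at 0.5 the short document still wins; at 0.1 the
document with more hits comes out ahead while field length keeps some
influence.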

-Yonik
http://www.lucidimagination.com
