On Sat, Oct 31, 2009 at 8:48 AM, Paul Tomblin <ptomb...@xcski.com> wrote:
> Am I right in thinking that a document that the sortable field is only
> two sentences long and contains the search term once will score higher
> than one that is 50 sentences long that contains the search term 4
> times?

Yep. Assuming 15 tokens per sentence:

  doc1 will have lengthNorm = 1/(2*15)**.5 or 0.18, with tf = 1**.5 or 1
  doc2 will have lengthNorm = 1/(50*15)**.5 or 0.04, with tf = 4**.5 or 2

Multiplying the two factors gives roughly 0.18 for doc1 versus roughly
0.07 for doc2, so with everything else equal the short document wins.

Or if you don't want length normalization at all, simply use
omitNorms=true in the schema for this field.

> Is there a way to change it to score higher based only on
> number of hits?

Yes, simply use omitNorms=true in the schema.xml for this field.

If you still wanted a lengthNorm, you could change the balance by
creating a custom similarity and overriding either lengthNorm() or tf().

-Yonik
http://www.lucidimagination.com
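
For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It is not Lucene code: the function names and the omit_norms flag are made up for illustration, and it models only the two factors discussed in this thread (tf as the square root of the term frequency, lengthNorm as one over the square root of the token count). It ignores idf, boosts, and the precision lost when Lucene stores the norm in a single byte, so the numbers are approximate.

```python
import math

# Assumption taken from the reply: 15 tokens per sentence.
TOKENS_PER_SENTENCE = 15


def length_norm(num_tokens):
    """Length normalization factor: 1 / sqrt(number of tokens in the field)."""
    return 1.0 / math.sqrt(num_tokens)


def tf(freq):
    """Term frequency factor: sqrt(number of occurrences of the term)."""
    return math.sqrt(freq)


def field_factor(sentences, hits, omit_norms=False):
    """Product of tf and lengthNorm for one field (hypothetical helper).

    With omit_norms=True the length factor is dropped, which mimics the
    effect of omitNorms=true: only the number of hits matters.
    """
    norm = 1.0 if omit_norms else length_norm(sentences * TOKENS_PER_SENTENCE)
    return tf(hits) * norm


doc1 = field_factor(sentences=2, hits=1)
doc2 = field_factor(sentences=50, hits=4)
print("with norms:    doc1=%.2f doc2=%.2f" % (doc1, doc2))

doc1_no_norms = field_factor(sentences=2, hits=1, omit_norms=True)
doc2_no_norms = field_factor(sentences=50, hits=4, omit_norms=True)
print("without norms: doc1=%.2f doc2=%.2f" % (doc1_no_norms, doc2_no_norms))
```

With norms the short document comes out ahead. Without them the ordering flips and the document with four hits wins, which is the behavior asked for in the second question.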