OK. Well, still, the fact that the score increases by almost 20% because of just one extra term in the field is not really reasonable if you ask me. But you seem to say that this is expected, reasonable and wanted behavior for most use cases?

I'm not sure that I feel comfortable replacing the default Similarity implementation with a custom one. That would just increase the complexity of our setup and would make future upgrades harder (we would, for example, have to remember to check whether the default similarity configuration or implementation changes).

No, if it really is the case that most people like and want this, and there is no way to configure Solr/Lucene to calculate fieldNorm in a more reasonable way (in my book) for short field values, then I think we are forced to set omitNorms="true", maybe in combination with a simple field boost for shorter fields.

/Jimi

-----Original Message-----
From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Wednesday, April 20, 2016 5:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to configure a minimum field length for the fieldNorm value?

FWIW, length for normalization is measured in terms (tokens), not characters.

With TF-IDF similarity (the default before 6.0), the normalization is based on the inverse square root of the number of terms in the field:

    return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));

That code is in ClassicSimilarity:

https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/ClassicSimilarity.java#L115

You can always write your own custom Similarity class to override that calculation.

-- Jack Krupansky

On Wed, Apr 20, 2016 at 10:43 AM, <jimi.hulleg...@svensktnaringsliv.se> wrote:

> Hi,
>
> In general I think that the fieldNorm factor in the score calculation
> is quite good. But when the text is short I think that the effect is
> too big.
>
> I.e., with two documents that have a short text in the same field, just
> a few extra characters in one of the documents lower the fieldNorm
> factor too much.
>
> In one test the text in document 1 is 30 characters long and has
> fieldNorm 0.4375, and in document 2 the text is 37 characters long and
> has fieldNorm 0.375. That means that the first document gets almost a
> 20% higher score simply because of the 7-character difference.
>
> What are my options if I want to change this behavior? Can I set a
> lower character limit, meaning that all fields with a length below
> this limit get the same fieldNorm value?
>
> I know I can force fieldNorm to be 1 by setting omitNorms="true" for
> that field, but I would prefer to still have it, just limit its effect
> on short texts.
>
> Regards
> /Jimi
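
To make the numbers in this thread concrete, here is a minimal, standalone Java sketch of the arithmetic. It does not use Lucene. The class and method names (FieldNormSketch, rawNorm, quantize, clampedNorm) are made up for illustration. rawNorm is the quoted ClassicSimilarity formula with a boost of 1. quantize is an assumption: it approximates the single-byte norm storage by keeping only the two highest mantissa bits of the float, which reproduces the 0.4375 and 0.375 values reported above but is not the actual Lucene code. clampedNorm is one hypothetical way a custom Similarity might treat all fields below a minimum term count alike.

```java
public class FieldNormSketch {

    // Raw length norm from the quoted formula, assuming a boost of 1.
    static float rawNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Assumption: approximate the lossy one-byte norm storage by truncating
    // the float to its two highest mantissa bits (clearing the low 21 bits).
    // Only meant for norms in the ordinary range, such as those used here.
    static float quantize(float norm) {
        int bits = Float.floatToRawIntBits(norm);
        return Float.intBitsToFloat(bits & ~((1 << 21) - 1));
    }

    // Hypothetical variant: a field with fewer than minTerms terms is
    // normalized as if it had exactly minTerms terms.
    static float clampedNorm(int numTerms, int minTerms) {
        return rawNorm(Math.max(numTerms, minTerms));
    }

    public static void main(String[] args) {
        int minTerms = 10; // illustrative threshold, not a Solr setting
        for (int n = 4; n <= 8; n++) {
            System.out.printf("terms=%d raw=%.4f stored=%.4f clamped=%.4f%n",
                    n, rawNorm(n), quantize(rawNorm(n)),
                    quantize(clampedNorm(n, minTerms)));
        }
    }
}
```

Under these assumptions, 5 terms give a stored norm of 0.4375 and 6 or 7 terms give 0.375, a ratio of about 1.17. With the clamp in place, every field of up to 10 terms ends up with the same stored value, which is the behavior asked for in the original question.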