https://issues.apache.org/jira/browse/LUCENE-1360
Simon Hu wrote:
I am definitely interested in trying your Similarity class. Can you please
post the patch in jira?
thanks
-Simon
Sean Timm wrote:
In the example below, Doc1 and Doc2 will both have the same score for
the query "chevrolet tahoe." We would prefer Doc2 to score higher than
Doc1. The score length norm for each is also 0.5f. I presume the one that
appears first is then decided by the order in which they were added to the
index? By using our score length norm function, Doc2's score will be
multiplied by 1.0f and Doc1's by 0.875f, resulting in the desired behavior.
Doc1: Chevrolet Tahoe Hybrid 2008
Doc2: Chevrolet Tahoe 2008
-Sean
Mark Miller wrote:
Sean Timm wrote:
To solve this, we wrote our own Similarity class which extends
DefaultSimilarity and maps numTerms 1-10 to precalculated values
between 1.5f and 0.3125f. For numTerms >10, we use the standard
formula above. If anyone else is interested in this, I can post the
code as a patch in Jira.
Does this actually have a good measurable effect for you? Wouldn't it
make more sense to just turn off norms for short fields?
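
For readers following along, here is a rough sketch of the kind of mapping
Sean describes. It is not the patch itself. It is written as a standalone
class (the name ShortFieldLengthNorm is made up) so it runs without Lucene on
the classpath; in Lucene 2.x the same logic would presumably go in an override
of DefaultSimilarity.lengthNorm(String fieldName, int numTerms). Only the
values 1.0f for 3 terms and 0.875f for 4 terms, and the range 1.5f to 0.3125f,
come from the thread. Putting 1.5f at 1 term and 0.3125f at 10 terms, and all
of the other table entries, are assumptions. The fallback assumes that the
"standard formula" is DefaultSimilarity's 1/sqrt(numTerms).

```java
// Illustrative sketch only; not the code from the Jira patch.
final class ShortFieldLengthNorm {

    // SHORT_NORMS[i] is the norm used when numTerms == i + 1.
    // From the thread: 3 terms -> 1.0f, 4 terms -> 0.875f, range 1.5f..0.3125f.
    // Every other entry is an assumed placeholder.
    private static final float[] SHORT_NORMS = {
        1.5f, 1.25f, 1.0f, 0.875f, 0.75f,
        0.625f, 0.5f, 0.4375f, 0.375f, 0.3125f
    };

    static float lengthNorm(int numTerms) {
        // Short fields (1 to 10 terms) use the precalculated table.
        if (numTerms >= 1 && numTerms <= SHORT_NORMS.length) {
            return SHORT_NORMS[numTerms - 1];
        }
        // Longer fields fall back to the assumed standard formula.
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Doc1: "Chevrolet Tahoe Hybrid 2008" (4 terms, one per word)
        System.out.println("Doc1 norm: " + lengthNorm(4));
        // Doc2: "Chevrolet Tahoe 2008" (3 terms, one per word)
        System.out.println("Doc2 norm: " + lengthNorm(3));
    }
}
```

With this table, Doc2 (3 terms) gets a larger norm than Doc1 (4 terms), which
is the ordering asked for above. Whether that distinction survives indexing
depends on how the norm is encoded in the index, which this sketch does not
model.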