https://issues.apache.org/jira/browse/LUCENE-1360

Simon Hu wrote:
I am definitely interested in trying your Similarity class. Can you please
post the patch in jira?

thanks
-Simon



Sean Timm wrote:
In the example below, Doc1 and Doc2 will have the same score for the query "chevrolet tahoe." We would prefer Doc2 to score higher than Doc1. The length norm for each is also 0.5f. I presume that which one appears first then comes down to the order in which they were added to the index? With our length norm function, Doc2's score is multiplied by 1.0f and Doc1's by 0.875f, which gives the desired behavior.

Doc1: Chevrolet Tahoe Hybrid 2008
Doc2: Chevrolet Tahoe 2008

-Sean
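A small standalone sketch of why the two documents tie, assuming whitespace-style tokenization (3 terms in Doc2, 4 in Doc1). DefaultSimilarity's length norm is 1/sqrt(numTerms): about 0.577 for three terms and exactly 0.5 for four. Lucene 2.x stores each norm in a single byte, however. The encoding below is re-implemented from my reading of Lucene's SmallFloat.floatToByte315 / byte315ToFloat rather than imported, so treat it as an approximation of the real code. The class name NormCollision is hypothetical.

```java
/**
 * Sketch: why 3-term and 4-term fields can end up with the same stored norm.
 * The byte encoding is a re-implementation (assumption) of the single-byte
 * "315" float format Lucene 2.x uses for norms; it is not the library class.
 */
class NormCollision {

    // Encode a float into one byte: 3 significant bits (2 stored mantissa bits
    // plus the implicit leading 1), exponent offset 15. Lossy: rounds down.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Too small to represent: zero/negative -> 0, tiny positive -> 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // too large: clamp to the largest encodable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the byte back to the float that scoring actually sees.
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // DefaultSimilarity-style length norm: 1 / sqrt(numTerms).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Round-trip a norm through the one-byte index encoding.
    static float storedNorm(float norm) {
        return byte315ToFloat(floatToByte315(norm));
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 4; n++) {
            float raw = defaultLengthNorm(n);
            System.out.println(n + " terms: raw=" + raw + " stored=" + storedNorm(raw));
        }
        // Values from the custom function survive the encoding unchanged.
        System.out.println("1.0f stored=" + storedNorm(1.0f));
        System.out.println("0.875f stored=" + storedNorm(0.875f));
    }
}
```

Both the 3-term and the 4-term default norms come back as 0.5, while 1.0f and 0.875f round-trip exactly. That is why a table of precomputed values that sit exactly on encodable floats can separate the two documents where 1/sqrt(numTerms) cannot.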

Mark Miller wrote:
Sean Timm wrote:
To solve this, we wrote our own Similarity class which extends DefaultSimilarity and maps numTerms 1-10 to precalculated values between 1.5f and 0.3125f. For numTerms > 10, we use the standard formula, 1/sqrt(numTerms). If anyone else is interested in this, I can post the code as a patch in Jira.
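A minimal sketch of the table-plus-fallback length norm described above, written as a standalone class so it compiles without Lucene. In Lucene 2.x the same logic would go in an override of DefaultSimilarity.lengthNorm(String fieldName, int numTerms). The class name TableLengthNorm is hypothetical, and the thread only implies four of the table values: 1.5f (1 term), 1.0f (3 terms), 0.875f (4 terms) and 0.3125f (10 terms). The other entries in main are placeholders, chosen only to be decreasing and exactly representable in the one-byte norm encoding.

```java
import java.util.Arrays;

/**
 * Sketch: short fields (1..10 terms) get a precomputed length norm from a
 * caller-supplied table; longer fields fall back to 1 / sqrt(numTerms).
 */
class TableLengthNorm {
    // Index 0 holds the norm for 1 term, index 9 the norm for 10 terms.
    private final float[] shortFieldNorms;

    TableLengthNorm(float[] shortFieldNorms) {
        if (shortFieldNorms.length != 10) {
            throw new IllegalArgumentException("expected 10 entries, got " + shortFieldNorms.length);
        }
        this.shortFieldNorms = Arrays.copyOf(shortFieldNorms, 10);
    }

    float lengthNorm(int numTerms) {
        if (numTerms >= 1 && numTerms <= shortFieldNorms.length) {
            return shortFieldNorms[numTerms - 1]; // precomputed short-field value
        }
        return (float) (1.0 / Math.sqrt(numTerms)); // standard DefaultSimilarity formula
    }

    public static void main(String[] args) {
        // Entries for 2 and 5-9 terms are HYPOTHETICAL placeholders, not from the thread.
        TableLengthNorm norms = new TableLengthNorm(new float[] {
            1.5f, 1.25f, 1.0f, 0.875f, 0.75f, 0.625f, 0.5f, 0.4375f, 0.375f, 0.3125f
        });
        System.out.println("Doc2 (3 terms): " + norms.lengthNorm(3));
        System.out.println("Doc1 (4 terms): " + norms.lengthNorm(4));
        System.out.println("16 terms (fallback): " + norms.lengthNorm(16));
    }
}
```

With this table the 3-term field gets 1.0f and the 4-term field 0.875f, matching the multipliers Sean describes. Since 1/sqrt(11) is about 0.30, the fallback for 11 terms stays just below the last table entry, so the norm keeps decreasing across the boundary.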

Does this actually have a good measurable effect for you? Wouldn't it make more sense to just turn off norms for short fields?

Reply via email to