Length normalization in the Similarity class generally favors shorter fields. For example, with DefaultSimilarity, the stored length norm for a two-term field is 0.625, and for a three-term field it is 0.5. The score is multiplied by this norm.

I say "generally" because the length norm value, which is calculated as
   (float)(1.0 / Math.sqrt(numTerms))
is stored in the index as a single byte (instead of the four bytes of a float), and so loses precision. As a result, fields of different lengths can end up with the same stored norm: a three-term field and a four-term field both get 0.5. This works fine when searching larger documents such as web pages or news articles, but it can cause problems when you are searching only short fields such as product names or article titles.
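
To illustrate the precision loss, here is a minimal, self-contained sketch of the byte round trip. The encode/decode methods are modeled on my reading of Lucene's SmallFloat.floatToByte315 and byte315ToFloat, so treat them as an approximation rather than the library's exact code. The class and method names (NormPrecisionDemo, storedNorm) are made up for this example.

```java
// Sketch only: approximates how a float norm is squeezed into one byte.
// Modeled on Lucene's SmallFloat (an assumption, not copied from the source).
class NormPrecisionDemo {

    // Encode a float into a single byte by keeping only its leading bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);           // drop the low 21 bits
        int zeroPoint = (63 - 15) << 3;         // offset of the smallest exponent
        if (small <= zeroPoint) {
            // zero or negative -> 0, positive underflow -> smallest positive value
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zeroPoint + 0x100) {
            return (byte) -1;                   // overflow -> largest value
        }
        return (byte) (small - zeroPoint);
    }

    // Decode the byte back into a float; the dropped bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The norm a searcher would see for a field with numTerms terms.
    static float storedNorm(int numTerms) {
        float exact = (float) (1.0 / Math.sqrt(numTerms));
        return decode(encode(exact));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            float exact = (float) (1.0 / Math.sqrt(n));
            System.out.println(n + " terms: exact=" + exact + " stored=" + storedNorm(n));
        }
    }
}
```

Running it shows the exact norms being rounded down to a coarse grid, which is why neighboring short field lengths can become indistinguishable at search time.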

To solve this, we wrote our own Similarity class that extends DefaultSimilarity and maps numTerms values of 1 through 10 to precalculated norms between 1.5f and 0.3125f, so that each of those lengths keeps a distinct value after it is stored. For numTerms > 10, we use the standard formula above. If anyone else is interested in this, I can post the code as a patch in Jira.
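
For readers who want to experiment before any patch is posted, the lookup idea can be sketched as follows. This is not the actual patch: the ten table values are an assumption, chosen as evenly stepped values between the two endpoints mentioned above (1.5f and 0.3125f) that should survive the one-byte encoding unchanged. The class name ShortFieldLengthNorm is hypothetical. In a real Lucene setup this logic would go in an override of lengthNorm(String fieldName, int numTerms) in a subclass of DefaultSimilarity; the Lucene dependency is left out here so the example stands alone.

```java
// Sketch only: a lookup-table length norm for short fields.
// The table values are assumed, not taken from the original code.
class ShortFieldLengthNorm {

    // Hypothetical norms for fields of 1..10 terms, strictly decreasing.
    private static final float[] SHORT_NORMS = {
        1.5f, 1.25f, 1.0f, 0.875f, 0.75f,
        0.625f, 0.5f, 0.4375f, 0.375f, 0.3125f
    };

    // Returns the table value for 1..10 terms, else the standard formula.
    static float lengthNorm(int numTerms) {
        if (numTerms >= 1 && numTerms <= SHORT_NORMS.length) {
            return SHORT_NORMS[numTerms - 1];
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 12; n++) {
            System.out.println(n + " terms: norm=" + lengthNorm(n));
        }
    }
}
```

Because length norms are computed at index time, documents would need to be reindexed for a change like this to take effect.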

-Sean

Simon Hu wrote:
Hi

I have a text field named prodname in the Solr index. Let's say there are 3
documents in the index, and here are the values of the prodname field:

Doc1: cordless drill
Doc2: cordless drill battery
Doc3: cordless drill charger
Searching for prodname:"cordless drill" will hit all three documents. So
how can I make Doc1 score higher than the other two? BTW, I am using Solr 1.2. Thanks! -Simon
