Count me as interested.  Our "documents" are product descriptions, many
fields of which are very short.  Not sure it would make a large enough
impact to warrant rolling our own Solr build, but I'm definitely
interested to see the custom Similarity class.

Thanks,

Jason

On Thu, Aug 21, 2008 at 9:29 AM, Sean Timm <[EMAIL PROTECTED]> wrote:

> Length normalization in the Similarity class will generally favor shorter
> fields.  For example, with the DefaultSimilarity, the stored length norm
> for a two-term field is 0.625.  For a three-term field it is 0.5.  The
> score is multiplied by this norm.
>
> I say "generally" because the length norm value, which is calculated as
>   (float)(1.0 / Math.sqrt(numTerms))
> is stored in the index as a single byte (instead of four bytes), thus
> losing precision.  This works fine for searching larger documents such as
> web pages or news articles, but it can cause some problems when you are
> simply searching on short fields such as product names or article titles.
>
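
A minimal sketch of the precision loss described above, for anyone who
wants to see the numbers.  It is standalone Java and does not call
Lucene.  It assumes the norm is 1/sqrt(numTerms), which is what
reproduces the 0.625 and 0.5 figures quoted above.  The encodeNorm and
decodeNorm methods are an imitation of the one-byte norm encoding as I
understand it (modeled on SmallFloat.floatToByte315), so treat them as
an approximation rather than the library's code.  The class and method
names are made up for illustration.

```java
// Standalone sketch, no Lucene dependency. All names are hypothetical.
class NormPrecisionDemo {

    // DefaultSimilarity-style length norm (assumed): 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Squeeze a float into one byte by dropping its low 21 bits and
    // re-basing the exponent. Assumed to mirror the index encoding.
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        int zero = (63 - 15) << 3;
        if (small <= zero) {
            // zero and negatives map to 0, underflow to the smallest value
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return (byte) -1; // overflow maps to the largest value
        }
        return (byte) (small - zero);
    }

    // Inverse of encodeNorm: rebuild a float from the stored byte.
    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The norm a searcher would see after the round trip through one byte.
    static float storedNorm(int numTerms) {
        return decodeNorm(encodeNorm(lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println(n + " terms: exact=" + lengthNorm(n)
                    + " stored=" + storedNorm(n));
        }
    }
}
```

Under these assumptions, 3-term and 4-term fields both end up with a
stored norm of 0.5, and 8, 9 and 10 terms all collapse to 0.3125, so
those field lengths cannot be told apart at query time.
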
> To solve this, we wrote our own Similarity class which extends
> DefaultSimilarity and maps numTerms 1-10 to precalculated values between
> 1.5f and 0.3125f.  For numTerms >10, we use the standard formula above.  If
> anyone else is interested in this, I can post the code as a patch in Jira.
>
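
A rough guess at what such a mapping could look like, pending the real
patch.  The table below is NOT Sean's; it is simply ten descending
values running from 1.5f to 0.3125f, picked because each one is believed
to survive the one-byte storage unchanged.  The class name is made up,
and the sketch is plain Java so it runs without Lucene.  In an actual
build the same logic would presumably live in an override of
lengthNorm(String fieldName, int numTerms) in a class extending
DefaultSimilarity.

```java
// Standalone sketch, no Lucene dependency. Names and table values are
// hypothetical; only the 1.5f and 0.3125f endpoints come from the thread.
class ShortFieldNormSketch {

    // Index i holds the norm for a field with i + 1 terms.
    private static final float[] SHORT_NORMS = {
        1.5f, 1.25f, 1.0f, 0.875f, 0.75f,
        0.625f, 0.5f, 0.4375f, 0.375f, 0.3125f
    };

    static float lengthNorm(int numTerms) {
        // Short fields: use a distinct precalculated value per length.
        if (numTerms >= 1 && numTerms <= SHORT_NORMS.length) {
            return SHORT_NORMS[numTerms - 1];
        }
        // Longer fields: fall back to the standard formula.
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 12; n++) {
            System.out.println(n + " terms: norm=" + lengthNorm(n));
        }
    }
}
```

With a table like this, every field length from 1 to 10 gets its own
norm, so a 2-term product name would always outrank a 3-term one for
the same match, and a 3-term one would outrank a 4-term one.
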
> -Sean
>
>
> Simon Hu wrote:
>
>> Hi
>>
>> I have a text field named prodname in the Solr index. Let's say there are
>> 3 documents in the index, and here are the values of the prodname field:
>>
>> Doc1: cordless drill
>> Doc2: cordless drill battery
>> Doc3: cordless drill charger
>>
>> Searching for prodname:"cordless drill" will hit all three documents.  So
>> how can I make Doc1 score higher than the other two?
>> BTW, I am using Solr 1.2.
>> thanks!
>> -Simon
>>
>>
>


-- 
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
