Constant tf with idf can work well for very short fields, like titles. For
example, the movie "New York, New York" is not twice as much about New York
as movies that have the string in the title only once.
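
To make that concrete, here is a rough sketch in plain Python (not Lucene code). The helper names `idf` and `score` and the toy title list are made up for illustration. The formulas are borrowed from Lucene's classic similarity as I understand it, tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), and length norms are simply left out.

```python
import math

def idf(term, docs):
    """Classic-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    doc_freq = sum(1 for doc in docs if term in doc)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))

def score(query_terms, doc, docs, constant_tf=True):
    """Sum tf * idf over the query terms that occur in doc.

    With constant_tf=True, any matching term contributes tf = 1.0
    regardless of how often it repeats. No length norm is applied.
    """
    total = 0.0
    for term in query_terms:
        freq = doc.count(term)
        if freq == 0:
            continue  # term does not match this document
        tf = 1.0 if constant_tf else math.sqrt(freq)
        total += tf * idf(term, docs)
    return total

# Toy "title" field, already tokenized and lowercased (hypothetical data).
titles = [
    ["new", "york", "new", "york"],
    ["escape", "from", "new", "york"],
    ["the", "godfather"],
]

query = ["new", "york"]
same = score(query, titles[0], titles) == score(query, titles[1], titles)
repeated_wins = (score(query, titles[0], titles, constant_tf=False)
                 > score(query, titles[1], titles, constant_tf=False))
print(same, repeated_wins)
```

With constant tf the two New York titles tie, and rarer terms still score higher through idf. With sqrt(freq) the title that repeats the phrase ranks above the other.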

wudner

-----Original Message-----
From: Aaron McKee [mailto:ucbmc...@gmail.com] 
Sent: Friday, September 18, 2009 8:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Disabling tf (term frequency) during indexing and/or scoring

Hi Yonik,

For my particular needs, IDF considerations are fine and helpful; if a 
user is requesting a rare term/phrase, increasing the score based on 
that makes sense as the match has higher confidence. I simply need to 
compensate for title and category type fields that may contain redundant 
information and disregard length considerations (these fields are 
multi-valued and may be populated from a varying number of sources, and 
I don't want the number of sources and the level of repetitiveness to 
affect the score). Basically, a boolean "does it match" score adjusted 
solely based on IDF. Of course, I'm sure there are others who probably 
wouldn't need or care about IDF, either, but still want phrase matching.

Cheers,
Aaron


Yonik Seeley wrote:
> On Fri, Sep 18, 2009 at 11:05 AM, Aaron McKee <ucbmc...@gmail.com> wrote:
>   
>> I wonder, though, if it could also make sense to support a
>> query-time only boolean to optionally disable TF independently, on a
>> per-field basis?
>>     
>
> I guess it could make sense.  But do you still want idf too? length
> norm? or do you really want a constant score (match/no-match)?
>
> -Yonik
> http://www.lucidimagination.com
>   

