Constant tf with idf can work well for very short fields, like titles. For example, the movie "New York, New York" is not twice as much about New York as movies that have the string in the title only once.
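
As a rough illustration of the point above, here is a minimal Python sketch that compares a square-root tf with a constant tf, both multiplied by idf. The formulas are modeled on Lucene's classic similarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))), but length norms, query norms, and boosts are deliberately left out, and the document frequencies and collection size are made-up numbers, so treat it as a toy model rather than what Solr actually computes.

```python
import math


def idf(num_docs, doc_freq):
    # Inverse document frequency, in the style of Lucene's classic similarity.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def sqrt_tf(freq):
    # Default-style tf: grows with the number of occurrences in the field.
    return math.sqrt(freq)


def constant_tf(freq):
    # Constant tf: 1 if the term occurs at all, 0 otherwise.
    return 1.0 if freq > 0 else 0.0


def score(field_terms, query_terms, doc_freqs, num_docs, tf):
    # Simplified score: sum of tf * idf over the query terms.
    # Length normalization and query normalization are intentionally omitted.
    total = 0.0
    for term in query_terms:
        freq = field_terms.count(term)
        total += tf(freq) * idf(num_docs, doc_freqs.get(term, 0))
    return total


# Hypothetical toy data: tokenized titles and invented collection statistics.
titles = {
    "New York, New York": ["new", "york", "new", "york"],
    "Escape from New York": ["escape", "from", "new", "york"],
}
doc_freqs = {"new": 50, "york": 20}
num_docs = 1000
query = ["new", "york"]

for name, terms in titles.items():
    with_sqrt = score(terms, query, doc_freqs, num_docs, sqrt_tf)
    with_const = score(terms, query, doc_freqs, num_docs, constant_tf)
    print(f"{name}: sqrt tf = {with_sqrt:.3f}, constant tf = {with_const:.3f}")
```

With the square-root tf, the title that repeats the phrase scores higher; with the constant tf, both titles score the same, and the only thing that moves the score is how rare the matched terms are.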
wudner

-----Original Message-----
From: Aaron McKee [mailto:ucbmc...@gmail.com]
Sent: Friday, September 18, 2009 8:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Disabling tf (term frequency) during indexing and/or scoring

Hi Yonik,

For my particular needs, IDF considerations are fine and helpful; if a user is requesting a rare term/phrase, increasing the score based on that makes sense as the match has higher confidence. I simply need to compensate for title and category type fields that may contain redundant information and disregard length considerations (these fields are multi-valued and may be populated from a varying number of sources, and I don't want the number of sources and the level of repetitiveness to affect the score). Basically, a boolean "does it match" score adjusted solely based on IDF.

Of course, I'm sure there are others who probably wouldn't need or care about IDF, either, but still want phrase matching.

Cheers,
Aaron

Yonik Seeley wrote:
> On Fri, Sep 18, 2009 at 11:05 AM, Aaron McKee <ucbmc...@gmail.com> wrote:
>
>> I wonder, though, if it could also make sense to support a
>> query-time only boolean to optionally disable TF independently, on a
>> per-field basis?
>>
>
> I guess it could make sense. But do you still want idf too? length
> norm? or do you really want a constant score (match/no-match)?
>
> -Yonik
> http://www.lucidimagination.com
>