Re: Omitting tf but not positions

Robert Muir Fri, 25 Feb 2011 11:30:12 -0800

On Fri, Feb 25, 2011 at 1:57 PM, Jan Høydahl <jan....@cominvent.com> wrote:
> I also have a case (yellow-page) where IDF comes in and destroys the rank.
> A company listing with a word which occurs in few other listings is not 
> necessarily better than others just because of that. When it gets to the 
> extreme value of IDF=1, we get an artificially high IDF boost.
>
> It is not killed by omitNorms, neither by omitTermFrequencyAndPositions. Any 
> per-field way to get rid of the IDF effect?
> Or should I override idf() in Similarity?
>


Hi Jan, my reply was back in December. These days, in Lucene/Solr
trunk, you can customize Similarity on a per-field basis.
So your yellow-page field can have a completely different similarity
(tf, idf, lengthnorm, etc).

For that field you can disable things like TF and IDF entirely, e.g.
just set them to a constant such as 1, or if you think that's too risky,
consider an alternative ranking scheme that doesn't use IDF at all,
such as the example in
https://issues.apache.org/jira/browse/LUCENE-2864

For now, you have to implement SimilarityProvider in a Java class
(with something like a HashMap returning different similarities for
different fields) and set this up with the similarity hook in
schema.xml, but there is an issue open to make this easier:
https://issues.apache.org/jira/browse/SOLR-2338
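
To make the HashMap idea concrete, here is a rough, self-contained
sketch. It deliberately does not use the Lucene classes: FieldSimilarity,
TfIdfLikeSimilarity, ConstantSimilarity and PerFieldSimilarityProvider
are hypothetical names made up for illustration, and the real
Similarity/SimilarityProvider signatures in trunk may differ, so treat
this as the shape of the approach rather than code to paste in. The
fallback formulas are modelled on the classic sqrt(freq) and
log(numDocs/(docFreq+1)) + 1 definitions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scoring hooks; NOT the Lucene Similarity API.
interface FieldSimilarity {
    float tf(float freq);
    float idf(int docFreq, int numDocs);
}

// Fallback for ordinary fields, modelled on the classic TF/IDF formulas.
final class TfIdfLikeSimilarity implements FieldSimilarity {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// For fields like the yellow-page listing: TF and IDF collapse to a constant,
// so a rare term gets no extra boost over a common one.
final class ConstantSimilarity implements FieldSimilarity {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }
}

// Hypothetical provider: a HashMap from field name to similarity,
// with a default for any field that was not registered.
final class PerFieldSimilarityProvider {
    private final Map<String, FieldSimilarity> perField = new HashMap<>();
    private final FieldSimilarity fallback;

    PerFieldSimilarityProvider(FieldSimilarity fallback) {
        this.fallback = fallback;
    }

    void register(String field, FieldSimilarity similarity) {
        perField.put(field, similarity);
    }

    FieldSimilarity get(String field) {
        return perField.getOrDefault(field, fallback);
    }
}

final class PerFieldSimilarityDemo {
    public static void main(String[] args) {
        PerFieldSimilarityProvider provider =
            new PerFieldSimilarityProvider(new TfIdfLikeSimilarity());
        // "listing" is an assumed field name for the yellow-page data.
        provider.register("listing", new ConstantSimilarity());

        // A term that occurs in a single document out of a million:
        System.out.println("listing idf: " + provider.get("listing").idf(1, 1_000_000));
        System.out.println("body idf:    " + provider.get("body").idf(1, 1_000_000));
    }
}
```

With this layout only the registered field loses the IDF effect; every
other field keeps the default behaviour.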
