On Fri, Feb 25, 2011 at 1:57 PM, Jan Høydahl <jan....@cominvent.com> wrote:
> I also have a case (yellow-page) where IDF comes in and destroys the rank.
> A company listing with a word which occurs in few other listings is not
> necessarily better than others just because of that. When it gets to the
> extreme value of IDF=1, we get an artificially high IDF boost.
>
> It is not killed by omitNorms, neither by omitTermFrequencyAndPositions. Any
> per-field way to get rid of the IDF effect?
> Or should I override idf() in Similarity?
>
Hi Jan, my reply was back in December. These days in Lucene/Solr trunk, you can customize Similarity on a per-field basis, so your yellow-page field can have a completely different similarity (tf, idf, length norm, etc.) from the rest of the index.

For that field you can disable things like TF and IDF entirely, e.g. by making each return a constant such as 1. If you think that's too risky, consider an alternative ranking scheme that doesn't use IDF at all, such as the example in https://issues.apache.org/jira/browse/LUCENE-2864

For now, you have to implement SimilarityProvider in a Java class (with something like a HashMap returning different similarities for different fields) and set it up with the similarity hook in schema.xml, but there is an issue open to make this easier: https://issues.apache.org/jira/browse/SOLR-2338
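
To illustrate the HashMap idea, here is a minimal, self-contained sketch. It deliberately does not use the real Lucene classes: FieldScoring, TfIdfScoring, ConstantScoring and PerFieldScoringProvider are hypothetical stand-ins I made up for this mail, and the field names "listing_name" and "description" are assumptions too. The trunk SimilarityProvider API is still changing, so treat this as the shape of the solution rather than code to paste in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-field scoring policy (NOT Lucene's Similarity).
interface FieldScoring {
    float tf(float freq);
    float idf(int docFreq, int numDocs);
}

// Modeled on the classic formulas: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)).
final class TfIdfScoring implements FieldScoring {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// Neutralizes both factors: a match counts once, and rare terms get no extra boost.
final class ConstantScoring implements FieldScoring {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }
}

// Hypothetical provider: looks up the policy by field name, with a default for the rest.
final class PerFieldScoringProvider {
    private final Map<String, FieldScoring> perField = new HashMap<>();
    private final FieldScoring fallback;

    PerFieldScoringProvider(FieldScoring fallback) {
        this.fallback = fallback;
    }

    void register(String field, FieldScoring scoring) {
        perField.put(field, scoring);
    }

    FieldScoring get(String field) {
        return perField.getOrDefault(field, fallback);
    }
}

final class PerFieldDemo {
    public static void main(String[] args) {
        PerFieldScoringProvider provider = new PerFieldScoringProvider(new TfIdfScoring());
        // "listing_name" is an assumed name for the yellow-page field.
        provider.register("listing_name", new ConstantScoring());

        int numDocs = 1_000_000;
        for (String field : new String[] {"listing_name", "description"}) {
            FieldScoring s = provider.get(field);
            System.out.println(field
                    + " rare=" + s.idf(1, numDocs)
                    + " common=" + s.idf(500_000, numDocs));
        }
    }
}
```

With this arrangement a term that occurs in a single listing scores the same as a very common one on the yellow-page field, while every other field keeps the usual rarity boost.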