On Fri, Feb 25, 2011 at 1:57 PM, Jan Høydahl wrote:
> I also have a case (yellow-page) where IDF comes in and destroys the rank.
> A company listing with a word which occurs in few other listings is not
> necessarily better than others just because of that. When it gets to the
> extreme value of IDF=1, we get an artificially high IDF boost.
Jan,
You are correct: you'll need your own Similarity class.
Have a look at SweetSpotSimilarity
(http://lucene.apache.org/java/3_0_3/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html).
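To make the IDF effect concrete, here is a minimal sketch of the arithmetic. It is
plain Java and does not use Lucene; IdfSketch, defaultIdf, flatIdf and cappedIdf are
hypothetical names. defaultIdf mirrors the formula in Lucene 3.x DefaultSimilarity,
1 + ln(numDocs / (docFreq + 1)). The other two are replacements one might return
from an overridden idf(int docFreq, int numDocs) in a custom Similarity; the cap
value is an arbitrary assumption.

```java
// Hypothetical stand-alone sketch; not part of Lucene.
final class IdfSketch {

    // Same formula as Lucene 3.x DefaultSimilarity.idf(docFreq, numDocs).
    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Option 1: ignore document frequency entirely, so rare words get no boost.
    static float flatIdf(int docFreq, int numDocs) {
        return 1.0f;
    }

    // Option 2: keep the default curve but clamp it at an assumed upper bound.
    static float cappedIdf(int docFreq, int numDocs, float cap) {
        return Math.min(defaultIdf(docFreq, numDocs), cap);
    }

    public static void main(String[] args) {
        int numDocs = 1000000;
        // A word found in a single listing versus a word found in many listings.
        System.out.println("default, docFreq=1:     " + defaultIdf(1, numDocs));
        System.out.println("default, docFreq=10000: " + defaultIdf(10000, numDocs));
        System.out.println("flat,    docFreq=1:     " + flatIdf(1, numDocs));
        System.out.println("capped,  docFreq=1:     " + cappedIdf(1, numDocs, 8.0f));
    }
}
```

In a real setup the chosen formula would go into a subclass of DefaultSimilarity
that overrides idf(int, int), rather than into static helpers like these.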
On 2/25/11 10:57 AM, Jan Høydahl wrote:
I also have a case (yellow-page) where IDF comes in and destroys the rank.
A company listing with a word which occurs in few other listings is not
necessarily better than others just because of that. When it gets to the
extreme value of IDF=1, we get an artificially high IDF boost.
It is not kil
On Wed, Dec 15, 2010 at 3:09 AM, Jan Høydahl / Cominvent
wrote:
> Any way to disable TF/IDF normalization without also disabling positions?
>
See Similarity.tf(float) and Similarity.tf(int).
If you want to change this for both terms and phrases, just override
Similarity.tf(float), since by default
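
The override suggested here can be sketched in plain Java as follows. This is not
Lucene code: SketchSimilarity is a hypothetical stand-in for the two tf methods,
written on the assumption (true of Lucene 3.x as far as I know) that tf(int)
simply delegates to tf(float) and that the default is sqrt(freq). FlatTfSimilarity
is likewise a made-up name.

```java
// Hypothetical stand-in for the tf part of Lucene's Similarity; not the real class.
class SketchSimilarity {

    // Called with fractional frequencies (phrase and span matches).
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Called with plain term frequencies; assumed to delegate to tf(float).
    float tf(int freq) {
        return tf((float) freq);
    }
}

// Overrides only tf(float); tf(int) picks up the change through delegation.
final class FlatTfSimilarity extends SketchSimilarity {

    @Override
    float tf(float freq) {
        // Any match counts once, no matter how often the term occurs.
        return freq > 0f ? 1.0f : 0.0f;
    }
}
```

With this, new FlatTfSimilarity().tf(5) and new FlatTfSimilarity().tf(2.5f) both
go through the overridden method, so terms and phrases are treated the same way
while positions stay in the index.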