Re: Why is multiplicative boost preferred over additive?

Walter Underwood Sat, 19 Mar 2016 09:50:44 -0700

Think about using popularity as a boost. If one movie has a million rentals and 
one has a hundred rentals, there is no additive formula that balances that with 
text relevance. Even with log(popularity), it doesn’t work.


With multiplicative boost, we only care about the relative difference between 
the one rented one million times and the one rented 800 thousand times (think 
about the Twilight movies at Netflix). But it also distinguishes between the 
one rented 100 times and the one rented 80 times.
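
A minimal sketch of that argument in Python, not anything Solr runs internally. 
The boost curve (rentals ** 0.25), the weight, and the text scores are all 
made-up assumptions, chosen only to show the shape of the effect. "lift" is how 
much higher the more popular movie scores when both match the query text 
equally well.

```python
import math


def popularity_boost(rentals, exponent=0.25):
    """Hypothetical boost curve: a damped power of the rental count."""
    return rentals ** exponent


def additive_score(text_score, rentals, weight=1.0):
    """Text relevance plus a weighted popularity term."""
    return text_score + weight * popularity_boost(rentals)


def multiplicative_score(text_score, rentals):
    """Text relevance scaled by the popularity term."""
    return text_score * popularity_boost(rentals)


def lift(score_fn, text_score, rentals_a, rentals_b):
    """Ratio of final scores for two movies with the same text score."""
    return score_fn(text_score, rentals_a) / score_fn(text_score, rentals_b)


# Text scores vary a lot from query to query, so try several scales.
for text_score in (0.5, 5.0, 50.0):
    for rentals_a, rentals_b in ((1_000_000, 800_000), (100, 80)):
        add = lift(additive_score, text_score, rentals_a, rentals_b)
        mul = lift(multiplicative_score, text_score, rentals_a, rentals_b)
        print(f"text={text_score:>4} {rentals_a:>9} vs {rentals_b:>7}: "
              f"additive lift={add:.4f}  multiplicative lift={mul:.4f}")
```

In this sketch the multiplicative lift depends only on the ratio of the two 
rental counts, so it comes out the same for the blockbusters and for the 
obscure titles, whatever the text score is. The additive lift changes with both 
the size of the text score and the size of the rental counts, so a weight that 
looks balanced for one query or one popularity range is off for another.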

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 17, 2016, at 11:29 AM, jimi.hulleg...@svensktnaringsliv.se wrote:
> 
> Hi,
> 
> After reading a bit on various sites, and especially the blog post "Comparing 
> boost methods in Solr", it seems that the preferred boosting type is the 
> multiplicative one, over the additive one. But I can't really get my head 
> around *why* that is so, since in most boosting problems I can think of, it 
> seems that an additive boost would suit better.
> 
> For example, in our project we want to boost documents depending on various 
> factors, but in essence they can be summarized as:
> 
> - Regular edismax logic, like qf=title^2 mainText^1
> - Multiple custom document fields, with weights specified at query time
> 
> So, first off, the custom fields... It became obvious to me quite quickly that 
> multiplicative logic here would totally ruin the purpose of the weights, 
> since something like "(f1 * w1) * (f2 * w2)" is the same as "(f1 * w2) * 
> (f2 * w1)". So, I ended up using additive boost here.
> 
> Then we have the combination of the edismax boost, and my custom boost. As 
> far as I understand it, when using the boost field with edismax, this 
> combination is always performed using multiplicative logic. But the same 
> problem exists here as it did with my custom fields. Because if I boost the 
> aggregated result of the custom fields using some weight, it doesn't affect 
> the order of the documents because that weight influences the edismax boost 
> just as much. What I want is to have the weight only influence my custom 
> boost value, so that I can control how much (or little) the final score 
> should be affected by the custom boost.
> 
> So, in both cases I find myself wanting to use the additive boost. But surely 
> I must be missing something, right? Am I thinking backwards or something?
> 
> I don't use any out-of-the-box example indexes, so I can't provide you with a 
> working URL that shows exactly what I am doing. But in essence my query looks 
> like this:
> 
> - q=test
> - defType=edismax
> - qf=title^2&qf=mainText1^1
> - totalRanking=div(sum(product(random1,1),product(random2,1.5),product(random3,2),product(random4,2.5),product(random5,3)),5)
> - weightedTotalRanking=product($totalRanking,1.5)
> - bf=$weightedTotalRanking
> - fl=*,score,[explain style=text],$weightedTotalRanking
> 
> random1 to random5 are document fields of type double, with random values 
> between 0.0 and 1.0.
> 
> With this setup, I can change the overall importance of my custom boosting 
> using the factor in weightedTotalRanking (1.5 above). But that is only 
> because bf is additive. If I switch to the boost parameter, I can no longer 
> influence the order of the documents using this factor, no matter how high a 
> value I choose.
> 
> Am I looking at this the wrong way? Is there a much better approach to 
> achieve what I want?
> 
> Regards
> /Jimi
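
The quoted observation about the factor in weightedTotalRanking can be 
reproduced with a small Python sketch. It is only an illustration: the three 
documents, their text scores, and their custom boost values are invented, and 
the scoring functions are simplified stand-ins for what bf and boost do. The 
third function is one possible variant, not something from the original 
question: it adds a constant offset so the factor is no longer a pure scale.

```python
# (name, text_score, custom_boost): made-up values for illustration only
DOCS = [("a", 2.0, 0.2), ("b", 1.5, 0.9), ("c", 1.0, 0.6)]


def additive(text, custom, factor):
    """Stand-in for bf: the weighted boost is added to the text score."""
    return text + factor * custom


def multiplicative(text, custom, factor):
    """Stand-in for boost: the weighted boost multiplies the text score."""
    return text * (factor * custom)


def offset_multiplicative(text, custom, factor):
    """Hypothetical variant: multiply by (1 + factor * custom) instead."""
    return text * (1.0 + factor * custom)


def ranking(score_fn, factor):
    """Document names ordered from highest to lowest final score."""
    ordered = sorted(DOCS, key=lambda d: score_fn(d[1], d[2], factor),
                     reverse=True)
    return [name for name, _, _ in ordered]


for score_fn in (additive, multiplicative, offset_multiplicative):
    for factor in (0.1, 1.5, 10.0):
        print(f"{score_fn.__name__:>21} factor={factor:>4}: "
              f"{ranking(score_fn, factor)}")
```

In the pure multiplicative case the factor scales every document's score by the 
same amount, so the order cannot change. With the offset, the factor controls 
how far the boost can move a score away from the plain text score. In Solr 
function syntax that would presumably look something like 
boost=sum(1,product($totalRanking,1.5)), but treat that as an untested 
assumption rather than a recommendation.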
