That works fine if you have a query that matches things with a wide range of 
popularities. But that is the easy case.

What about the query “twilight”, which matches all the Twilight movies, all of 
which are popular (millions of views)? Or “Lord of the Rings”, which only 
matches movies with hundreds of views? People really will notice when the 1978 
animated version shows up before the Peter Jackson films.
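
To put numbers on it, here is a rough Python sketch. It uses the 1 million
and 100 view counts from the quoted example below; the 300-view figure, the
function names, and the choice of log10 are assumptions made only for
illustration.

```python
import math

MAX_VIEWS = 1_000_000  # most popular document, from the quoted example


def linear_norm(views, max_views=MAX_VIEWS):
    """Scale a view count into 0.0-1.0 by dividing by the maximum."""
    return views / max_views


def log_norm(views, max_views=MAX_VIEWS):
    """Same idea on a log scale; the +1 keeps zero views defined."""
    return math.log10(1 + views) / math.log10(1 + max_views)


# Two low-traffic matches for the same query (300 is a made-up figure).
for views in (300, 100):
    print(views, linear_norm(views), round(log_norm(views), 3))
```

Linearly, both documents land within a few ten-thousandths of zero, so the
boost can hardly separate them. The log version spreads them out more, but
the gap is still under a tenth of the full range.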

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 18, 2016, at 8:18 AM, <jimi.hulleg...@svensktnaringsliv.se> 
> <jimi.hulleg...@svensktnaringsliv.se> wrote:
> 
> On Friday, March 18, 2016 3:53 PM, wun...@wunderwood.org wrote:
>> 
>> Popularity has a very wide range. Try my example, scale 1 million and 100 
>> into the same 1.0-0.0 range. Even with log popularity.
> 
> Well, in our case, we don't really care to differentiate between documents 
> with low popularity. And if we know roughly what the popularity distribution 
> is, it is not hard to normalize it to a value between 0.0 and 1.0. The 
> simplest approach is to focus on the maximum value and map that value to 
> 1.0, so basically the normalization function is: 
> normalizedValue=value/maxValue. But knowing the mean and median, or other 
> statistical information, one could of course use a more advanced calculation.
> 
> In essence, if one can answer the question "How popular is this 
> document/movie/item?", using "extremely popular", "very popular", "quite 
> popular", "average", "not very popular" and "very unpopular" (i.e. popularity 
> normalized down to 6 possible values), it should not be that hard to 
> normalize the popularity to a value between 0.0 and 1.0.
> 
> /Jimi
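
For reference, the value/maxValue normalization and the six-label idea from
the quoted message might look roughly like this in Python. The function
names, the clamping to 0.0-1.0, and the even spacing of the six labels are
assumptions; the quoted message does not specify them.

```python
# Labels taken from the quoted message, ordered from most to least popular.
LABELS = [
    "extremely popular",
    "very popular",
    "quite popular",
    "average",
    "not very popular",
    "very unpopular",
]


def normalize_by_max(value, max_value):
    """normalizedValue = value / maxValue, clamped to the 0.0-1.0 range."""
    if max_value <= 0:
        return 0.0
    return min(max(value / max_value, 0.0), 1.0)


def label_to_score(label):
    """Map a label to an evenly spaced score (even spacing is assumed)."""
    index = LABELS.index(label)
    return 1.0 - index / (len(LABELS) - 1)


print(normalize_by_max(250_000, 1_000_000))
print(label_to_score("extremely popular"), label_to_score("very unpopular"))
```

Note that this maps every low-popularity document to nearly the same
value, which is exactly the trade-off the quoted message accepts.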
