On Friday, March 18, 2016 3:53 PM, wun...@wunderwood.org wrote:
> 
> Popularity has a very wide range. Try my example, scale 1 million and 100 
> into the same 1.0-0.0 range. Even with log popularity.

Well, in our case, we don't really care to differentiate between documents with 
low popularity. And if we know roughly what the popularity distribution is, it 
is not hard to normalize it to a value between 0.0 and 1.0. The simplest 
approach is to focus on the maximum value and map that value to 1.0, so 
basically the normalization function is: 
normalizedValue = value / maxValue. But knowing the mean and median, or other 
statistical information, one could of course use a more advanced calculation.
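
To make the max-based normalization concrete, here is a minimal Python sketch. 
The function names (normalize_linear, normalize_log) are made up for this 
mail, and the log variant is just one possible "more advanced calculation", 
not something we necessarily run in production. Both assume maxValue is known 
up front.

```python
import math


def normalize_linear(value, max_value):
    """Map a raw popularity to [0.0, 1.0] by dividing by the known maximum."""
    if max_value <= 0:
        return 0.0
    # Clamp so that values outside [0, max_value] still land in the range.
    return min(max(value / max_value, 0.0), 1.0)


def normalize_log(value, max_value):
    """Same idea on a log scale, which compresses the wide popularity range.

    log1p is used so that a popularity of 0 maps cleanly to 0.0.
    """
    if max_value <= 0:
        return 0.0
    return min(math.log1p(max(value, 0)) / math.log1p(max_value), 1.0)


if __name__ == "__main__":
    max_popularity = 1_000_000  # assumed maximum, taken from the quoted example
    for popularity in (100, 10_000, 1_000_000):
        print(popularity,
              normalize_linear(popularity, max_popularity),
              normalize_log(popularity, max_popularity))
```

With the linear version a popularity of 100 ends up very close to 0.0, which 
is fine for us since we don't need to tell the low-popularity documents apart. 
The log version keeps more separation at the low end if that is wanted.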

In essence, if one can answer the question "How popular is this 
document/movie/item?", using "extremely popular", "very popular", "quite 
popular", "average", "not very popular" and "very unpopular" (i.e. popularity 
normalized down to 6 possible values), it should not be that hard to normalize 
the popularity to a value between 0.0 and 1.0.
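
As a rough illustration of the six-level idea, the sketch below (Python) 
assigns a raw popularity to one of the six labels and gives each label an 
evenly spaced score between 0.0 and 1.0. The function name bucket_score and 
the threshold values are purely hypothetical; in practice the cut-offs would 
come from whatever is known about the popularity distribution.

```python
from bisect import bisect_right

# The six labels from the text above, ordered from least to most popular.
LABELS = [
    "very unpopular",
    "not very popular",
    "average",
    "quite popular",
    "very popular",
    "extremely popular",
]


def bucket_score(value, thresholds):
    """Return (label, score) for a raw popularity value.

    thresholds: five ascending cut-off values that split the popularity
    range into six buckets. A value equal to a cut-off falls into the
    higher bucket.
    """
    if len(thresholds) != len(LABELS) - 1:
        raise ValueError("expected exactly five thresholds")
    index = bisect_right(thresholds, value)
    # Evenly spaced scores: 0.0, 0.2, 0.4, 0.6, 0.8, 1.0
    return LABELS[index], index / (len(LABELS) - 1)


if __name__ == "__main__":
    # Hypothetical cut-offs, chosen only for illustration.
    cutoffs = [10, 100, 1_000, 10_000, 100_000]
    for popularity in (5, 100, 1_000_000):
        print(popularity, bucket_score(popularity, cutoffs))
```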

/Jimi
