On Friday, March 18, 2016 3:53 PM, wun...@wunderwood.org wrote:
>
> Popularity has a very wide range. Try my example, scale 1 million and 100
> into the same 1.0-0.0 range. Even with log popularity.

Well, in our case, we don't really care to differentiate between documents with low popularity. And if we know roughly what the popularity distribution is, it is not hard to normalize it to a value between 0.0 and 1.0.

The simplest approach is to focus on the maximum value and map that value to 1.0, so basically the normalization function is:

normalizedValue = value / maxValue

But knowing the mean and median, or other statistical information, one could of course use a more advanced calculation.

In essence, if one can answer the question "How popular is this document/movie/item?" using "extremely popular", "very popular", "quite popular", "average", "not very popular" and "very unpopular" (i.e. popularity normalized down to 6 possible values), it should not be that hard to normalize the popularity to a value between 0.0 and 1.0.

/Jimi
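
For illustration, here is a minimal Python sketch of the normalization described above. The function names are made up for this example. The log variant and the even spacing of the six labels are assumptions, just one possible reading of "a more advanced calculation", not something prescribed in this thread.

```python
import math

def normalize_linear(value, max_value):
    """normalizedValue = value / maxValue, clamped to [0.0, 1.0]."""
    if max_value <= 0:
        return 0.0
    return min(max(value / max_value, 0.0), 1.0)

def normalize_log(value, max_value):
    """Assumed variant: same idea on a log scale, which compresses the
    wide range so that low values are not all squeezed toward 0.0."""
    if max_value <= 0:
        return 0.0
    ratio = math.log1p(max(value, 0)) / math.log1p(max_value)
    return min(max(ratio, 0.0), 1.0)

# The six labels from the message, ordered from least to most popular.
LABELS = [
    "very unpopular",
    "not very popular",
    "average",
    "quite popular",
    "very popular",
    "extremely popular",
]

def bucket_score(label):
    """Map a label to an evenly spaced score in [0.0, 1.0].
    Even spacing is an assumption; any monotonic mapping would do."""
    return LABELS.index(label) / (len(LABELS) - 1)
```

With the quoted example (100 against a maximum of 1 million), the linear version gives 0.0001, while the log version gives roughly 0.33. Which of the two is preferable depends on whether documents with low popularity need to be told apart at all.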