Re: [R] help with mysql and R: partitioning by quintile

2011-05-15 Thread gj
Here's how I'm trying to solve the diversity problem inherent in the data (see below for a definition of the problem): if (interquintile ranges have >=4 ranges at the same freq) then (use rating=3) else (use rating as described in Jim's code). I'll have a go and post an update. In the meantime, if

Re: [R] help with mysql and R: partitioning by quintile

2011-05-14 Thread gj
Jim's suggestion did the trick:
  tqm <- do.call(rbind, tq) + 0.001
  head(x.new)
       userid freq track rating
  [1,]      1    1     1      1
  [2,]      1   10     2      5
  [3,]      1    1     3      1
  [4,]      1    1     4      1
  [5,]      1   15     5      5
  [6,]      1    4     6      3
Dennis, w

Re: [R] help with mysql and R: partitioning by quintile

2011-05-14 Thread jim holtman
An easy way is to offset the quantiles by a small increment so that the boundary condition is less likely. If you change the line in my example to
  tqm <- do.call(rbind, tq) + 0.001
that should do the trick.
On Sat, May 14, 2011 at 6:09 PM, gj wrote:
> Hi,
> I think I haven't been able to explai
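To see why a small offset helps, here is a minimal sketch. It assumes ratings are derived with findInterval() by counting how many quintile breakpoints a frequency has reached; that is an assumption made for illustration, not necessarily what the example under discussion does. The freq vector and the variable names are invented.

```r
# Hypothetical play counts for one user, with several ties at the low end.
freq <- c(1, 1, 1, 2, 2, 4, 6, 7, 10, 15)

# Inner quintile breakpoints (20%, 40%, 60%, 80%).
breaks <- quantile(freq, probs = seq(0.2, 0.8, by = 0.2), names = FALSE)

# findInterval() uses left-closed bins, so a value equal to a breakpoint
# counts as having reached it and lands in the higher bin.
rating_plain <- findInterval(freq, breaks) + 1

# Shifting every breakpoint up by a small amount puts values that sit
# exactly on a breakpoint into the lower bin instead.
rating_offset <- findInterval(freq, breaks + 0.001) + 1

print(data.frame(freq, rating_plain, rating_offset))
```

The offset only has this effect when it is smaller than the gap between distinct frequencies; 0.001 is comfortably small for integer play counts.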

Re: [R] help with mysql and R: partitioning by quintile

2011-05-14 Thread gj
Hi, I think I haven't been able to explain correctly what I want. Here is another try: Given that I have the following input:
  userid,track,freq
  1,1,1
  1,2,10
  1,3,1
  1,4,1
  1,5,15
  1,6,4
  1,7,16
  1,8,6
  1,9,1
  1,10,1
  1,11,2
  1,12,2
  1,13,1
  1,14,6
  1,15,7
  1,16,13
  1,17,3
  1,18,2
  1,19,5
  1,20,2
  1,21,2
  1,22,6
  1,23,4
  1

Re: [R] help with mysql and R: partitioning by quintile

2011-05-14 Thread Dennis Murphy
Hi: Is this what you're after?
  tq <- with(ds, quantile(freq, seq(0.2, 1, by = 0.2)))
  ds$int <- with(ds, cut(freq, c(0, tq)))
  with(ds, table(int))
  int
   (0,1]  (1,2]  (2,4]  (4,7] (7,16]
      10      6      7      6      6
HTH, Dennis
On Sat, May 14, 2011 at 9:42 AM, gj wrote:
> Hi Jim,
> Thanks
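If numeric ratings are wanted rather than interval labels, cut() can return the bin index directly. A minimal sketch, assuming an invented ds with strictly positive freq values and distinct quintile breakpoints (cut() stops with an error when its breaks are duplicated):

```r
# Hypothetical data; 'ds' and its values are invented for illustration.
ds <- data.frame(freq = c(1, 1, 2, 3, 4, 5, 7, 9, 12, 16))

# Upper edges of the five quintile bins (20%, 40%, ..., 100%).
tq <- quantile(ds$freq, probs = seq(0.2, 1, by = 0.2))

# labels = FALSE makes cut() return integer bin codes instead of a factor.
# The lower edge 0 assumes every freq is greater than 0.
ds$rating <- cut(ds$freq, breaks = c(0, tq), labels = FALSE)

print(ds)
```

With heavily tied data several breakpoints can coincide, in which case this sketch would need the breaks adjusted before calling cut().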

Re: [R] help with mysql and R: partitioning by quintile

2011-05-14 Thread gj
Hi Jim, Thanks very much for the code. I modified it a bit because I needed to allocate the track ratings by userid (e.g., if user 1 plays track x once, he gets rating 1; if user 1 plays track y 100 times, he gets rating 5) and not by track (sorry if this wasn't clear in my original post). This is alm

Re: [R] help with mysql and R: partitioning by quintile

2011-05-08 Thread Phil Spector
One way to get the ratings would be to use the ave() function:
  rating = ave(x$freq, x$track,
               FUN = function(x) cut(x, quantile(x, (0:5)/5), include.lowest = TRUE))
- Phil Spector Statistical Computing Facility
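A small runnable variant of the same idea, grouped by userid rather than track and returning integer ratings. The data frame and the helper name quintile_rating are invented, and the sketch assumes each user's quintile breakpoints are distinct:

```r
# Hypothetical data: two users with ten tracks each.
x <- data.frame(
  userid = rep(c("u1", "u2"), each = 10),
  track  = rep(1:10, times = 2),
  freq   = c(1:10, seq(10, 100, by = 10))
)

# Rate one user's play counts from 1 to 5 using that user's own quintiles.
quintile_rating <- function(f) {
  cut(f, breaks = quantile(f, probs = (0:5) / 5),
      include.lowest = TRUE, labels = FALSE)
}

# ave() applies the function within each userid and keeps the row order.
x$rating <- ave(x$freq, x$userid, FUN = quintile_rating)

print(x)
```

Because the quintiles are computed per user, a count of 10 is the top rating for u1 here but the bottom rating for u2.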

Re: [R] help with mysql and R: partitioning by quintile

2011-05-08 Thread jim holtman
try this:
> # create some data
> x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = '')
+     , track = rep(1:20, 20)
+     , freq = floor(runif(400, 10, 200))
+     , stringsAsFactors = FALSE
+     )
> # get the quantiles for each track
> tq <-

[R] help with mysql and R: partitioning by quintile

2011-05-08 Thread gj
Hi, I have a MySQL table with fields userid,track,frequency, e.g.
  u1,1,10
  u1,2,100
  u1,3,110
  u1,4,200
  u1,5,120
  u1,6,130
  .
  u2,1,23
  .
  .
where "frequency" is the number of times a music track is played by a "userid". I need to turn my 'frequency' table into a rating table (it's for a recommender system)