What I am doing is finding where the dates are not sequential (the difference between consecutive dates is not one day). Every time this occurs, the expression 'diff(.days) != 1' is TRUE, and that is where a new sequence starts. 'diff' returns a vector one shorter than its input; I am assuming that the first date always starts a sequence, so that is why TRUE is prepended as the initial entry. 'cumsum' then generates a vector that has the same value for dates that are consecutive and increases by one at each break. By using 'table' on that vector, you can count the days in each sequence; the maximum count is the longest run of consecutive days.
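To make the intermediate vectors visible, here is a minimal sketch for a single user. The dates in `days` and the names `starts`, `run_id` and `longest` are made up for illustration, and the sketch assumes the dates are already sorted with no duplicates.

```r
# Hypothetical dates for one user (assumed sorted, no duplicates):
# three consecutive days, a one-day gap, then two consecutive days
days <- as.Date(c("2008-11-01", "2008-11-02", "2008-11-03",
                  "2008-11-05", "2008-11-06"))

# TRUE marks the start of a new run; the leading TRUE covers the first date,
# which diff() cannot compare against anything
starts <- c(TRUE, diff(days) != 1)
starts
# [1]  TRUE FALSE FALSE  TRUE FALSE

# cumsum turns the flags into a run id shared by consecutive dates
run_id <- cumsum(starts)
run_id
# [1] 1 1 1 2 2

# table counts the dates in each run; the largest count is the longest streak
longest <- max(table(run_id))
longest
# [1] 3
```

Without the leading TRUE, the vector of run ids would be one element shorter than `days`, so the first date would not be counted in any run.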
HTH

On Thu, Oct 1, 2009 at 2:57 PM, gd047 <gd...@mineknowledge.com> wrote:
>
> Congratulations!
>
> Could you explain to me the reason you add an initial "TRUE" value in the
> cumulative sum?
>
>
> jholtman wrote:
>>
>> Will this work:
>>
>>> x <- read.table(textConnection(" day user_id
>> + 2008/11/01 2001
>> + 2008/11/01 2002
>> + 2008/11/01 2003
>> + 2008/11/01 2004
>> + 2008/11/01 2005
>> + 2008/11/02 2001
>> + 2008/11/02 2005
>> + 2008/11/03 2001
>> + 2008/11/03 2003
>> + 2008/11/03 2004
>> + 2008/11/03 2005
>> + 2008/11/04 2001
>> + 2008/11/04 2003
>> + 2008/11/04 2004
>> + 2008/11/04 2005"), header=TRUE)
>>> closeAllConnections()
>>> # convert to Date
>>> x$day <- as.Date(x$day, format="%Y/%m/%d")
>>> # split by user and then look for contiguous days
>>> contig <- sapply(split(x$day, x$user_id), function(.days){
>> + .diff <- cumsum(c(TRUE, diff(.days) != 1))
>> + max(table(.diff))
>> + })
>>> contig
>> 2001 2002 2003 2004 2005
>> 4 1 2 2 4
>>>
>>
>>
>> On Thu, Oct 1, 2009 at 11:29 AM, gd047 <gd...@mineknowledge.com> wrote:
>>>
>>> ...if that is possible
>>>
>>> My task is to find the longest streak of continuous days a user
>>> participated in a game.
>>>
>>> Instead of writing an sql function, I chose to use the R's rle function,
>>> to get the longest streaks and then update my db table with the results.
>>>
>>> The (attached) dataframe is something like this:
>>>
>>> day user_id
>>> 2008/11/01 2001
>>> 2008/11/01 2002
>>> 2008/11/01 2003
>>> 2008/11/01 2004
>>> 2008/11/01 2005
>>> 2008/11/02 2001
>>> 2008/11/02 2005
>>> 2008/11/03 2001
>>> 2008/11/03 2003
>>> 2008/11/03 2004
>>> 2008/11/03 2005
>>> 2008/11/04 2001
>>> 2008/11/04 2003
>>> 2008/11/04 2004
>>> 2008/11/04 2005
>>>
>>>
>>> --- R code follows
>>> ------------------------------------------------------
>>>
>>> # turn it to a contingency table
>>> my_table <- table(user_id, day)
>>>
>>> # get the streaks
>>> rle_table <- apply(my_table,1,rle)
>>>
>>> # verify the longest streak of "1"s for user 2001
>>> # as.vector(tapply(rle_table$'2001'$lengths, rle_table$'2001'$values,
>>> max)["1"])
>>>
>>> # loop to get the results
>>> # initiate results matrix
>>> res<-matrix(nrow=dim(my_table)[1], ncol=2)
>>>
>>> for (i in 1:dim(my_table)[1]) {
>>> string <- paste("as.vector(tapply(rle_table$'", rownames(my_table)[i],
>>> "'$lengths, rle_table$'", rownames(my_table)[i], "'$values, max)['1'])",
>>> sep="")
>>> res[i,]<-c(as.integer(rownames(my_table)[i]) , eval(parse(text=string)))
>>> }
>>>
>>> ----------------------------------------------------
>>> --- end of R code
>>>
>>> Unfortunately this for loop takes too long and I'm wondering if there is a
>>> way to produce the res matrix using a function from the "apply" family.
>>>
>>> Thank you in advance
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Help-me-replace-a-for-loop-with-an-%22apply%22-function-tp25696937p25696937.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?