Hey Jeff, I have a few ideas. Each has some different requirements, and to help you choose, I benchmarked them.

###START###
##Basic data
> test <- data.frame(totret=rnorm(10^7), id=rep(1:10^4, each=10^3),
+                    time=rep(c(1, rep(0, 999)), 10^4))

##Option 1: probably the most general, but also the slowest by far.
##The idea is that it does the calculation for each stock/ID, and then
##concatenates [c()] an NA in front.
> system.time(test[,"dailyreturns"] <- unlist(by(test[,"totret"], test[,"id"],
+   function(x) {c(NA, x[-1]/x[-length(x)])})), gcFirst=TRUE)
   user  system elapsed
  49.11    0.42   49.86

##Option 2: Assumes that you have the same number of measurements for each
##stock/ID so you can just assign an NA every nth row.
##This is fairly fast.
> system.time(test[-1,"dailyreturns"] <-
+   test[-1,"totret"]/test[-nrow(test),"totret"], gcFirst=TRUE)
   user  system elapsed
   1.11    0.21    1.31
> system.time(test[seq(1, 10^7, by=10^3),"dailyreturns"] <- NA, gcFirst=TRUE)
   user  system elapsed
   0.39    0.04    0.42

##Option 3: Assumes that you have some variable ("time" in my little test
##data) that somehow indicates when each stock/ID has its first measurement.
##In the example, the first measurement gets a 1 and subsequent ones a 0.
##So we just assign NA in "dailyreturns" every time the "time" column has a 1.
##Again, a big assumption, but fairly quick.
> system.time(test[-1,"dailyreturns"] <-
+   test[-1,"totret"]/test[-nrow(test),"totret"], gcFirst=TRUE)
   user  system elapsed
   1.06    0.17    1.25
> system.time(test[which(test[,"time"]==1),"dailyreturns"] <- NA, gcFirst=TRUE)
   user  system elapsed
   0.46    0.09    0.55
###END###

I really feel like there should be a faster way that is also more general, but it is late and I am not coming up with any better ideas at the moment. Perhaps somehow finding the first instance of a stock/ID? Anyway, this was simulated on 10 million rows, so maybe by() works plenty fast for you.

Josh

On Thu, Jun 3, 2010 at 10:20 PM, Jeff08 <jefferyd...@gmail.com> wrote:
>
> Hey Josh,
>
> Thanks for the quick response!
>
> I guess I have to switch from the Java mindset to the matrix/vector mindset
> of R.
>
> Your code worked very well, but I just have one problem:
>
> Essentially I have a time series of stock A, followed by a time series of
> stock B, etc.
> So there are break points in the data (the points where it switches stocks
> have incorrect returns, and should be NA at t=0 for each stock)
>
> Is there an easy way to account for this in R? What I was thinking of is if
> there is a way to make a filter rule. Such as if the ID of the row matches
> Stock A, then perform this.
>
>>>"Hello Jeff,
>
> Try this:
>
> test <- data.frame(totret=rnorm(10^7)) #create some sample data
> test[-1,"dailyreturn"] <- test[-1,"totret"]/test[-nrow(test),"totret"]
>
> The general idea is to take the column "totret" excluding the first row,
> divided by "totret" excluding the last row. This gives in effect t+1
> (since t is now shorter)/t
>
> I assigned the result to a new column "dailyreturn". For 10^7 rows,
> it took 1.92 seconds on my system."
> --
> View this message in context:
> http://r.789695.n4.nabble.com/R-Newbie-please-help-tp2242633p2242703.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/
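
The "finding the first instance of a stock/ID" idea raised in the reply could look roughly like the sketch below. It assumes that each stock's rows are contiguous and in time order (as in the test data), and it marks a row as the first of its stock whenever its id differs from the id in the previous row. That way it needs neither equal group sizes nor a separate marker column. The small data frame and the names `ratio` and `first` are made up for illustration, and this variant was not part of the timings above.

```r
# Small illustrative data set: three stocks with unequal numbers of rows
set.seed(42)
test <- data.frame(totret = rnorm(12),
                   id = rep(c("A", "B", "C"), times = c(5, 3, 4)),
                   stringsAsFactors = FALSE)
n <- nrow(test)

# Same vectorized ratio as in Options 2 and 3: row t+1 divided by row t
ratio <- c(NA, test$totret[-1] / test$totret[-n])

# TRUE where the id differs from the previous row,
# i.e. at the first row of each stock (row 1 is always a first row)
first <- c(TRUE, test$id[-1] != test$id[-n])

# The ratio at a stock's first row mixes two stocks, so blank it out
ratio[first] <- NA
test$dailyreturns <- ratio
```

If the rows were not grouped by stock, the neighbouring-row ratio itself would be wrong, so sorting by id and time first would be a precondition for this approach.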