Interestingly, in my case the opposite seems to hold: data frames seem faster than matrices when it comes to "by" computation (which is where most of my calculations are):
### here is my data frame and some information about it
> dim(rets.subset)
[1] 132508      3
> names(rets.subset)
[1] "PERMNO" "RET"    "mdate"
> length(unique(as.factor(rets.subset$PERMNO)))
[1] 6832
> length(as.factor(rets.subset$PERMNO))
[1] 132508

### calculation using the data frame
> system.time( { by( rets.subset, as.factor(rets.subset$PERMNO), mean) } )
   user  system elapsed
  3.295   2.798   6.095

### the same calculation using a matrix
> m = as.matrix(rets.subset)
> system.time( { a = by( m, as.factor(m[,1]), mean) } )
   user  system elapsed
  5.371   5.557  10.928

PS: Any speed suggestions are appreciated; this is "experimenting time" for me. One possibility is sketched below the signature.

> One note: if you're worried about speed, it almost always makes sense
> to use matrices rather than data frames. If you've got mixed types,
> this is tedious and error-prone (each type needs to be in a separate
> matrix), but if your data is all numeric, it's very simple, and will
> make things a lot faster.
>
> Duncan Murdoch

--
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
CV Starr Professor of Economics (Finance), Brown University
http://welch.econ.brown.edu/
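Following up on the PS: below is a minimal sketch of two alternatives that are usually faster than by() for this kind of grouped mean, assuming rets.subset with the numeric columns PERMNO and RET shown above. Note they compute only the per-group mean of RET, so they are not drop-in replacements for by(), which here returns the mean of every column. As for why the matrix version times worse rather than better: if I read the dispatch correctly, a matrix is handled by by.default(), which coerces its argument back with as.data.frame(), so the as.matrix() conversion just adds a round trip.

## sketch only, using the column names from the output above
## tapply() computes the mean of RET within each PERMNO group directly,
## avoiding the per-group data-frame splitting that by() performs
grp <- as.factor(rets.subset$PERMNO)
system.time( m1 <- tapply(rets.subset$RET, grp, mean) )

## rowsum() is often faster still: sum RET within each PERMNO group,
## then divide by the group sizes to get the means
system.time( m2 <- rowsum(rets.subset$RET, grp) / as.vector(table(grp)) )

Both order the groups by sorted factor levels, so the two results are directly comparable; tapply() returns a named vector and rowsum() a one-column matrix.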