Interestingly, in my case the opposite seems to hold: data frames seem
faster than matrices when it comes to "by" computation (which is where
most of my calculation time goes):

### here is my data frame and some information about it
> dim(rets.subset)
[1] 132508      3
> names(rets.subset)
[1] "PERMNO" "RET"    "mdate"
> length(unique(as.factor(rets.subset$PERMNO)))
[1] 6832
> length((as.factor(rets.subset$PERMNO)))
[1] 132508
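
For anyone who wants to reproduce the timings, here is a minimal sketch
that builds a data frame of the same shape; the synthetic PERMNO, RET,
and mdate values are stand-ins I made up, not the actual data above:

### hedged sketch: a synthetic stand-in for rets.subset (made-up values)
set.seed(1)
n <- 132508                                         # same row count as above
rets.subset <- data.frame(
  PERMNO = sample(10001:16832, n, replace = TRUE),  # ~6832 unique ids
  RET    = rnorm(n, sd = 0.02),                     # fake returns
  mdate  = sample(200001:200012, n, replace = TRUE) # fake month codes
)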

### calculation using data frame
> system.time( { by( rets.subset, as.factor(rets.subset$PERMNO), mean) } )
   user  system elapsed
  3.295   2.798   6.095

### the same calculation using a matrix
> m=as.matrix(rets.subset)
> system.time( { a=by( m, as.factor(m[,1]), mean) } )
   user  system elapsed
  5.371   5.557  10.928
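
If I read the base R source correctly (an assumption worth checking),
this reversal is not too surprising: by() has no matrix method, so
by.default() coerces the matrix back to a data frame before splitting,
and the matrix path pays for that conversion on top of the data-frame
work:

### by.default() begins with an as.data.frame() coercion, so the matrix
### version does strictly more work; print it at the prompt to verify
> by.default

If that is right, working on the numeric vector directly should beat
both versions (see the PS below).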

PS: Any speed suggestions are appreciated.  This is "experimenting time" for
me.
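
One avenue I am experimenting with, sketched rather than benchmarked:
mean() on a data frame (at least in this R version) averages every
column, so the by() calls above also average PERMNO and mdate. If only
the per-PERMNO mean return is wanted, tapply() on the single numeric
vector skips both the data-frame subsetting and the wasted columns:

### hedged sketch: grouped mean of just the RET column
a <- tapply(rets.subset$RET, rets.subset$PERMNO, mean)

rowsum(rets.subset$RET, rets.subset$PERMNO), divided by the group
counts, would be another and usually even faster route.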


> One note:  if you're worried about speed, it almost always makes sense to
> use matrices rather than dataframes.  If you've got mixed types this is
> tedious and error-prone (each type needs to be in a separate matrix), but if
> your data is all numeric, it's very simple, and will make things a lot
> faster.
>
> Duncan Murdoch

-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
CV Starr Professor of Economics (Finance), Brown University
http://welch.econ.brown.edu/
