Interestingly, in my case the opposite seems to hold: data frames seem faster than matrices when it comes to "by" computation (which is where most of my calculations are):
### here is my data frame and some information about it
> dim(rets.subset)
[1] 132508      3
> names(rets.subset)
[1] "PERMNO" "RET"    "mdate"
> length(unique(as.factor(rets.subset$PERMNO)))
[1] 6832
> length(as.factor(rets.subset$PERMNO))
[1] 132508

### calculation using the data frame
> system.time( { by( rets.subset, as.factor(rets.subset$PERMNO), mean) } )
   user  system elapsed
  3.295   2.798   6.095

### the same calculation using a matrix
> m = as.matrix(rets.subset)
> system.time( { a = by( m, as.factor(m[,1]), mean) } )
   user  system elapsed
  5.371   5.557  10.928

PS: Any speed suggestions are appreciated; this is "experimenting time" for me. One possibility is sketched below the signature.

> One note: if you're worried about speed, it almost always makes sense
> to use matrices rather than data frames. If you've got mixed types,
> this is tedious and error-prone (each type needs to be in a separate
> matrix), but if your data is all numeric, it's very simple, and will
> make things a lot faster.
>
> Duncan Murdoch

--
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
CV Starr Professor of Economics (Finance), Brown University
http://welch.econ.brown.edu/
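Following up on the PS: below is a minimal sketch of two alternatives that are usually faster than by() for this kind of grouped mean, assuming rets.subset with the numeric columns PERMNO and RET shown above. Note they compute only the per-group mean of RET, so they are not drop-in replacements for by(), which here returns the mean of every column. As for why the matrix version times worse rather than better: if I read the dispatch correctly, a matrix is handled by by.default(), which coerces its argument back with as.data.frame(), so the as.matrix() conversion just adds a round trip.

## sketch only, using the column names from the output above
## tapply() computes the mean of RET within each PERMNO group directly,
## avoiding the per-group data-frame splitting that by() performs
grp <- as.factor(rets.subset$PERMNO)
system.time( m1 <- tapply(rets.subset$RET, grp, mean) )

## rowsum() is often faster still: sum RET within each PERMNO group,
## then divide by the group sizes to get the means
system.time( m2 <- rowsum(rets.subset$RET, grp) / as.vector(table(grp)) )

Both order the groups by sorted factor levels, so the two results are directly comparable; tapply() returns a named vector and rowsum() a one-column matrix.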