Here are some differences between the current and proposed split.data.frame.

> d<-data.frame(Matrix=I(matrix(1:10, ncol=2)), Named=c(one=1,two=2,three=3,four=4,five=5), row.names=as.character(1001:1005))
> group<-c("A","B","A","A","B")
> split.data.frame(d,group)
$A
     Matrix.1 Matrix.2 Named
1001        1        6     1
1003        3        8     3
1004        4        9     4

$B
     Matrix.1 Matrix.2 Named
1002        2        7     2
1005        5       10     5

> mysplit.data.frame(d,group) # lost row.names and 2nd column of Matrix
[1] "processing data.frame"
$A
     Matrix Named
[1,]      1     1
[2,]      3     3
[3,]      4     4

$B
     Matrix Named
[1,]      2     2
[2,]      5     5

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of
> pengyu...@gmail.com
> Sent: Wednesday, December 09, 2009 2:10 PM
> To: r-de...@stat.math.ethz.ch
> Cc: r-b...@r-project.org
> Subject: [Rd] split() is slow on data.frame (PR#14123)
>
> Please see the following code for the runtime comparison between
> split() and mysplit.data.frame() (they do the same thing
> semantically). mysplit.data.frame() is a fix of split() in term of
> performance. Could somebody include this fix (with possible checking
> for corner cases) in future version of R and let me know the inclusion
> of the fix?
>
> m=300000
> n=6
> k=30000
>
> set.seed(0)
> x=replicate(n,rnorm(m))
> f=sample(1:k, size=m, replace=T)
>
> mysplit.data.frame<-function(x,f) {
>   print('processing data.frame')
>   v=lapply(
>       1:dim(x)[[2]]
>       , function(i) {
>         split(x[,i],f)
>       }
>       )
>
>   w=lapply(
>       seq(along=v[[1]])
>       , function(i) {
>         result=do.call(
>             cbind
>             , lapply(v,
>                 function(vj) {
>                   vj[[i]]
>                 }
>                 )
>             )
>         colnames(result)=colnames(x)
>         return(result)
>       }
>       )
>   names(w)=names(v[[1]])
>   return(w)
> }
>
> system.time(split(as.data.frame(x),f))
> system.time(mysplit.data.frame(as.data.frame(x),f))
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
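
The two properties at issue in the comparison above (row names, and the second column of the matrix-valued column) can also be checked programmatically rather than by eye. What follows is only a minimal sketch: preserves_structure() is a hypothetical helper, not part of R and not from the original post. It assumes that a structure-preserving split should return, for each group, a data frame with the same row names and the same per-column dimensions as the matching rows of the input.

```r
# Hypothetical helper (an assumption for illustration, not an R function):
# TRUE when every piece of a split result is a data frame that keeps the
# row names and the per-column dimensions of the matching rows of `x`.
preserves_structure <- function(pieces, x, f) {
  rows <- split(seq_len(nrow(x)), f)        # row indices belonging to each group
  all(vapply(names(rows), function(g) {
    piece <- pieces[[g]]
    ref <- x[rows[[g]], , drop = FALSE]     # reference subset taken by row index
    is.data.frame(piece) &&
      identical(rownames(piece), rownames(ref)) &&
      identical(lapply(piece, dim), lapply(ref, dim))
  }, logical(1)))
}

# Same test data as in the session above.
d <- data.frame(Matrix = I(matrix(1:10, ncol = 2)),
                Named = c(one = 1, two = 2, three = 3, four = 4, five = 5),
                row.names = as.character(1001:1005))
group <- c("A", "B", "A", "A", "B")

preserves_structure(split(d, group), d, group)
```

A result made of plain matrices, which is what the proposed function returns, fails the is.data.frame() test immediately, so a check of this kind could serve as a quick regression test before any faster replacement is considered.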