On Tue, Dec 8, 2009 at 11:06 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Dec 9, 2009, at 12:00 AM, Peng Yu wrote:
>
>> On Tue, Dec 8, 2009 at 10:37 PM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>>
>>> On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:
>>>
>>>> I have the following code, which tests split() on a data.frame and
>>>> split() on each column (as a vector) separately. The runtimes differ
>>>> by a factor of 10. When m and k increase, the difference becomes even
>>>> bigger.
>>>>
>>>> I'm wondering why the performance on a data.frame is so bad. Is it a bug
>>>> in R? Can it be improved?
>>>
>>> You might want to look at the data.table package. The author claims
>>> significant speed improvements over data.frames.
>>
>> This bug was found a long time ago, and a package has been
>> developed for it. Shouldn't the fix be integrated into data.frame rather
>> than implemented in an additional package?
>
> What bug?
Is the slow speed in splitting a data.frame a performance bug?

>>
>>> David.
>>>>
>>>>> system.time(split(as.data.frame(x),f))
>>>>
>>>>    user  system elapsed
>>>>   1.700   0.010   1.786
>>>>>
>>>>> system.time(lapply(
>>>>
>>>> + 1:dim(x)[[2]]
>>>> + , function(i) {
>>>> + split(x[,i],f)
>>>> + }
>>>> + )
>>>> + )
>>>>    user  system elapsed
>>>>   0.170   0.000   0.167
>>>>
>>>> ###########
>>>> m=30000
>>>> n=6
>>>> k=3000
>>>>
>>>> set.seed(0)
>>>> x=replicate(n,rnorm(m))
>>>> f=sample(1:k, size=m, replace=T)
>>>>
>>>> system.time(split(as.data.frame(x),f))
>>>>
>>>> system.time(lapply(
>>>> 1:dim(x)[[2]]
>>>> , function(i) {
>>>> split(x[,i],f)
>>>> }
>>>> )
>>>> )
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
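A possible explanation for the gap, hedged: as far as I can tell from the R sources, split.data.frame() splits seq_len(nrow(x)) by f and then calls x[ind, , drop = FALSE] once per group. With k = 3000 groups that is 3000 calls to `[.data.frame`, each paying interpreted R-level overhead, whereas the column-wise version calls split() on an atomic vector only n = 6 times. The sketch below regroups the fast column-wise pieces into one data.frame per level. split_columns() is a hypothetical helper, not a base R function, and it assumes R >= 4.0.0 for list2DF(). Unlike split(), the pieces get fresh row names 1..n_i instead of the original row numbers.

```r
# split_columns() is a hypothetical helper, not part of base R.
# Assumes R >= 4.0.0, where list2DF() is available.
split_columns <- function(df, f) {
  f <- as.factor(f)
  lev <- levels(f)
  # One split() per column on a plain vector (n calls), instead of one
  # `[.data.frame` call per group (k calls) as split.data.frame() does
  pieces <- lapply(df, split, f = f)
  out <- lapply(lev, function(l) {
    # Take group l from every column and bind the pieces into a data.frame
    # (row names are reset to 1..n_i, unlike split.data.frame())
    list2DF(lapply(pieces, `[[`, l))
  })
  names(out) <- lev
  out
}

# Same setup as in the thread, only smaller
m <- 300; n <- 6; k <- 30
set.seed(0)
x <- replicate(n, rnorm(m))
f <- sample(1:k, size = m, replace = TRUE)
df <- as.data.frame(x)

a <- split(df, f)          # base R: one `[.data.frame` call per group
b <- split_columns(df, f)  # column-wise split, then regrouped
```

With the thread's original m, n and k, comparing system.time(split(df, f)) against system.time(split_columns(df, f)) should show whether the per-group `[.data.frame` calls account for most of the time; no timings are claimed here.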