On Apr 12, 2011, at 10:33 , Joris Meys wrote: > Thanks for the explanation, I wasn't fully aware of which optimization > I was using. I reckon your solution is more R-sound, so no reason to > keep with my bizarre workaround. It would be nice though if gl() got > optimized. Thank you for the example too, I'm learning every day. >
I have now committed a version of the below to r-devel. (A couple of demons turned out to be lurking in the details, so not exactly the same code.) -pd > Cheers > Joris > > On Tue, Apr 12, 2011 at 8:51 AM, peter dalgaard <pda...@gmail.com> wrote: >> >> On Apr 11, 2011, at 23:53 , Joris Meys wrote: >> >>> Based on a discussion on SO I ran some tests and found that converting >>> to a factor is best done early in the process. Hence, I propose to >>> rewrite the gl() function as : >>> >>> gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE){ >>> rep( >>> rep( >>> factor(1:n,levels=1:n,labels=labels, ordered=ordered),rep.int(k,n) >>> ),length.out=length >>> ) >>> } >>> >> >> That's bizarre! You are relying on an optimization in rep.factor whereby it >> replicates the internal codes and exploits that the result has the same >> structure as the input. I.e., it just tacks on class and levels attributes >> rather than call match() as factor() does internally. >> >> However, you can do the same thing straight away: >> >>> gl2 >> function (n, k, length = n * k, labels = 1:n, ordered = FALSE) >> { >> y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length) >> structure(y, levels=as.character(labels), >> class=c(if(ordered)"ordered","factor")) >> } >> >> I get this to be a bit faster than your version, although with a smaller >> speedup factor, which probably just indicates that match() is faster on this >> machine. >> >>> Some test results : >>> >>>> system.time(X1 <- gl(5,1e7)) >>> user system elapsed >>> 29.21 0.30 29.58 >>> >>>> system.time(X2 <- gl2(5,1e7)) >>> user system elapsed >>> 1.87 0.45 2.37 >>> >>>> all.equal(X1,X2) >>> [1] TRUE >>> >>>> system.time(X1 <- gl(5,100,1e7)) >>> user system elapsed >>> 5.98 0.05 6.05 >>> >>>> system.time(X2 <- gl2(5,100,1e7)) >>> user system elapsed >>> 0.21 0.03 0.25 >>> >>>> all.equal(X1,X2) >>> [1] TRUE >>> >>>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5])) >>> user system elapsed >>> 5.88 0.02 5.98 >>> >>>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5])) >>> user system elapsed >>> 0.20 0.05 0.25 >>> >>>> all.equal(X1,X2) >>> [1] TRUE >>> >>>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5],ordered=T)) >>> user system elapsed >>> 5.82 0.03 5.89 >>> >>>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5],ordered=T)) >>> user system elapsed >>> 0.22 0.04 0.25 >>> >>>> all.equal(X1,X2) >>> [1] TRUE >>> >>> reference to SO : >>> http://stackoverflow.com/questions/5627264/how-can-i-efficiently-construct-a-very-long-factor-with-few-levels >>> >>> -- >>> Joris Meys >>> Statistical consultant >>> >>> Ghent University >>> Faculty of Bioscience Engineering >>> Department of Applied mathematics, biometrics and process control >>> >>> tel : +32 9 264 59 87 >>> joris.m...@ugent.be >>> ------------------------------- >>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> -- >> Peter Dalgaard >> Center for Statistics, Copenhagen Business School >> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >> Phone: (+45)38153501 >> Email: pd....@cbs.dk Priv: pda...@gmail.com >> >> > > > > -- > Joris Meys > Statistical consultant > > Ghent University > Faculty of Bioscience Engineering > Department of Applied mathematics, biometrics and process control > > tel : +32 9 264 59 87 > joris.m...@ugent.be > ------------------------------- > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel