Re: [Rd] proposal for adapting code of function gl()

peter dalgaard Mon, 11 Apr 2011 23:52:08 -0700

On Apr 11, 2011, at 23:53, Joris Meys wrote:

> Based on a discussion on SO, I ran some tests and found that converting
> to a factor is best done early in the process. Hence, I propose to
> rewrite the gl() function as:
> 
> gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE){
>   rep(
>     rep(
>       factor(1:n, levels = 1:n, labels = labels, ordered = ordered),
>       rep.int(k, n)
>     ),
>     length.out = length
>   )
> }
>


That's bizarre! You are relying on an optimization in rep.factor, which 
replicates the internal integer codes and exploits the fact that the result 
has the same structure as the input. That is, it just tacks on the class and 
levels attributes rather than calling match() as factor() does internally. 
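To make the mechanism concrete, here is a minimal sketch of what replicating a factor amounts to. The helper rep_codes() is hypothetical and written only for illustration; it is not the actual rep.factor source, but for a plain factor it should give the same result: replicate the integer codes, then re-attach levels and class without ever calling match().

```r
# rep_codes() is a hypothetical helper, not the real rep.factor source.
# It replicates the integer codes of a factor and re-attaches the
# levels and class attributes, so no match() call is involved.
rep_codes <- function(f, ...) {
  codes <- rep(as.integer(f), ...)   # as.integer() drops all attributes
  structure(codes, levels = levels(f), class = class(f))
}

f <- factor(c("a", "b", "c"))
identical(rep(f, times = c(2, 1, 3)), rep_codes(f, times = c(2, 1, 3)))
# [1] TRUE
```

Because class(f) is copied as-is, the same helper also covers ordered factors.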

However, you can do the same thing straight away: 

> gl2
function (n, k, length = n * k, labels = 1:n, ordered = FALSE) 
{
   y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length)
   structure(y, levels = as.character(labels),
             class = c(if (ordered) "ordered", "factor"))
}

On my machine this comes out a bit faster than your version, although the 
speedup over gl() is smaller than the one you report, which probably just 
indicates that match() is faster on this machine.
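For anyone who wants to repeat the comparison, here is a minimal sketch. The names gl_joris() and gl_peter() are hypothetical labels for the two gl2() versions above, and the sizes are reduced from the originals so the equality checks run quickly. Timings are machine-dependent, so the loop prints them instead of assuming which version wins.

```r
# gl_joris and gl_peter are hypothetical names for the two gl2 versions
# discussed in this thread.
gl_joris <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE) {
  # build a short factor first, then let rep() replicate its codes
  f <- factor(1:n, levels = 1:n, labels = labels, ordered = ordered)
  rep(rep(f, rep.int(k, n)), length.out = length)
}

gl_peter <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE) {
  # replicate plain integer codes, then attach levels and class directly
  y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length)
  structure(y, levels = as.character(labels),
            class = c(if (ordered) "ordered", "factor"))
}

# reduced-size versions of the calls in the test results
cases <- list(
  list(n = 5, k = 1e5),
  list(n = 5, k = 100, length = 1e6),
  list(n = 5, k = 100, length = 1e6, labels = letters[1:5]),
  list(n = 5, k = 100, length = 1e6, labels = letters[1:5], ordered = TRUE)
)

funs <- list(gl = gl, joris = gl_joris, peter = gl_peter)

for (args in cases) {
  ref <- do.call(gl, args)
  # both rewrites must agree with base gl()
  stopifnot(isTRUE(all.equal(ref, do.call(gl_joris, args))),
            isTRUE(all.equal(ref, do.call(gl_peter, args))))
  # elapsed seconds per implementation; results vary between machines
  timings <- sapply(funs, function(fun)
    system.time(do.call(fun, args))[["elapsed"]])
  print(timings)
}
```

With a single run per call the timings are noisy; repeating each call several times gives a steadier picture.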

> Some test results:
> 
>> system.time(X1 <- gl(5,1e7))
>   user  system elapsed
>  29.21    0.30   29.58
> 
>> system.time(X2 <- gl2(5,1e7))
>   user  system elapsed
>   1.87    0.45    2.37
> 
>> all.equal(X1,X2)
> [1] TRUE
> 
>> system.time(X1 <- gl(5,100,1e7))
>   user  system elapsed
>   5.98    0.05    6.05
> 
>> system.time(X2 <- gl2(5,100,1e7))
>   user  system elapsed
>   0.21    0.03    0.25
> 
>> all.equal(X1,X2)
> [1] TRUE
> 
>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5]))
>   user  system elapsed
>   5.88    0.02    5.98
> 
>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5]))
>   user  system elapsed
>   0.20    0.05    0.25
> 
>> all.equal(X1,X2)
> [1] TRUE
> 
>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5],ordered=T))
>   user  system elapsed
>   5.82    0.03    5.89
> 
>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5],ordered=T))
>   user  system elapsed
>   0.22    0.04    0.25
> 
>> all.equal(X1,X2)
> [1] TRUE
> 
> Reference to SO:
> http://stackoverflow.com/questions/5627264/how-can-i-efficiently-construct-a-very-long-factor-with-few-levels
> 
> -- 
> Joris Meys
> Statistical consultant
> 
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
> 
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
