On Apr 12, 2011, at 10:33 , Joris Meys wrote:

> Thanks for the explanation, I wasn't fully aware of which optimization
> I was using. I reckon your solution is more R-sound, so no reason to
> keep with my bizarre workaround. It would be nice though if gl() got
> optimized. Thank you for the example too, I'm learning every day.
> 

I have now committed a version of the below to r-devel. (A couple of demons 
turned out to be lurking in the details, so not exactly the same code.)

-pd

> Cheers
> Joris
> 
> On Tue, Apr 12, 2011 at 8:51 AM, peter dalgaard <pda...@gmail.com> wrote:
>> 
>> On Apr 11, 2011, at 23:53 , Joris Meys wrote:
>> 
>>> Based on a discussion on SO I ran some tests and found that converting
>>> to a factor is best done early in the process. Hence, I propose to
>>> rewrite the gl() function as :
>>> 
>>> gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE){
>>>  rep(
>>>      rep(
>>>        factor(1:n,levels=1:n,labels=labels, ordered=ordered),rep.int(k,n)
>>>      ),length.out=length
>>>  )
>>> }
>>> 
>> 
>> That's bizarre! You are relying on an optimization in rep.factor whereby it 
>> replicates the internal codes and exploits that the result has the same 
>> structure as the input. I.e., it just tacks on class and levels attributes 
>> rather than call match() as factor() does internally.
>> 
>> However, you can do the same thing straight away:
>> 
>>> gl2
>> function (n, k, length = n * k, labels = 1:n, ordered = FALSE)
>> {
>>   y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length)
>>   structure(y, levels=as.character(labels), 
>> class=c(if(ordered)"ordered","factor"))
>> }
>> 
>> I get this to be a bit faster than your version, although with a smaller 
>> speedup factor, which probably just indicates that match() is faster on this 
>> machine.
>> 
>>> Some test results  :
>>> 
>>>> system.time(X1 <- gl(5,1e7))
>>>   user  system elapsed
>>>  29.21    0.30   29.58
>>> 
>>>> system.time(X2 <- gl2(5,1e7))
>>>   user  system elapsed
>>>   1.87    0.45    2.37
>>> 
>>>> all.equal(X1,X2)
>>> [1] TRUE
>>> 
>>>> system.time(X1 <- gl(5,100,1e7))
>>>   user  system elapsed
>>>   5.98    0.05    6.05
>>> 
>>>> system.time(X2 <- gl2(5,100,1e7))
>>>   user  system elapsed
>>>   0.21    0.03    0.25
>>> 
>>>> all.equal(X1,X2)
>>> [1] TRUE
>>> 
>>>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5]))
>>>   user  system elapsed
>>>   5.88    0.02    5.98
>>> 
>>>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5]))
>>>   user  system elapsed
>>>   0.20    0.05    0.25
>>> 
>>>> all.equal(X1,X2)
>>> [1] TRUE
>>> 
>>>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5],ordered=T))
>>>   user  system elapsed
>>>   5.82    0.03    5.89
>>> 
>>>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5],ordered=T))
>>>   user  system elapsed
>>>   0.22    0.04    0.25
>>> 
>>>> all.equal(X1,X2)
>>> [1] TRUE
>>> 
>>> reference to SO :
>>> http://stackoverflow.com/questions/5627264/how-can-i-efficiently-construct-a-very-long-factor-with-few-levels
>>> 
>>> --
>>> Joris Meys
>>> Statistical consultant
>>> 
>>> Ghent University
>>> Faculty of Bioscience Engineering
>>> Department of Applied mathematics, biometrics and process control
>>> 
>>> tel : +32 9 264 59 87
>>> joris.m...@ugent.be
>>> -------------------------------
>>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>> 
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> --
>> Peter Dalgaard
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: pd....@cbs.dk  Priv: pda...@gmail.com
>> 
>> 
> 
> 
> 
> -- 
> Joris Meys
> Statistical consultant
> 
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
> 
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to