Re: [Rd] proposal for adapting code of function gl()

Joris Meys Tue, 12 Apr 2011 01:34:27 -0700

Thanks for the explanation; I wasn't fully aware of which optimization
I was relying on. I reckon your solution is more R-sound, so there is no
reason to stick with my bizarre workaround. It would be nice, though, if
gl() got optimized. Thank you for the example too; I'm learning every day.


Cheers
Joris

On Tue, Apr 12, 2011 at 8:51 AM, peter dalgaard <[email protected]> wrote:
>
> On Apr 11, 2011, at 23:53 , Joris Meys wrote:
>
>> Based on a discussion on SO, I ran some tests and found that converting
>> to a factor is best done early in the process. Hence, I propose to
>> rewrite the gl() function as:
>>
>> gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE){
>>  rep(
>>      rep(
>>        factor(1:n,levels=1:n,labels=labels, ordered=ordered),rep.int(k,n)
>>      ),length.out=length
>>  )
>> }
>>
>
> That's bizarre! You are relying on an optimization in rep.factor whereby it
> replicates the internal codes and exploits the fact that the result has the
> same structure as the input. I.e., it just tacks on class and levels
> attributes rather than calling match() as factor() does internally.
>
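
To make the point about internal codes concrete, here is a minimal sketch
(not part of the original exchange; the object names f, codes and g are
purely illustrative). It shows that a factor is an integer vector carrying
"levels" and "class" attributes, so the same object can be assembled by hand
without going through factor():

```r
# Build a small factor the usual way; factor() matches values against levels.
f <- factor(c("a", "b", "a", "c"))

# Stripping the class exposes the underlying integer codes.
codes <- unclass(f)
stopifnot(is.integer(codes))

# Assemble the same factor by hand: integer codes plus the two attributes.
g <- structure(c(1L, 2L, 1L, 3L),
               levels = c("a", "b", "c"),
               class = "factor")

# The hand-built object is indistinguishable from the factor() result.
stopifnot(identical(f, g))
```
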
> However, you can do the same thing straight away:
>
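> > gl2
> function (n, k, length = n * k, labels = 1:n, ordered = FALSE)
> {
>   # Build the integer codes directly: each of 1..n repeated k times,
>   # recycled to the requested length.
>   y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length)
>   # Attach levels and class by hand instead of calling factor().
>   structure(y, levels = as.character(labels),
>             class = c(if (ordered) "ordered", "factor"))
> }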
>
> In my timings this is a bit faster than your version, although the speedup
> over gl() is smaller than the one you report, which probably just indicates
> that match() is faster on this machine.
>
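
A minimal sketch of how the structure()-based approach could be checked
against base gl() at a smaller size. The name gl_fast is hypothetical and
used only for illustration. Timings depend on the machine and the R version,
so no expected numbers are given:

```r
# Hypothetical name; follows the structure()-based idea discussed above.
gl_fast <- function(n, k, length = n * k, labels = seq_len(n), ordered = FALSE) {
  # Integer codes: each of 1..n repeated k times, recycled to `length`.
  codes <- rep(rep.int(seq_len(n), rep.int(k, n)), length.out = length)
  # Attach levels and class directly instead of calling factor().
  structure(codes,
            levels = as.character(labels),
            class = c(if (ordered) "ordered", "factor"))
}

# Equivalence check against base gl(), with labels and ordering.
x1 <- gl(5, 100, 1e4, labels = letters[1:5], ordered = TRUE)
x2 <- gl_fast(5, 100, 1e4, labels = letters[1:5], ordered = TRUE)
stopifnot(isTRUE(all.equal(x1, x2)))

# Rough timing comparison; results will vary.
system.time(gl(5, 1e5))
system.time(gl_fast(5, 1e5))
```
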
>> Some test results:
>>
>>> system.time(X1 <- gl(5,1e7))
>>   user  system elapsed
>>  29.21    0.30   29.58
>>
>>> system.time(X2 <- gl2(5,1e7))
>>   user  system elapsed
>>   1.87    0.45    2.37
>>
>>> all.equal(X1,X2)
>> [1] TRUE
>>
>>> system.time(X1 <- gl(5,100,1e7))
>>   user  system elapsed
>>   5.98    0.05    6.05
>>
>>> system.time(X2 <- gl2(5,100,1e7))
>>   user  system elapsed
>>   0.21    0.03    0.25
>>
>>> all.equal(X1,X2)
>> [1] TRUE
>>
>>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5]))
>>   user  system elapsed
>>   5.88    0.02    5.98
>>
>>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5]))
>>   user  system elapsed
>>   0.20    0.05    0.25
>>
>>> all.equal(X1,X2)
>> [1] TRUE
>>
>>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5],ordered=TRUE))
>>   user  system elapsed
>>   5.82    0.03    5.89
>>
>>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5],ordered=TRUE))
>>   user  system elapsed
>>   0.22    0.04    0.25
>>
>>> all.equal(X1,X2)
>> [1] TRUE
>>
>> Reference to SO:
>> http://stackoverflow.com/questions/5627264/how-can-i-efficiently-construct-a-very-long-factor-with-few-levels
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> [email protected]
>> -------------------------------
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: [email protected]  Priv: [email protected]
>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
[email protected]
