Re: [R] Improve code efficient with do.call, rbind and split contruction

Bert Gunter Fri, 02 Sep 2016 13:50:03 -0700

Chuck:

I think this is quite clever. But note that the which() is
unnecessary: logical indexing suffices, e.g.


df[!duplicated(df[,c("f","g")],fromLast = TRUE),]
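
As a minimal sketch of why the which() can be dropped (the data frame `d` and the names `keep`, `by_logical` and `by_which` are made up for this illustration): duplicated() returns a logical vector with no NAs, so indexing with it directly should select the same rows as indexing with the integer positions that which() returns.

```r
# Small illustrative data frame; names and sizes are assumptions for this sketch
set.seed(1001)
d <- data.frame(f = factor(sample(LETTERS[1:4], 100, replace = TRUE)),
                g = factor(sample(letters[1:6], 100, replace = TRUE)),
                y = runif(100))

# TRUE for the last row of each (f, g) combination, FALSE otherwise
keep <- !duplicated(d[, c("f", "g")], fromLast = TRUE)

by_logical <- d[keep, ]          # logical indexing
by_which   <- d[which(keep), ]   # integer indexing via which()

# Compare the two selections
all.equal(by_logical, by_which)
```

The two forms only differ when the logical vector contains NA, which which() silently drops; that case does not arise here.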

I expected your approach to be faster because it moves the
comparisons out of tapply() and into C code, but I was wrong. For
example, with 1e6 rows:

> set.seed(1001)
> df <- data.frame(f = factor(sample(LETTERS[1:4],1e6,rep=TRUE)),
+                  g = factor(sample(letters[1:6],1e6,rep=TRUE)),
+                  y = runif(1e6))

## Using duplicated()
> system.time(z <- df[!duplicated(df[,c("f","g")],fromLast = TRUE),])
   user  system elapsed
  0.175   0.008   0.183

## Using tapply()
> system.time(
+ {ix <- seq_len(nrow(df));
+ z <- df[with(df,tapply(ix,list(f,g),function(x)x[length(x)])),]
+ })
   user  system elapsed
  0.025   0.003   0.028


This illustrates the faultiness of my "intuition."  My guess is that
subsetting the data frame to get the factor columns, plus the
duplicated.data.frame method itself, accounts for the extra time.
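
One way to probe that guess, sketched here under assumptions: the data frame `d`, the smaller size `n`, and the names `t_subset`, `t_dup` and `t_tapply` are invented for this illustration. It first checks that both approaches pick the same row indices, then times the suspected pieces separately. Absolute timings will vary with machine and R version, so none are shown.

```r
set.seed(1001)
n <- 1e5   # smaller than the 1e6 used above, only to keep the sketch quick
d <- data.frame(f = factor(sample(LETTERS[1:4], n, replace = TRUE)),
                g = factor(sample(letters[1:6], n, replace = TRUE)),
                y = runif(n))

# Row indices from each approach
jx <- which(!duplicated(d[, c("f", "g")], fromLast = TRUE))
ix <- with(d, tapply(seq_len(nrow(d)), list(f, g), function(x) x[length(x)]))

# tapply() returns a matrix (NA for empty cells); flatten and sort to compare.
# sort() drops NAs by default.
same_rows <- identical(sort(as.vector(ix)), jx)

# Time the suspected pieces on their own
t_subset <- system.time(sub <- d[, c("f", "g")])
t_dup    <- system.time(duplicated(sub, fromLast = TRUE))
t_tapply <- system.time(with(d, tapply(seq_len(nrow(d)), list(f, g),
                                       function(x) x[length(x)])))

rbind(subset = t_subset, duplicated = t_dup, tapply = t_tapply)[, 1:3]
```

Note that the tapply() version returns the rows in cell order (column-major over the f by g table) rather than in original row order, so the two results match as sets of rows, not row for row.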

Anyway...

Best,

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Sep 2, 2016 at 11:50 AM, Charles C. Berry <ccbe...@ucsd.edu> wrote:
> On Fri, 2 Sep 2016, Bert Gunter wrote:
> [snip]
>>
>>
>> The "trick" is to use tapply() to select the necessary row indices of
>> your data frame and forget about all the do.call and rbind stuff. e.g.
>>
>
> I agree the way to go is "select the necessary row indices" but I get there
> a different way. See below.
>
>>> set.seed(1001)
>>> df <- data.frame(f =factor(sample(LETTERS[1:4],100,rep=TRUE)),
>>
>> +                  g = factor(sample(letters[1:6],100,rep=TRUE)),
>> +                  y = runif(100))
>>>
>>>
>>> ix <- seq_len(nrow(df))
>>>
>>> ix <- with(df,tapply(ix,list(f,g),function(x)x[length(x)]))
>>> ix
>>
>>   a  b   c  d  e  f
>> A 94 69 100 59 80 87
>> B 89 57  65 90 75 88
>> C 85 92  86 95 97 62
>> D 47 73  72 74 99 96
>
>
>
>   jx <- which( !duplicated( df[,c("f","g")], fromLast=TRUE ))
>
>   xtabs(jx~f+g,df[jx,]) ## Show equivalence to Bert's `ix'
>
>    g
> f     a   b   c   d   e   f
>   A  94  69 100  59  80  87
>   B  89  57  65  90  75  88
>   C  85  92  86  95  97  62
>   D  47  73  72  74  99  96
>
>
> Chuck
>
>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
