Chuck:

I think this is quite clever. But note that the which() is unnecessary: logical indexing suffices, e.g.
df[!duplicated(df[,c("f","g")],fromLast = TRUE),]

I thought that your approach would be faster because it moves
comparisons from tapply() to C code. But I was wrong. e.g. for 1e6
rows:

> set.seed(1001)
> df <- data.frame(f =factor(sample(LETTERS[1:4],1e6,rep=TRUE)),
+                  g =factor(sample(letters[1:6],1e6,rep=TRUE)),
+                  y = runif(1e6))

## Using duplicated()

> system.time(z <-df[!duplicated(df[,c("f","g")],fromLast = TRUE),])
   user  system elapsed
  0.175   0.008   0.183

## Using tapply()

> system.time(
+ {ix <- seq_len(nrow(df));
+ z <- df[with(df,tapply(ix,list(f,g),function(x)x[length(x)])),]
+ })
   user  system elapsed
  0.025   0.003   0.028

This illustrates the faultiness of my "intuition." A guess would be
that the subscripting to get the factor combinations and the
duplicated.data.frame method take the extra time.

Anyway...

Best,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Sep 2, 2016 at 11:50 AM, Charles C. Berry <ccbe...@ucsd.edu> wrote:
> On Fri, 2 Sep 2016, Bert Gunter wrote:
>
> [snip]
>>
>> The "trick" is to use tapply() to select the necessary row indices of
>> your data frame and forget about all the do.call and rbind stuff. e.g.
>>
>
> I agree the way to go is "select the necessary row indices" but I get there
> a different way. See below.
>
>>> set.seed(1001)
>>> df <- data.frame(f =factor(sample(LETTERS[1:4],100,rep=TRUE)),
>> +                  g = factor(sample(letters[1:6],100,rep=TRUE)),
>> +                  y = runif(100))
>>>
>>> ix <- seq_len(nrow(df))
>>>
>>> ix <- with(df,tapply(ix,list(f,g),function(x)x[length(x)]))
>>> ix
>>
>>    a  b   c  d  e  f
>> A 94 69 100 59 80 87
>> B 89 57  65 90 75 88
>> C 85 92  86 95 97 62
>> D 47 73  72 74 99 96
>
>
> > jx <- which( !duplicated( df[,c("f","g")], fromLast=TRUE ))
> > xtabs(jx~f+g,df[jx,]) ## Show equivalence to Bert's `ix'
>
>    g
> f    a   b   c   d   e   f
>   A 94  69 100  59  80  87
>   B 89  57  65  90  75  88
>   C 85  92  86  95  97  62
>   D 47  73  72  74  99  96
>
>
> Chuck
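
For anyone who wants to check the equivalence directly, here is a minimal sketch on the 100-row example. It assumes a current R version, where sample() may draw different values than it did in 2016, so the indices need not match the tables quoted above. The name `keep` and the NA handling for empty factor combinations are assumptions added for illustration and are not part of either original approach.

```r
set.seed(1001)
df <- data.frame(f = factor(sample(LETTERS[1:4], 100, replace = TRUE)),
                 g = factor(sample(letters[1:6], 100, replace = TRUE)),
                 y = runif(100))

# tapply() approach: index of the last row in each (f, g) cell.
# A cell whose combination never occurs would hold NA, so drop those.
ix <- with(df, tapply(seq_len(nrow(df)), list(f, g),
                      function(x) x[length(x)]))
ix <- ix[!is.na(ix)]

# duplicated() approach: a logical vector marking the last occurrence
# of each (f, g) combination.
keep <- !duplicated(df[, c("f", "g")], fromLast = TRUE)

# Wrapping the logical index in which() selects exactly the same rows.
stopifnot(identical(df[keep, ], df[which(keep), ]))

# Both approaches pick the same row numbers; only the order differs
# (tapply() follows factor-level order, duplicated() follows row order).
stopifnot(identical(sort(as.vector(ix)), which(keep)))
```

Note that df[keep, ] returns the rows in their original order, while indexing with the tapply() result returns them in factor-level order, so sort one of them before comparing the resulting data frames.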