On Mon, Apr 11, 2011 at 02:05:11PM -0400, Duncan Murdoch wrote: > On 08/04/2011 11:39 AM, Joshua Ulrich wrote: > >On Fri, Apr 8, 2011 at 10:15 AM, Duncan Murdoch > ><murdoch.dun...@gmail.com> wrote: > >> On 08/04/2011 11:08 AM, Joshua Ulrich wrote: > >>> > >>> How about: > >>> > >>> y<- rep(NA,length(x)) > >>> y[duplicated(x)]<- match(x[duplicated(x)] ,x) > >> > >> That's a nice solution for vectors. Unfortunately for me, I have a > >matrix > >> (which duplicated() handles by checking whole rows). So a better > >example > >> that I should have posted would be > >> > >> x<- cbind(1, c(9,7,9,3,7) ) > >> > >> and I'd still like the same output > >> > >For a matrix, could you apply the same strategy used in duplicated()? > > > >y<- rep(NA,NROW(x)) > >temp<- apply(x, 1, function(x) paste(x, collapse="\r")) > >y[duplicated(temp)]<- match(temp[duplicated(temp)], temp) > > Since this thread hasn't ended, I will say that I think this solution is > the best I've seen for my specific problem. I was actually surprised > that duplicated() did the string concatenation trick, but since it does, > it makes a lot of sense to do the same in duplicates().
Consistency with duplicated() is a good argument. Let me point out, although it goes beyond the original question, that sorting may be used to compute duplicated() in a way, which is more efficient than the paste() approach according to the test below. duplicatedSort <- function(df) { n <- nrow(df) if (n == 1) { return(FALSE) } else { s <- do.call(order, as.data.frame(df)) equal <- df[s[2:n], , drop=FALSE] == df[s[1:(n-1)], , drop=FALSE] dup <- c(FALSE, rowSums(equal) == ncol(df)) return(dup[order(s)]) } } The following tests efficiency for a character matrix. m <- 1000 n <- 4 a <- matrix(as.character(sample(10, m*n, replace=TRUE)), nrow=m, ncol=n) system.time(out1 <- duplicatedSort(a)) system.time(out2 <- duplicated(a)) identical(out1, out2) table(out1) I obtained, for example, user system elapsed 0.003 0.000 0.003 user system elapsed 0.012 0.000 0.011 [1] TRUE out1 FALSE TRUE 942 58 For a numeric matrix, the ratio of the running times is larger in the same direction. Petr Savicky. ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel