See ?duplicated

On Tue, Jul 13, 2010 at 7:42 PM, david hilton shanabrook <davidshanabr...@me.com> wrote:
> I wrote something to check for duplicate rows in a data frame, but it is
> too inefficient. Is there a way to do this without the nested loops?
>
> This code correctly indicates rows 1-7, 1-8, 2-9 and 7-8 are duplicates.
>
> > m <- matrix(c(1,1,1,1,1, 2,2,2,2,2, 6,6,6,6,6, 3,3,3,3,3, 4,4,4,4,4,
> 5,5,5,5,5, 1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 7,7,7,7,7), ncol=5, byrow=TRUE)
> > df <- data.frame(m)
> > df
>    X1 X2 X3 X4 X5
> 1   1  1  1  1  1
> 2   2  2  2  2  2
> 3   6  6  6  6  6
> 4   3  3  3  3  3
> 5   4  4  4  4  4
> 6   5  5  5  5  5
> 7   1  1  1  1  1
> 8   1  1  1  1  1
> 9   2  2  2  2  2
> 10  7  7  7  7  7
> >
> > compareTwoRows <- function(row1, row2){
> +     numCol <- 5
> +     logicalRow <- row1==row2
> +     duplicate <- sum(logicalRow)==numCol
> +     return(as.numeric(duplicate))}
> >
> > same <- matrix(0, byrow=TRUE, ncol=10,nrow=10)
> >
> > for (j in 1:9)
> +     for (k in (j+1):10)
> +         same[j,k] <- compareTwoRows(df[j,],df[k,])
> >
> > same
>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>  [1,]    0    0    0    0    0    0    1    1    0     0
>  [2,]    0    0    0    0    0    0    0    0    1     0
>  [3,]    0    0    0    0    0    0    0    0    0     0
>  [4,]    0    0    0    0    0    0    0    0    0     0
>  [5,]    0    0    0    0    0    0    0    0    0     0
>  [6,]    0    0    0    0    0    0    0    0    0     0
>  [7,]    0    0    0    0    0    0    0    1    0     0
>  [8,]    0    0    0    0    0    0    0    0    0     0
>  [9,]    0    0    0    0    0    0    0    0    0     0
> [10,]    0    0    0    0    0    0    0    0    0     0
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
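
As a rough illustration of the ?duplicated pointer, here is a minimal
sketch applied to the data frame from the quoted question. It is not
part of the original reply. The names dup_later, dup_any, key, groups
and pairs are made up for this example, and the paste()-based key is
only one assumed way to recover the pairwise matches; it compares the
printed form of each value, so it may not behave like == on
floating-point columns.

```r
# Rebuild the example data frame from the quoted question.
m <- matrix(c(1,1,1,1,1, 2,2,2,2,2, 6,6,6,6,6, 3,3,3,3,3, 4,4,4,4,4,
              5,5,5,5,5, 1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 7,7,7,7,7),
            ncol = 5, byrow = TRUE)
df <- data.frame(m)

# TRUE for each row that repeats an earlier row; first occurrences stay FALSE.
dup_later <- duplicated(df)

# TRUE for each row that has a twin anywhere, first occurrence included.
dup_any <- dup_later | duplicated(df, fromLast = TRUE)

which(dup_later)  # 7 8 9
which(dup_any)    # 1 2 7 8 9

# Illustrative step (an assumption, not from the thread): one string key
# per row, then group the row indices that share a key.
key    <- do.call(paste, c(df, sep = "\r"))
groups <- split(seq_len(nrow(df)), key)
groups <- unname(groups[lengths(groups) > 1])

# Expand each group into (row, later row) pairs, i.e. the cells that the
# nested loops set to 1 in `same`.
pairs <- t(do.call(cbind, lapply(groups, combn, 2)))
pairs
```

If only the de-duplicated data are needed, df[!duplicated(df), ] or
unique(df) should be enough; the pair expansion is only worth doing when
the matching row numbers themselves are wanted.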