See ?duplicated; duplicated(df) works directly on a data frame and flags each row that repeats an earlier one.
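For example, here is a minimal sketch on the data from the question. duplicated() marks only the later copies (rows 7, 8 and 9). Combining it with fromLast = TRUE also marks the first member of each duplicated group. The variable names dup_later, dup_any and unique_df are illustrative.

```r
# Example data from the question: 10 rows, 5 columns
m <- matrix(c(1,1,1,1,1, 2,2,2,2,2, 6,6,6,6,6, 3,3,3,3,3, 4,4,4,4,4,
              5,5,5,5,5, 1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 7,7,7,7,7),
            ncol = 5, byrow = TRUE)
df <- data.frame(m)

# TRUE marks a row identical to some earlier row;
# the first occurrence of each distinct row stays FALSE
dup_later <- duplicated(df)
which(dup_later)            # returns 7 8 9

# Scan from both ends to flag every member of a duplicated group
dup_any <- duplicated(df) | duplicated(df, fromLast = TRUE)
which(dup_any)              # returns 1 2 7 8 9

# Drop the repeats, keeping the first occurrence of each row
unique_df <- df[!dup_later, ]
```

No loops are needed here, because duplicated() compares all rows in one vectorized call.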

On Tue, Jul 13, 2010 at 7:42 PM, david hilton shanabrook <davidshanabr...@me.com> wrote:

> I wrote something to check for duplicate rows in a data frame, but it is
> too inefficient.  Is there a way to do this without the nested loops?
>
> This code correctly indicates that row pairs 1-7, 1-8, 2-9 and 7-8 are duplicates.
>
> > m <- matrix(c(1,1,1,1,1, 2,2,2,2,2, 6,6,6,6,6, 3,3,3,3,3, 4,4,4,4,4,
> +               5,5,5,5,5, 1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 7,7,7,7,7),
> +             ncol=5, byrow=TRUE)
> > df <- data.frame(m)
> > df
>   X1 X2 X3 X4 X5
> 1   1  1  1  1  1
> 2   2  2  2  2  2
> 3   6  6  6  6  6
> 4   3  3  3  3  3
> 5   4  4  4  4  4
> 6   5  5  5  5  5
> 7   1  1  1  1  1
> 8   1  1  1  1  1
> 9   2  2  2  2  2
> 10  7  7  7  7  7
> >
> > compareTwoRows <- function(row1, row2){
> +       numCol <- 5
> +       logicalRow <- row1==row2
> +       duplicate <- sum(logicalRow)==numCol
> +       return(as.numeric(duplicate))}
> >
> > same <- matrix(0, byrow=TRUE, ncol=10,nrow=10)
> >
> > for (j in 1:9)
> +       for (k in (j+1):10)
> +               same[j,k] <- compareTwoRows(df[j,],df[k,])
> >
> > same
>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>  [1,]    0    0    0    0    0    0    1    1    0     0
>  [2,]    0    0    0    0    0    0    0    0    1     0
>  [3,]    0    0    0    0    0    0    0    0    0     0
>  [4,]    0    0    0    0    0    0    0    0    0     0
>  [5,]    0    0    0    0    0    0    0    0    0     0
>  [6,]    0    0    0    0    0    0    0    0    0     0
>  [7,]    0    0    0    0    0    0    0    1    0     0
>  [8,]    0    0    0    0    0    0    0    0    0     0
>  [9,]    0    0    0    0    0    0    0    0    0     0
> [10,]    0    0    0    0    0    0    0    0    0     0
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
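If the pairwise matrix itself is wanted, the nested loop above can be replaced by one vectorized comparison. This is a minimal sketch under one assumption: each row is reduced to a single string key by pasting its values together. That is exact for the integer data here. For floating-point data, string comparison and `==` can disagree, because as.character() rounds to about 15 significant digits. The names key and pairs are illustrative.

```r
# Same example data as in the question
m <- matrix(c(1,1,1,1,1, 2,2,2,2,2, 6,6,6,6,6, 3,3,3,3,3, 4,4,4,4,4,
              5,5,5,5,5, 1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 7,7,7,7,7),
            ncol = 5, byrow = TRUE)
df <- data.frame(m)

# Collapse each row into one string key (assumption: equal keys <=> equal rows)
key <- do.call(paste, c(df, sep = "\r"))

# Compare all keys at once; multiply by 1 to get 0/1 like the original
same <- outer(key, key, "==") * 1
# Keep only pairs j < k, matching the loop's upper triangle
same[lower.tri(same, diag = TRUE)] <- 0

# List the duplicate pairs (j, k), ordered by j and then k
pairs <- which(same == 1, arr.ind = TRUE)
pairs <- pairs[order(pairs[, "row"], pairs[, "col"]), , drop = FALSE]
pairs
```

This still builds an n-by-n matrix, so for large data frames the duplicated() approach, which never forms all pairs, scales better.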



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" W


