Here is one way of doing it:

> n <- 20
> set.seed(2)
> # create test dataframe
> x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
> x
   V1 V2 V3 V4 V5 V6
1   1  2  2  2  1  1
2   2  1  1  2  2  1
3   2  2  1  2  1  2
4   1  1  1  1  1  2
5   2  1  2  2  1  1
6   2  1  2  1  2  2
7   1  1  2  1  2  2
8   2  1  1  1  1  1
9   1  2  2  1  2  1
10  2  1  2  1  1  1
11  2  1  1  1  2  1
12  1  1  1  1  1  2
13  2  2  2  1  1  1
14  1  2  2  1  2  2
15  1  2  1  1  1  2
16  2  2  2  2  1  2
17  2  2  2  1  1  2
18  1  1  2  2  1  1
19  1  2  2  1  1  2
20  1  1  2  2  1  2
> x.col <- c(1,3,5)
> # within each row, test the first selected column against the other selected columns
> x.match <- x[, x.col[1]] == x[, x.col[-1]]
> # print the rows where all selected columns hold the same value
> x[apply(x.match, 1, all),]
   V1 V2 V3 V4 V5 V6
4   1  1  1  1  1  2
6   2  1  2  1  2  2
12  1  1  1  1  1  2
15  1  2  1  1  1  2
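
The session above selects rows whose own values agree across columns 1, 3 and 5. If the aim is instead to collect, for each row, the other rows that share its values in those columns, one possible approach is to build a key per row and group the row indices by that key. This is only a sketch: the names d, vars, key, groups and matches are chosen for illustration, and the "\r" separator assumes that character never occurs in the data.

```r
# Rebuild a test data frame like the one above (the sampled values depend on
# the R version, so they may not reproduce the printed table exactly).
set.seed(2)
n <- 20
d <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
vars <- c(1, 3, 5)

# One key string per row, built only from the columns in vars.
# Assumption: "\r" does not appear in any of the values being pasted.
key <- do.call(paste, c(d[vars], sep = "\r"))

# Row indices grouped by key, computed in one pass instead of a double loop.
groups <- split(seq_len(nrow(d)), key)

# For each row r, the other rows with the same values in the vars columns.
matches <- lapply(seq_len(nrow(d)), function(r) setdiff(groups[[key[r]]], r))
```

With this in place, matches[[4]] would hold the indices of the rows that match row 4 on the chosen columns, and an empty vector means no other row matches.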
On Wed, Oct 7, 2009 at 3:52 PM, Rama Ramakrishnan <[email protected]> wrote:
>
> Hi Friends,
>
> I have a data frame d. Let vars be the column indices for a subset of the
> columns in d (e.g., vars <- c(1,3,4,8))
>
> For each row r in d, I want to collect all the other rows in d that match
> the values in row r for just the columns in vars.
>
> The naive way to do this is to have a for loop stepping through each row in
> d, and within the loop have another loop going through all the rows again,
> checking for equality. This is quadratic in the number of rows and takes way
> too long. Is there a better, "vectorized" way to do this?
>
> Thanks in advance!
>
> Rama Ramakrishnan
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

