Another approach is:
n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n*6, TRUE), nrow = n))
x.col <- c(1, 3, 5)
values <- do.call(paste, c(x[x.col], sep = "\r"))
# for each row i, collect the other rows that share the same key
out <- lapply(seq_along(values), function (i) {
    ind <- which(values == values[i])
    ind[!ind %in% i]
})
out
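The code above still compares every key against all the others. A possible variant, offered only as a sketch, groups the row indices by key once with split() and then looks each row up in its group. The names groups and out.split are made up for this illustration, and it assumes the pasted keys are unambiguous, i.e. no value contains the "\r" separator.

```r
n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

# one string key per row, built from the selected columns
values <- do.call(paste, c(x[x.col], sep = "\r"))

# group the row indices by key (hypothetical name: groups)
groups <- split(seq_along(values), values)

# for each row, look up its group and drop the row itself
out.split <- lapply(seq_along(values), function(i) {
    g <- groups[[values[i]]]
    g[g != i]
})
```

Rows with no match get an empty integer vector, the same shape of result as the which()-based version.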
Best,
Dimitris
joris meys wrote:
Neat piece of code, Jim, but it still uses a nested loop. If you order
the matrix first, you only need one pass through the whole matrix
to find the information you need.
Of course, I am not taking the cost of the ordering into account. If the
ordering algorithm doesn't run in linear time, then it doesn't really matter,
I guess: the ordering algorithm becomes the limiting step.
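One way to read the order-first idea, as a rough sketch rather than a tested recipe: sort the rows on the key columns so that identical keys end up adjacent, mark where the key changes in a single pass, and read the groups off the resulting runs. The names ord, run.id, groups and out.sorted are invented for this illustration, and the code assumes at least two rows and no missing values in the key columns. With a comparison-based sort the ordering step costs O(n log n), which then dominates the linear scan.

```r
n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

# order the rows on the key columns; identical keys become adjacent
ord <- do.call(order, unname(as.list(x[x.col])))
xs <- as.matrix(x[ord, x.col, drop = FALSE])

# single pass: a new run starts wherever a sorted row differs from the previous one
new.run <- c(TRUE, rowSums(xs[-1, , drop = FALSE] != xs[-n, , drop = FALSE]) > 0)
run.id <- cumsum(new.run)

# original row indices belonging to each run
groups <- split(ord, run.id)

# map each original row back to its run and drop the row itself
grp.of.row <- integer(n)
grp.of.row[ord] <- run.id
out.sorted <- lapply(seq_len(n), function(i) setdiff(groups[[grp.of.row[i]]], i))
```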
Kind regards
Joris
On Thu, Oct 8, 2009 at 2:24 PM, jim holtman <jholt...@gmail.com> wrote:
I answered the wrong question. Here is the code to find all the
matches for each row:
n <- 20
set.seed(2)
# create test dataframe
x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
x
x.col <- c(1,3,5)
# match against all the other rows
x.match1 <- apply(x[, x.col], 1, function(a){
.mat <- which(apply(x[, x.col], 1, function(z){
all(a == z)
}))
})
# remove matches to itself
x.match2 <- lapply(seq(length(x.match1)), function(z){
x.match1[[z]][!(x.match1[[z]] %in% z)]
})
# x.match2 contains, for each row, the indices of the other rows that match it
On Wed, Oct 7, 2009 at 3:52 PM, Rama Ramakrishnan <r...@alum.mit.edu> wrote:
Hi Friends,
I have a data frame d. Let vars be the column indices for a subset of the
columns in d (e.g., vars <- c(1,3,4,8))
For each row r in d, I want to collect all the other rows in d that match
the values in row r for just the columns in vars.
The naive way to do this is to have a for loop stepping through each row in
d, and within the loop have another loop going through all the rows again,
checking for equality. This is quadratic in the number of rows and takes way
too long. Is there a better, "vectorized" way to do this?
Thanks in advance!
Rama Ramakrishnan
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014