Since both of you seem to have misinterpreted my response, consider the
following for clarification:
A <- matrix(1:1000, 1000, 10)
B <- A[1:100, ]
# my recommended solution
t1 <- system.time({match(as.data.frame(t(B)), as.data.frame(t(A)))})
# similar to John's recommended solution
t2 <- system.time({
+ AA <- as.list(as.data.frame(t(A)))
+ BB <- as.list(as.data.frame(t(B)))
+ which( AA %in% BB )
+ })
t3 <- system.time({
+ lresult <- rep( NA, nrow(A) )
+ for ( ia in seq.int( nrow( A ) ) ) {
+ lres <- FALSE
+ ib <- 0
+ while ( ib < nrow( B ) & !lres ) {
+ ib <- ib + 1
+ lres <- all( A[ ia, ] == B[ ib, ] )
+ }
+ lresult[ ia ] <- lres
+ }
+ which( lresult )
+ })
t4 <- system.time({
+ res<-c()
+ rowsB = length(B[,1])
+ rowsA = length(A[,1])
+ colsB = length(B[1,])
+ colsA = length(A[1,])
+ for (i in 1:rowsB){
+ for (j in 1:colsB){
+ for (k in 1:rowsA){
+ for (l in 1:colsA){
+ if(A[k,l]==B[i,j]){res<-c(res,k)}
+ }
+ }
+ }
+ }
+ unique(sort(res))
+ })
t1
   user  system elapsed
  0.022   0.000   0.020
t2
   user  system elapsed
   0.02    0.00    0.02
t3
   user  system elapsed
  0.748   0.000   0.746
t4
   user  system elapsed
 16.612   0.016  16.636
# data.frames are lists, but applying as.list seems to speed up the
# match for some reason
t2[1]/t1[1]
user.self
0.9090909
# intended comparison for learning purposes
t4[1]/t3[1]
user.self
22.20856
I recognize that the reference implementation does not need to be
optimized, but the changes I suggested to it were meant to illustrate an
incremental step toward "thinking in R", not to be the optimal solution.
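A further incremental step, sketched here as an assumption rather than something timed in this thread, is to keep the outer pass over the rows of A but replace the inner while loop with one vectorized comparison against all rows of B. The helper name row_in is hypothetical, and the sketch assumes the matrices contain no NA values.

```r
# Toy data from the original question; drop = FALSE keeps B a matrix
# even if only one row is selected.
A <- matrix(1:10, nrow = 5)
B <- A[-c(1, 2, 3), , drop = FALSE]

# Hypothetical helper: TRUE for each row of A that equals some row of B.
# Assumes no NA values in A or B.
row_in <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  tB <- t(B)  # one column per row of B
  vapply(
    seq_len(nrow(A)),
    # A[ia, ] is recycled down each column of tB, so a column whose
    # comparisons are all TRUE corresponds to a row of B equal to row ia of A.
    function(ia) any(colSums(tB == A[ia, ]) == ncol(A)),
    logical(1)
  )
}

result <- which(row_in(A, B))
print(result)
```

This still visits every row of A one at a time, so it sits between the explicit double loop and the match() approach rather than replacing either.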
On Fri, 31 Oct 2014, John Fox wrote:
Dear Jeff,
For curiosity, I compared your solution with the one I posted earlier this
morning (when I was working on a slower computer, accounting for the somewhat
different timings for my solution):
------------ snip ----------
A <- matrix(1:10000, 10000, 10)
B <- A[1:1000, ]
system.time({
+ AA <- as.list(as.data.frame(t(A)))
+ BB <- as.list(as.data.frame(t(B)))
+ print(sum(AA %in% BB))
+ })
[1] 1000
   user  system elapsed
   0.14    0.01    0.16
system.time({
+ lresult <- rep( NA, nrow(A) )
+ for ( ia in seq.int( nrow( A ) ) ) {
+ lres <- FALSE
+ ib <- 0
+ while ( ib < nrow( B ) & !lres ) {
+ ib <- ib + 1
+ lres <- all( A[ ia, ] == B[ ib, ] )
+ }
+ lresult[ ia ] <- lres
+ }
+ print(sum( lresult ))
+ })
[1] 1000
   user  system elapsed
  45.76    0.01   45.77
46/0.16
[1] 287.5
------------ snip ----------
So the solution using nested loops is more than 2 orders of magnitude slower
for this problem. Of course, for a one-off problem, depending on its size, the
difference may not matter.
Best,
John
-----------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
On Behalf Of Jeff Newmiller
Sent: Friday, October 31, 2014 10:15 AM
To: Charles Novaes de Santana; r-help@r-project.org
Subject: Re: [R] Comparing matrices in R - matrixB %in% matrixA
Thank you for the reproducible example, but posting in HTML can corrupt
your example code so please learn to set your email client mail format
appropriately when posting to this list.
I think this [1] post, found with a quick Google search for "R match
matrix", fits your situation perfectly.
match(data.frame(t(B)), data.frame(t(A)))
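One point that may be worth spelling out: match() and %in% answer slightly different questions. The sketch below uses the toy matrix from the question, but takes the rows of B in reverse order (an assumption made only to make the difference visible), and uses the plain-list form of the rows that appears elsewhere in this thread.

```r
A <- matrix(1:10, nrow = 5)
B <- A[c(5, 4), ]  # assumption: rows 5 and 4 of A, deliberately reversed

# Turn each row into one list element so rows are compared as whole units.
AA <- as.list(as.data.frame(t(A)))
BB <- as.list(as.data.frame(t(B)))

# For each row of B, the position of its first match in A (in B's order).
pos_of_B_rows <- match(BB, AA)

# The indices of rows of A that occur anywhere in B (always ascending).
rows_of_A_hit <- which(AA %in% BB)

print(pos_of_B_rows)
print(rows_of_A_hit)
```

When B's rows are unique and already in the same order as in A, as in the original toy example, the two results coincide.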
Note that concatenating vectors in loops is bad news... a basic
optimization for your code would be to preallocate a logical result
vector and fill in each element with a TRUE/FALSE in the outer loop,
and use the which() function on that completed vector to identify the
index numbers (if you really need that). For example:
lresult <- rep( NA, nrow(A) )
for ( ia in seq.int( nrow( A ) ) ) {
  lres <- FALSE
  ib <- 0
  while ( ib < nrow( B ) & !lres ) {
    ib <- ib + 1
    lres <- all( A[ ia, ] == B[ ib, ] )
  }
  lresult[ ia ] <- lres
}
result <- which( lresult )
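The preallocation point can also be seen in isolation. The following is a minimal sketch with hypothetical function names (grow, prealloc) and an arbitrary even-number test standing in for the row comparison; it only shows that both styles give the same answer, and makes no timing claim.

```r
# Growing the result with c() makes a new, longer copy on every append.
grow <- function(n) {
  res <- c()
  for (i in seq_len(n)) {
    if (i %% 2 == 0) res <- c(res, i)
  }
  res
}

# Preallocating a logical vector and filling it in place avoids those copies;
# which() converts the flags to index numbers once, at the end.
prealloc <- function(n) {
  flag <- logical(n)
  for (i in seq_len(n)) {
    flag[i] <- (i %% 2 == 0)
  }
  which(flag)
}

same <- identical(grow(1000), prealloc(1000))
print(same)
```

Wrapping each call in system.time() with a larger n is one way to check the difference on a given machine.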
[1] http://stackoverflow.com/questions/12697122/in-r-match-function-for-rows-or-columns-of-matrix
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On October 31, 2014 6:20:38 AM PDT, Charles Novaes de Santana
<charles.sant...@gmail.com> wrote:
My apologies, because I sent the message before finishing it. I am very
sorry about this. Please find my message below (I usually write my
messages from the end to the beginning... sorry :)).
Dear all,
I am trying to compare two matrices, in order to find in which rows of a
matrix A I can find the same values as in matrix B. I am trying to do it
for matrices with around 2500 elements, but please find below a toy
example:
A = matrix(1:10,nrow=5)
B = A[-c(1,2,3),];
So
A
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
and
B
     [,1] [,2]
[1,]    4    9
[2,]    5   10
I would like to compare A and B in order to find in which rows of A I can
find the rows of B. Something similar to %in% with one-dimensional arrays.
In the example above, the answer should be 4 and 5.
I wrote a function to do it (see below). It gives me the correct answer
for this toy example, but the excess of for-loops makes it extremely slow
for larger matrices. I was wondering if there is a better way to do this
kind of comparison. Any ideas? Sorry if it is a stupid question.
matbinmata<-function(B,A){
  res<-c();
  rowsB = length(B[,1]);
  rowsA = length(A[,1]);
  colsB = length(B[1,]);
  colsA = length(A[1,]);
  for (i in 1:rowsB){
    for (j in 1:colsB){
      for (k in 1:rowsA){
        for (l in 1:colsA){
          if(A[k,l]==B[i,j]){res<-c(res,k);}
        }
      }
    }
  }
  return(unique(sort(res)));
}
Best,
Charles
--
Um ax?! :)
--
Charles Novaes de Santana, PhD
http://www.imedea.uib-csic.es/~charles
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k