Re: [R] Comparing matrices in R - matrixB %in% matrixA

Jeff Newmiller Fri, 31 Oct 2014 08:30:02 -0700

Since both of you seem to have misinterpreted my response, consider the following for clarification:

A <- matrix(1:1000, 1000, 10)
B <- A[1:100, ]
# my recommended solution
t1 <- system.time({match(as.data.frame(t(B)), as.data.frame(t(A)))})
# similar to John's recommended solution
t2 <- system.time({
  AA <- as.list(as.data.frame(t(A)))
  BB <- as.list(as.data.frame(t(B)))
  which( AA %in% BB )
})

# preallocated logical result with early exit from the inner loop
t3 <- system.time({
  lresult <- rep( NA, nrow(A) )
  for ( ia in seq.int( nrow( A ) ) ) {
    lres <- FALSE
    ib <- 0
    while ( ib < nrow( B ) & !lres ) {
      ib <- ib + 1
      lres <- all( A[ ia, ] == B[ ib, ] )
    }
    lresult[ ia ] <- lres
  }
  which( lresult )
})

# original four-level nested loops
t4 <- system.time({
  res<-c()
  rowsB = length(B[,1])
  rowsA = length(A[,1])
  colsB = length(B[1,])
  colsA = length(A[1,])
  for (i in 1:rowsB){
    for (j in 1:colsB){
      for (k in 1:rowsA){
        for (l in 1:colsA){
          if(A[k,l]==B[i,j]){res<-c(res,k)}
        }
      }
    }
  }
  unique(sort(res))
})

t1

   user  system elapsed
  0.022   0.000   0.020

t2

   user  system elapsed
   0.02    0.00    0.02

t3

   user  system elapsed
  0.748   0.000   0.746

t4

   user  system elapsed
 16.612   0.016  16.636

# data.frames are lists, but applying as.list seems to speed up the
# match for some reason
t2[1]/t1[1]

user.self
0.9090909

# intended comparison for learning purposes
t4[1]/t3[1]

user.self
 22.20856

I recognize that the reference implementation does not need to be optimized, but the changes I suggested to it illustrate an incremental improvement toward "thinking in R" rather than the optimal solution.
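One detail that is easy to miss when reading the timings: the match() call returns, for each row of B, the position of its first occurrence in A (NA if absent), while the %in% form marks which rows of A occur somewhere in B. A minimal sketch on a smaller matrix (the sizes here are an assumption chosen only to keep it quick) suggests how the two results relate:

```r
# Smaller stand-in for the benchmark data; sizes are illustrative only.
A <- matrix(1:200, 200, 10)
B <- A[1:20, , drop = FALSE]

# Positions in A of each row of B (one entry per row of B).
pos <- match(as.data.frame(t(B)), as.data.frame(t(A)))

# Indices of the rows of A that occur anywhere in B.
AA <- as.list(as.data.frame(t(A)))
BB <- as.list(as.data.frame(t(B)))
idx <- which(AA %in% BB)

# When every row of B is present in A and rows are unique,
# both describe the same set of rows.
setequal(pos, idx)
```

If B contained rows absent from A, pos would carry NA entries and would need na.omit() or similar before such a comparison.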


On Fri, 31 Oct 2014, John Fox wrote:

Dear Jeff,

For curiosity, I compared your solution with the one I posted earlier this 
morning (when I was working on a slower computer, accounting for the somewhat 
different timings for my solution):

------------ snip ----------

A <- matrix(1:10000, 10000, 10)
B <- A[1:1000, ]

system.time({
   AA <- as.list(as.data.frame(t(A)))
   BB <- as.list(as.data.frame(t(B)))
   print(sum(AA %in% BB))
})
[1] 1000
   user  system elapsed
   0.14    0.01    0.16

system.time({
    lresult <- rep( NA, nrow(A) )
    for ( ia in seq.int( nrow( A ) ) ) {
        lres <- FALSE
        ib <- 0
        while ( ib < nrow( B ) & !lres ) {
            ib <- ib + 1
            lres <- all( A[ ia, ] == B[ ib, ] )
        }
        lresult[ ia ] <- lres
    }
    print(sum( lresult ))
})
[1] 1000
   user  system elapsed
  45.76    0.01   45.77

46/0.16

[1] 287.5

------------ snip ----------

So the solution using nested loops is more than 2 orders of magnitude slower 
for this problem. Of course, for a one-off problem, depending on its size, the 
difference may not matter.
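For readers who want something between the two extremes, here is a rough sketch of a half-vectorized variant: it still loops over the rows of B, but compares each one against all of A in a single step. The function name rows_in_semi is hypothetical, and the sketch assumes numeric matrices without NA values; it was not part of the timings above.

```r
# Hypothetical helper: TRUE for each row of A that equals some row of B.
rows_in_semi <- function(A, B) {
  stopifnot(is.matrix(A), is.matrix(B), ncol(A) == ncol(B))
  found <- logical(nrow(A))                  # preallocated, all FALSE
  for (ib in seq_len(nrow(B))) {
    # Repeat row ib of B down a matrix shaped like A, then compare elementwise.
    b_rep <- matrix(B[ib, ], nrow(A), ncol(A), byrow = TRUE)
    # A row of A matches when all of its columns agree.
    found <- found | (rowSums(A == b_rep) == ncol(A))
  }
  found
}

A <- matrix(1:10, nrow = 5)
B <- A[4:5, , drop = FALSE]
res <- which(rows_in_semi(A, B))
res
```

On the toy example from the original question this should pick out rows 4 and 5.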

Best,
John

-----------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
On Behalf Of Jeff Newmiller
Sent: Friday, October 31, 2014 10:15 AM
To: Charles Novaes de Santana; r-help@r-project.org
Subject: Re: [R] Comparing matrices in R - matrixB %in% matrixA

Thank you for the reproducible example, but posting in HTML can corrupt
your example code so please learn to set your email client mail format
appropriately when posting to this list.

I think this [1] post, found with a quick Google search for "R match
matrix", fits your situation perfectly.

match(data.frame(t(B)), data.frame(t(A)))
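A caveat worth keeping in mind: ?match documents that lists are converted to character vectors before matching, so the comparison works on a text representation of each row, and values that differ only by floating-point rounding may fail to match. A related idiom, sketched below, builds one string key per row explicitly; the helper name row_keys and the "\r" separator are assumptions for illustration, not something from the linked post.

```r
# Hypothetical helper: one string key per matrix row.
row_keys <- function(M) {
  stopifnot(is.matrix(M))
  # "\r" is an assumed separator that should not occur in the data itself.
  do.call(paste, c(as.data.frame(M), sep = "\r"))
}

A <- matrix(1:10, nrow = 5)
B <- A[-c(1, 2, 3), , drop = FALSE]   # drop = FALSE keeps B a matrix

# Rows of A whose key also appears among the keys of B.
keyed <- which(row_keys(A) %in% row_keys(B))
keyed
```

With the toy matrices from the question this should give 4 and 5. The drop = FALSE matters if B can end up with a single row, since otherwise it collapses to a vector.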

Note that concatenating vectors in loops is bad news... a basic
optimization for your code would be to preallocate a logical result
vector and fill in each element with a TRUE/FALSE in the outer loop,
and use the which() function on that completed vector to identify the
index numbers (if you really need that). For example:

lresult <- rep( NA, nrow(A) )
for ( ia in seq.int( nrow( A ) ) ) {
  lres <- FALSE
  ib <- 0
  while ( ib < nrow( B ) & !lres ) {
    ib <- ib + 1
    lres <- all( A[ ia, ] == B[ ib, ] )
  }
  lresult[ ia ] <- lres
}
result <- which( lresult )
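The same loop can be wrapped in a function for reuse. The sketch below is one possible tidy-up, not a replacement for the code as posted: the name rows_in_loop is hypothetical, seq_len() is used so that a zero-row A is handled, and && is used because the loop condition is a scalar test. It assumes the matrices contain no NA values, since all() could then return NA and the while condition would fail.

```r
# Hypothetical wrapper around the preallocated loop; the name is illustrative.
rows_in_loop <- function(A, B) {
  stopifnot(is.matrix(A), is.matrix(B), ncol(A) == ncol(B))
  lresult <- logical(nrow(A))                # preallocated, all FALSE
  for (ia in seq_len(nrow(A))) {             # seq_len() copes with zero rows
    ib <- 0
    # Stop scanning B as soon as row ia of A has been found.
    while (ib < nrow(B) && !lresult[ia]) {
      ib <- ib + 1
      lresult[ia] <- all(A[ia, ] == B[ib, ])
    }
  }
  which(lresult)
}

A <- matrix(1:10, nrow = 5)
B <- A[-c(1, 2, 3), , drop = FALSE]
looped <- rows_in_loop(A, B)
looped
```

For the toy example this should return 4 and 5, the same rows the match() approach identifies.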

[1] http://stackoverflow.com/questions/12697122/in-r-match-function-for-rows-or-columns-of-matrix
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On October 31, 2014 6:20:38 AM PDT, Charles Novaes de Santana
<charles.sant...@gmail.com> wrote:

My apologies, because I sent the message before finishing it. I am very
sorry about this. Please find below my message (I usually write my
messages from the end to the beginning... sorry :)).

Dear all,

I am trying to compare two matrices, in order to find in which rows of a
matrix A I can find the same values as in matrix B. I am trying to do it
for matrices with around 2500 elements, but please find below a toy
example:

A = matrix(1:10,nrow=5)
B = A[-c(1,2,3),];

So

    [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

and

    [,1] [,2]
[1,]    4    9
[2,]    5   10

I would like to compare A and B in order to find in which rows of A I
can find the rows of B. Something similar to %in% with one-dimensional
arrays. In the example above, the answer should be 4 and 5.

I wrote a function to do it (see below). It gives me the correct answer
for this toy example, but the excess of for-loops makes it extremely
slow for larger matrices. I was wondering if there is a better way to do
this kind of comparison. Any ideas? Sorry if it is a stupid question.

matbinmata<-function(B,A){
   res<-c();
   rowsB = length(B[,1]);
   rowsA = length(A[,1]);
   colsB = length(B[1,]);
   colsA = length(A[1,]);
   for (i in 1:rowsB){
       for (j in 1:colsB){
           for (k in 1:rowsA){
               for (l in 1:colsA){
                   if(A[k,l]==B[i,j]){res<-c(res,k);}
               }
           }
       }
   }
   return(unique(sort(res)));
}


Best,

Charles
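A side note on the function above, offered as an observation rather than something raised in the thread: it records row k of A whenever any single element of A equals any single element of B, so it tests element membership, not whole-row equality. The toy example happens to hide this. A small counterexample sketch (the matrix B2 is made up for illustration):

```r
# The function as posted, reproduced so the example is self-contained.
matbinmata <- function(B, A) {
  res <- c()
  rowsB <- length(B[, 1]); rowsA <- length(A[, 1])
  colsB <- length(B[1, ]); colsA <- length(A[1, ])
  for (i in 1:rowsB) {
    for (j in 1:colsB) {
      for (k in 1:rowsA) {
        for (l in 1:colsA) {
          if (A[k, l] == B[i, j]) { res <- c(res, k) }
        }
      }
    }
  }
  unique(sort(res))
}

A  <- matrix(1:10, nrow = 5)
B2 <- matrix(c(4L, 10L), nrow = 1)   # the row (4, 10) does not occur in A

# Element-wise matching: reports rows 4 and 5, since 4 and 10 each occur there.
elementwise <- matbinmata(B2, A)

# Row-wise matching: no row of A equals (4, 10), so nothing is returned.
rowwise <- which(as.list(as.data.frame(t(A))) %in%
                 as.list(as.data.frame(t(B2))))
```

So besides being faster, the row-wise approaches suggested in the replies answer the question as it was stated.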

On Fri, Oct 31, 2014 at 2:12 PM, Charles Novaes de Santana <
charles.sant...@gmail.com> wrote:

A = matrix(1:10,nrow=5)
B = A[-c(1,2,3),];

So

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

and

     [,1] [,2]
[1,]    4    9
[2,]    5   10

I would like to compare A and B in order to find in which rows of A I
can find the rows of B. Something similar to %in% with one dimensional
arrays. In the example above, the answer should be 4 and 5.

I did a function to do it (see it below), it gives me the correct answer
for this toy example, but the excess of for-loops makes it extremely
slow for larger matrices. I was wondering if there is a better way to do
this kind of comparison. Any idea? Sorry if it is a stupid question.

matbinmata<-function(B,A){
    res<-c();
    rowsB = length(B[,1]);
    rowsA = length(A[,1]);
    colsB = length(B[1,]);
    colsA = length(A[1,]);
    for (i in 1:rowsB){
        for (j in 1:colsB){
            for (k in 1:rowsA){
                for (l in 1:colsA){
                    if(A[k,l]==B[i,j]){res<-c(res,k);}
                }
            }
        }
    }
    return(unique(sort(res)));
}


Best,

Charles


--
Um ax?! :)

--
Charles Novaes de Santana, PhD
http://www.imedea.uib-csic.es/~charles





        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




