Thanks a lot. I am still looking for a super-fast, memory-efficient solution, as the matrix I have in the real world has billions of rows.
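A possible direction, not proposed in the thread and offered only as a sketch: im2 in the quoted message builds one character key per row with paste(), and at billions of rows those strings take far more memory than the integers themselves. Assuming both matrices are deduplicated, have two columns and hold integer values, each row can instead be packed into a single double-precision key and passed to match(). The names pairKey and im3 are hypothetical, and the packing is exact only while the combined key range stays within 2^53, which the function checks before building keys.

```r
# Hypothetical sketch (not from the thread): pack each two-column integer row
# into one double key, then intersect with match() instead of paste() keys.
# Assumes integer-valued matrices with ncol = 2, both already deduplicated.
pairKey <- function(mat, lo1, lo2, span2) {
  # (col1 - lo1) * span2 + (col2 - lo2) is unique per row as long as every
  # col2 offset lies in [0, span2) and the result stays below 2^53
  (as.numeric(mat[, 1]) - lo1) * span2 + (as.numeric(mat[, 2]) - lo2)
}

im3 <- function(mat1, mat2) {
  stopifnot(ncol(mat1) == 2, ncol(mat2) == 2)
  # Shared offsets so that equal rows in mat1 and mat2 get equal keys
  lo1 <- min(mat1[, 1], mat2[, 1]); hi1 <- max(mat1[, 1], mat2[, 1])
  lo2 <- min(mat1[, 2], mat2[, 2]); hi2 <- max(mat1[, 2], mat2[, 2])
  span2 <- as.numeric(hi2) - lo2 + 1
  # Refuse inputs whose keys could exceed the exactly representable range
  stopifnot((as.numeric(hi1) - lo1 + 1) * span2 <= 2^53)
  k1 <- pairKey(mat1, lo1, lo2, span2)
  k2 <- pairKey(mat2, lo1, lo2, span2)
  # Same semantics as im2: rows of mat1 that occur in mat2, in mat2's order
  mat1[match(k2, k1, nomatch = 0), , drop = FALSE]
}
```

On the thread's m1/m2 example, im3(m1, m2) should return the same rows as im2(m1, m2). It has not been timed here, so any speed or memory gain at the billion-row scale is an assumption to measure, not a result.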
On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdun...@tibco.com> wrote:

> I haven't looked at the size-time relationship, but im2 (below) is faster
> than your function on at least one example:
>
> intersectMat <- function(mat1, mat2)
> {
>     #mat1 and mat2 are both deduplicated
>     nr1 <- nrow(mat1)
>     nr2 <- nrow(mat2)
>     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
> }
>
> im2 <- function(mat1, mat2)
> {
>     stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
>     toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
>     mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
> }
>
> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
> > system.time(r1 <- intersectMat(m1,m2))
>    user  system elapsed
>  430.37    1.96  433.98
> > system.time(r2 <- im2(m1,m2))
>    user  system elapsed
>   27.89    0.20   28.13
> > identical(r1, r2)
> [1] TRUE
> > dim(r1)
> [1] 5000000       2
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of c char
> > Sent: Monday, July 29, 2013 4:04 PM
> > To: r-help@r-project.org
> > Subject: [R] Intersecting two matrices
> >
> > Dear all,
> >
> > I would like to know of a faster matrix-intersection package for R that
> > handles the intersection of two integer matrices with ncol=2. Currently I am
> > using my homemade code, adapted from a previous thread:
> >
> > intersectMat <- function(mat1, mat2){
> >     #mat1 and mat2 are both deduplicated
> >     nr1 <- nrow(mat1)
> >     nr2 <- nrow(mat2)
> >     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
> > }
> >
> > which handles:
> > size A= 10578373
> > size B= 9519807
> > expected intersecting time= 251.2272
> > intersecting for crossing MPRs took 409.602 seconds.
> >
> > It scales a little worse than linearly, but the atomic operation itself is slow.
> > I wonder whether a super-fast C/C++ extension exists for this task.
> > Your ideas are appreciated.
> >
> > Thanks!
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.