In that case, you should be looking at a relational inner join, perhaps with SQLite (see package sqldf).
---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnew...@dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

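As a rough illustration of the inner-join suggestion, the sketch below treats each two-column matrix as a table and joins on both columns with base R's merge(). The toy matrices and the column names x and y are assumptions made for the example, and the sqldf query in the comment is only a guess at how the same join might be written with that package; nothing here has been timed on data of the size discussed in this thread.

```r
# Toy data (hypothetical): two deduplicated two-column integer matrices.
m1 <- cbind(c(1L, 2L, 3L, 4L), c(10L, 20L, 30L, 40L))
m2 <- cbind(c(2L, 3L, 5L, 4L), c(20L, 99L, 50L, 40L))

# Treat each matrix as a table; the column names x and y are made up here.
d1 <- data.frame(x = m1[, 1], y = m1[, 2])
d2 <- data.frame(x = m2[, 1], y = m2[, 2])

# Inner join on both columns: keeps only rows present in both tables.
joined <- merge(d1, d2, by = c("x", "y"))

# Back to a plain matrix, sorted by x and then y for a stable result.
joined <- joined[order(joined$x, joined$y), ]
common <- unname(as.matrix(joined))

# A comparable query via sqldf (assumes the package is installed):
#   sqldf::sqldf("SELECT d1.x, d1.y FROM d1 INNER JOIN d2
#                 ON d1.x = d2.x AND d1.y = d2.y")
print(common)
```

For matrices too large for memory, the same join would presumably have to run against an on-disk SQLite database rather than in-memory data frames.
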
c char <charlie.hsia...@gmail.com> wrote:
>Thanks a lot.
>Still looking for a super fast and memory-efficient solution, as the
>matrix I have in the real world has billions of rows.
>
>
>On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdun...@tibco.com> wrote:
>
>> I haven't looked at the size-time relationship, but im2 (below) is faster
>> than your function on at least one example:
>>
>> intersectMat <- function(mat1, mat2)
>> {
>>     # mat1 and mat2 are both deduplicated
>>     nr1 <- nrow(mat1)
>>     nr2 <- nrow(mat2)
>>     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
>> }
>>
>> im2 <- function(mat1, mat2)
>> {
>>     stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
>>     toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
>>     mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
>> }
>>
>> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
>> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
>> > system.time(r1 <- intersectMat(m1,m2))
>>    user  system elapsed
>>  430.37    1.96  433.98
>> > system.time(r2 <- im2(m1,m2))
>>    user  system elapsed
>>   27.89    0.20   28.13
>> > identical(r1, r2)
>> [1] TRUE
>> > dim(r1)
>> [1] 5000000       2
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>> > -----Original Message-----
>> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of c char
>> > Sent: Monday, July 29, 2013 4:04 PM
>> > To: r-help@r-project.org
>> > Subject: [R] Intersecting two matrices
>> >
>> > Dear all,
>> >
>> > I would like to know of a faster matrix intersection package for R that handles
>> > the intersection of two integer matrices with ncol=2.
>> > Currently I am using my
>> > homemade code adapted from a previous thread:
>> >
>> >
>> > intersectMat <- function(mat1, mat2){
>> >     # mat1 and mat2 are both deduplicated
>> >     nr1 <- nrow(mat1)
>> >     nr2 <- nrow(mat2)
>> >     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
>> > }
>> >
>> >
>> > which handles:
>> > size A= 10578373
>> > size B= 9519807
>> > expected intersecting time= 251.2272
>> > intersecting for corssing MPRs took 409.602 seconds.
>> >
>> > It scales a little worse than linearly, but the atomic operation is not good.
>> > I wonder if a super fast C/C++ extension exists for this task. Your ideas are
>> > appreciated.
>> >
>> > Thanks!
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
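
One possible variation on the im2 approach quoted in this thread, offered only as a sketch: when both columns hold non-negative integers, each row can be collapsed into a single numeric key instead of a pasted string, and the keys compared with %in%. The function name im3 is hypothetical, the requirement that keys stay below 2^53 (so that doubles represent them exactly) is an assumption of this sketch, and no timing is claimed for it.

```r
# im3 is a hypothetical name; this is not code from the thread.
im3 <- function(mat1, mat2)
{
    stopifnot(ncol(mat1) == 2, ncol(mat2) == 2)
    # Assumption: all entries are non-negative and there are no NAs.
    stopifnot(all(mat1 >= 0), all(mat2 >= 0))
    # Multiplier strictly larger than any second-column value, so that
    # key = col1 * base + col2 is unique for each (col1, col2) pair.
    base <- max(mat1[, 2], mat2[, 2]) + 1
    toKey <- function(m) as.numeric(m[, 1]) * base + as.numeric(m[, 2])
    k1 <- toKey(mat1)
    k2 <- toKey(mat2)
    # Doubles hold integers exactly only up to 2^53.
    stopifnot(max(k1, k2) < 2^53)
    # Keep the rows of mat1 whose key also occurs in mat2.
    mat1[k1 %in% k2, , drop = FALSE]
}

# Small example in the style of the benchmark data used in the thread.
m1 <- cbind(1:10, rep(1:2, len = 10))
m2 <- cbind(1:10, rep(1:4, len = 10))
r3 <- im3(m1, m2)
print(r3)
```

Rows come back in the order they appear in mat1, which may differ from im2's ordering when the two inputs are sorted differently.
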