I am not familiar with R's sort and SQL libraries. I would appreciate it if you could post a code snippet when you have time. Thanks a lot!
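For reference, here is a minimal sketch of the relational inner join suggested in the quoted reply below. It is not code posted by anyone in the thread. The helper name `intersect_by_join`, the column names `a` and `b`, and the toy matrices are all assumptions made for illustration. The first part uses base R's `merge()`, and the second part runs the same join through `sqldf` only if that package happens to be installed.

```r
# Toy stand-ins for the two deduplicated two-column integer matrices
# (assumed data, far smaller than the real problem).
m1 <- cbind(1:10, rep(1:5, length.out = 10))
m2 <- cbind(1:10, rep(1:2, length.out = 10))

# Hypothetical helper: rows present in both matrices, via an inner join
# on both columns. Column names "a" and "b" are arbitrary.
intersect_by_join <- function(mat1, mat2) {
  stopifnot(ncol(mat1) == 2, ncol(mat2) == 2)
  df1 <- data.frame(a = mat1[, 1], b = mat1[, 2])
  df2 <- data.frame(a = mat2[, 1], b = mat2[, 2])
  # merge() defaults to all = FALSE, which is an inner join.
  joined <- merge(df1, df2, by = c("a", "b"))
  unname(as.matrix(joined))
}

common <- intersect_by_join(m1, m2)
print(common)

# The same join written in SQL; skipped when sqldf is not installed.
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  df1 <- data.frame(a = m1[, 1], b = m1[, 2])
  df2 <- data.frame(a = m2[, 1], b = m2[, 2])
  sql_common <- sqldf(
    "SELECT df1.a, df1.b
       FROM df1 INNER JOIN df2
         ON df1.a = df2.a AND df1.b = df2.b"
  )
  print(sql_common)
}
```

Both variants copy the matrices into data frames, so neither has been shown here to be fast or memory efficient at billions of rows. `sqldf()` also accepts a `dbname` argument for working against an on-disk SQLite file, which may be worth trying at that scale.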
On Tue, Jul 30, 2013 at 10:36 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:

> In that case, you should be looking at a relational inner join, perhaps
> with SQLite (see package sqldf).
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> c char <charlie.hsia...@gmail.com> wrote:
> >Thanks a lot.
> >Still looking for some super fast and memory efficient solution, as the
> >matrix I have in real world has billions of rows.
> >
> >On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdun...@tibco.com> wrote:
> >
> >> I haven't looked at the size-time relationship, but im2 (below) is faster
> >> than your function on at least one example:
> >>
> >> intersectMat <- function(mat1, mat2)
> >> {
> >>     #mat1 and mat2 are both deduplicated
> >>     nr1 <- nrow(mat1)
> >>     nr2 <- nrow(mat2)
> >>     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
> >> }
> >>
> >> im2 <- function(mat1, mat2)
> >> {
> >>     stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
> >>     toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
> >>     mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
> >> }
> >>
> >> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
> >> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
> >> > system.time(r1 <- intersectMat(m1,m2))
> >>    user  system elapsed
> >>  430.37    1.96  433.98
> >> > system.time(r2 <- im2(m1,m2))
> >>    user  system elapsed
> >>   27.89    0.20   28.13
> >> > identical(r1, r2)
> >> [1] TRUE
> >> > dim(r1)
> >> [1] 5000000       2
> >>
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com
> >>
> >> > -----Original Message-----
> >> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of c char
> >> > Sent: Monday, July 29, 2013 4:04 PM
> >> > To: r-help@r-project.org
> >> > Subject: [R] Intersecting two matrices
> >> >
> >> > Dear all,
> >> >
> >> > I am interested to know a faster matrix intersection package for R handles
> >> > intersection of two integer matrices with ncol=2. Currently I am using my
> >> > homemade code adapted from a previous thread:
> >> >
> >> > intersectMat <- function(mat1, mat2){
> >> >     #mat1 and mat2 are both deduplicated
> >> >     nr1 <- nrow(mat1)
> >> >     nr2 <- nrow(mat2)
> >> >     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
> >> > }
> >> >
> >> > which handles:
> >> > size A= 10578373
> >> > size B= 9519807
> >> > expected intersecting time= 251.2272
> >> > intersecting for corssing MPRs took 409.602 seconds.
> >> >
> >> > scale a little bit worse than linearly but atomic operation is not good.
> >> > Wonder if a super fast C/C++ extension exists for this task. Your ideas are
> >> > appreciated.
> >> >
> >> > Thanks!
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.