Thanks a lot.
I am still looking for a very fast and memory-efficient solution, as the
matrix I have in the real world has billions of rows.
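
One direction that might cut memory use is to avoid building character keys
with paste() and instead pack each row into a single double. The sketch below
is only an illustration under stated assumptions: the name intersectByKey is
hypothetical, the entries are assumed to be non-negative integers, both
matrices are assumed deduplicated, and the packed keys must stay below 2^53 so
that doubles represent them exactly. It has not been timed or tried at the
billion-row scale.

```r
# Hypothetical helper: intersect two 2-column integer matrices by packing
# each row (a, b) into one numeric key a * base + b.
intersectByKey <- function(mat1, mat2)
{
    stopifnot(ncol(mat1) == 2, ncol(mat2) == 2)
    # Assumption: all entries are non-negative integers.
    stopifnot(min(mat1, mat2) >= 0)
    # base is a double, so the products below do not overflow integer range.
    base <- max(mat1[, 2], mat2[, 2]) + 1
    # Keys must stay within the range where doubles are exact.
    stopifnot((max(mat1[, 1], mat2[, 1]) + 1) * base <= 2^53)
    toKey <- function(m) m[, 1] * base + m[, 2]
    # Keep the rows of mat2 whose key also occurs in mat1.
    mat2[toKey(mat2) %in% toKey(mat1), , drop = FALSE]
}
```

Like intersectMat, this returns the matching rows in the order they appear in
mat2. Whether it is actually faster than the paste()-based im2 would need to be
measured on the real data.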


On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdun...@tibco.com> wrote:

> I haven't looked at the size-time relationship, but im2 (below) is faster
> than your function on at least one example:
>
> intersectMat <- function(mat1, mat2)
> {
>     #mat1 and mat2 are both deduplicated
>     nr1 <- nrow(mat1)
>     nr2 <- nrow(mat2)
>     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
> }
>
> im2 <- function(mat1, mat2)
> {
>     stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
>     toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
>     mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
> }
>
> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
> > system.time(r1 <- intersectMat(m1,m2))
>    user  system elapsed
>  430.37    1.96  433.98
> > system.time(r2 <- im2(m1,m2))
>    user  system elapsed
>   27.89    0.20   28.13
> > identical(r1, r2)
> [1] TRUE
> > dim(r1)
> [1] 5000000       2
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of c char
> > Sent: Monday, July 29, 2013 4:04 PM
> > To: r-help@r-project.org
> > Subject: [R] Intersecting two matrices
> >
> > Dear all,
> >
> > I would like to know whether there is a faster R package for intersecting
> > two integer matrices with ncol=2. Currently I am using my
> > homemade code adapted from a previous thread:
> >
> >
> > intersectMat <- function(mat1, mat2){
> >   #mat1 and mat2 are both deduplicated
> >   nr1 <- nrow(mat1)
> >   nr2 <- nrow(mat2)
> >   mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
> > }
> >
> >
> > which handles:
> > size A= 10578373
> > size B= 9519807
> > expected intersecting time= 251.2272
> > intersecting for corssing MPRs took 409.602 seconds.
> >
> > This scales slightly worse than linearly, but the basic operation itself is
> > slow. I wonder whether a very fast C/C++ extension exists for this task.
> > Your ideas are appreciated.
> >
> > Thanks!
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

