I am not familiar with R's sort and sql libs. appreciate if you can post a
code snippet when you got time. Thanks a lot!


On Tue, Jul 30, 2013 at 10:36 AM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us>wrote:

> In that case, you should be looking at a relational inner join, perhaps
> with SQLite (see package sqldf).
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live
> Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> c char <charlie.hsia...@gmail.com> wrote:
> >Thanks a lot.
> >Still looking for some super fast and memory efficient solution, as the
> >matrix I have in real world has billions of rows.
> >
> >
> >On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdun...@tibco.com>
> >wrote:
> >
> >> I haven't looked at the size-time relationship, but im2 (below) is
> >faster
> >> than your
> >> function on at least one example:
> >>
> >> intersectMat <- function(mat1, mat2)
> >> {
> >>     #mat1 and mat2 are both deduplicated
> >>     nr1 <- nrow(mat1)
> >>     nr2 <- nrow(mat2)
> >>     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ,
> >> drop=FALSE]
> >> }
> >>
> >> im2 <- function(mat1, mat2)
> >> {
> >>     stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
> >>     toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1],
> >> twoColMat[,2])
> >>     mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
> >> }
> >>
> >> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
> >> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
> >> > system.time(r1 <- intersectMat(m1,m2))
> >>    user  system elapsed
> >>  430.37    1.96  433.98
> >> > system.time(r2 <- im2(m1,m2))
> >>    user  system elapsed
> >>   27.89    0.20   28.13
> >> > identical(r1, r2)
> >> [1] TRUE
> >> > dim(r1)
> >> [1] 5000000       2
> >>
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com
> >>
> >>
> >> > -----Original Message-----
> >> > From: r-help-boun...@r-project.org
> >[mailto:r-help-boun...@r-project.org]
> >> On Behalf
> >> > Of c char
> >> > Sent: Monday, July 29, 2013 4:04 PM
> >> > To: r-help@r-project.org
> >> > Subject: [R] Intersecting two matrices
> >> >
> >> > Dear all,
> >> >
> >> > I am interested to know a faster matrix intersection package for R
> >> handles
> >> > intersection of two integer matrices with ncol=2. Currently I am
> >using my
> >> > homemade code adapted from a previous thread:
> >> >
> >> >
> >> > intersectMat <- function(mat1, mat2){#mat1 and mat2 are both
> >> > deduplicated  nr1 <- nrow(mat1)  nr2 <- nrow(mat2)
> >> > mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]}
> >> >
> >> >
> >> > which handles:
> >> > size A= 10578373
> >> > size B= 9519807
> >> > expected intersecting time= 251.2272
> >> > intersecting for corssing MPRs took 409.602 seconds.
> >> >
> >> > scale a little bit worse than linearly but atomic operation is not
> >good.
> >> > Wonder if a super fast C/C++ extension exists for this task. Your
> >ideas
> >> are
> >> > appreciated.
> >> >
> >> > Thanks!
> >> >
> >> >       [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> >______________________________________________
> >R-help@r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to