Re: [R] Programming R to avoid loops

Charles C. Berry Sat, 18 Apr 2015 10:50:53 -0700

On Sat, 18 Apr 2015, Brant Inman wrote:

I have two large data frames with the following structure:

df1

 id       date test1.result
1  a 2009-08-28      1
2  a 2009-09-16      1
3  b 2008-08-06      0
4  c 2012-02-02      1
5  c 2010-08-03      1
6  c 2012-08-02      0

df2

 id       date test2.result
1  a 2011-02-03      1
2  b 2011-09-27      0
3  b 2011-09-01      1
4  c 2009-07-16      0
5  c 2009-04-15      0
6  c 2010-08-10      1

I need to match items in df2 to those in df1 with specific matching criteria. I have written a looped matching algorithm that works, but it is very slow with my large datasets. I am requesting help on making a version of this code that is faster and "vectorized" so to speak.

As I see in your posted code, you match ids exactly, match dates within a range, and count the number of positive test results in the second data.frame.

For this, the countOverlaps() function of the GenomicRanges package will do the trick with suitably defined GRanges objects. Something like:


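require(GenomicRanges)

## Dates as integer day counts, so they can serve as range coordinates
date1 <- as.integer( as.Date( df1$date, "%Y-%m-%d" ))
date2 <- as.integer( as.Date( df2$date, "%Y-%m-%d" ))

## Matching window around each df2 date, in days
lagdays <- 30L
predays <- -30L

## One width-1 range per df1 row; the id plays the role of seqnames
gr1 <- GRanges(seqnames=df1$id, IRanges(start=date1,width=1),strand="*")

## One window per df2 row, keeping only the positive test2 results
gr2 <- GRanges(seqnames=df2$id,
               IRanges(start=date2+predays,end=date2+lagdays),
               strand="*")[ df2$test2.result==1,]

## For each df1 row, count the positive df2 windows that contain its date
df1$test2.count <- countOverlaps(gr1,gr2)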


For the example data.frames (as rendered by Jim Lemon's code), this yields

df1

  id       date test1.result test2.count
1  a 2009-08-28            1           0
2  a 2009-09-16            1           0
3  b 2008-08-06            0           0
4  c 2012-02-02            1           0
5  c 2010-08-03            1           1
6  c 2012-08-02            0           0

The GenomicRanges package is at

http://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html

where you will find installation instructions and links to vignettes.
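
If installing a Bioconductor package is not an option, the same counts can be cross-checked in base R. What follows is only a rough sketch, not the code from the original post: the data frames are retyped from the listings above, the symmetric 30-day `window` is assumed to match predays/lagdays, and the names `pos`, `m`, `hit` and the temporary `row` column are made up for illustration.

```r
## Example data retyped from the listings above
df1 <- data.frame(
  id = c("a", "a", "b", "c", "c", "c"),
  date = as.Date(c("2009-08-28", "2009-09-16", "2008-08-06",
                   "2012-02-02", "2010-08-03", "2012-08-02")),
  test1.result = c(1, 1, 0, 1, 1, 0)
)
df2 <- data.frame(
  id = c("a", "b", "b", "c", "c", "c"),
  date = as.Date(c("2011-02-03", "2011-09-27", "2011-09-01",
                   "2009-07-16", "2009-04-15", "2010-08-10")),
  test2.result = c(1, 0, 1, 0, 0, 1)
)

window <- 30  # assumed symmetric window, in days

## Keep only the positive test2 results
pos <- df2[df2$test2.result == 1, ]

## Remember the original row of df1, then pair rows that share an id
df1$row <- seq_len(nrow(df1))
m <- merge(df1, pos, by = "id", suffixes = c(".1", ".2"))

## A pair is a hit when the two dates are at most `window` days apart
hit <- abs(as.integer(m$date.1 - m$date.2)) <= window

## Count hits per original df1 row; rows without hits get 0
df1$test2.count <- tabulate(m$row[hit], nbins = nrow(df1))
df1$row <- NULL
```

Note that merge() builds every within-id pair before filtering, so memory use grows with the product of the group sizes. For very large data the range-based approach above is likely the better fit.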

HTH,

Chuck