Jim Mankin liked your message with Boxer.

On April 18, 2015 at 10:48:17 AM MST, Charles C. Berry <ccbe...@ucsd.edu> wrote:

On Sat, 18 Apr 2015, Brant Inman wrote:

> I have two large data frames with the following structure:
>
> > df1
>   id       date test1.result
> 1  a 2009-08-28            1
> 2  a 2009-09-16            1
> 3  b 2008-08-06            0
> 4  c 2012-02-02            1
> 5  c 2010-08-03            1
> 6  c 2012-08-02            0
>
> > df2
>   id       date test2.result
> 1  a 2011-02-03            1
> 2  b 2011-09-27            0
> 3  b 2011-09-01            1
> 4  c 2009-07-16            0
> 5  c 2009-04-15            0
> 6  c 2010-08-10            1
>
> I need to match items in df2 to those in df1 with specific matching
> criteria. I have written a looped matching algorithm that works, but it
> is very slow with my large datasets. I am requesting help on making a
> version of this code that is faster and "vectorized" so to speak.

As I see in your posted code, you match ids exactly, match dates according to a range, and count the number of positive test results in the second data.frame.

For this, the countOverlaps() function of the GenomicRanges package will do the trick with suitably defined GRanges objects. Something like:

require(GenomicRanges)
date1
date2
lagdays
predays
gr1
gr2 IRanges(start=date2+predays,end=date2+lagdays), strand="*")[ df2$test2.result==1,]
df1$test2.count

For the example data.frames (as rendered by Jim Lemon's code), this yields

> df1
  id       date test1.result test2.count
1  a 2009-08-28            1           0
2  a 2009-09-16            1           0
3  b 2008-08-06            0           0
4  c 2012-02-02            1           0
5  c 2010-08-03            1           1
6  c 2012-08-02            0           0

The GenomicRanges package is at

http://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html

where you will find installation instructions and links to vignettes.

HTH,

chuck

______________________________________________
r-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]
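The approach described in the reply above (exact match on id, a date window around each df2 date, and a count of positive test2 results via countOverlaps()) might be sketched as follows. This is an illustrative reconstruction, not the code as originally posted: the window values predays = -30 and lagdays = 30, the conversion of dates to integer day numbers, and the exact GRanges construction are all assumptions made for the example.

```r
library(GenomicRanges)  # Bioconductor package; also loads IRanges

# Example data as shown in the original question
df1 <- data.frame(
  id = c("a", "a", "b", "c", "c", "c"),
  date = c("2009-08-28", "2009-09-16", "2008-08-06",
           "2012-02-02", "2010-08-03", "2012-08-02"),
  test1.result = c(1, 1, 0, 1, 1, 0)
)
df2 <- data.frame(
  id = c("a", "b", "b", "c", "c", "c"),
  date = c("2011-02-03", "2011-09-27", "2011-09-01",
           "2009-07-16", "2009-04-15", "2010-08-10"),
  test2.result = c(1, 0, 1, 0, 0, 1)
)

# Dates as integer day numbers so they can serve as range coordinates
date1 <- as.integer(as.Date(df1$date))
date2 <- as.integer(as.Date(df2$date))

# ASSUMED window: a df2 test counts if the df1 date falls within
# 30 days before or after it. These values are not from the post.
predays <- -30L
lagdays <- 30L

# Using id as the sequence name restricts overlaps to identical ids
gr1 <- GRanges(seqnames = df1$id,
               ranges = IRanges(start = date1, width = 1),
               strand = "*")
gr2 <- GRanges(seqnames = df2$id,
               ranges = IRanges(start = date2 + predays,
                                end = date2 + lagdays),
               strand = "*")

# Keep only positive test2 results, then count windows covering each df1 date
gr2 <- gr2[df2$test2.result == 1]
df1$test2.count <- countOverlaps(gr1, gr2)

print(df1)
```

Under these assumed window values, only the df1 row for id c dated 2010-08-03 falls inside a positive df2 window (2010-08-10, seven days later), which agrees with the test2.count column shown in the reply.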