There are 10 overlaps in the data:

> df1<-data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
+    end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))
> df2<-data.frame(start=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',sample(1:19,20,replace=TRUE),sep='')),
+    end=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',sample(20:50,20),sep='')))
>
> # create a matrix where each 'start' adds 1 to a count and each 'end' subtracts 1
> # columns: 1 = time, 2 = df#, 3 = +1/-1, 4 = row number in that data.frame
>
> x <- rbind(
+     cbind(df1$start, 1, 1, seq(nrow(df1))),
+     cbind(df1$end, 1, -1, seq(nrow(df1))),
+     cbind(df2$start, 2, 1, seq(nrow(df2))),
+     cbind(df2$end, 2, -1, seq(nrow(df2)))
+ )
> # sort by time
> x <- x[order(x[,1]),]
> # add the queue count; this is the number of intervals open at that time,
> # and a count greater than one means there is an overlap
> x <- cbind(x, count = cumsum(x[,3]))
> # split the data into groups; a new group starts after the count returns to 0
> indx <- split(seq(nrow(x)), cumsum(c(FALSE, head(x[, 'count'], -1) == 0)))
> # keep groups of length > 2; these are the overlaps
> indx <- indx[sapply(indx, length) > 2]
> # get unique df# and row indices
> lapply(indx, function(a){
+     unique(paste(x[a, 2], x[a, 4], sep = ' - '))
+ })
$`0`
[1] "1 - 1" "2 - 11" "2 - 1"

$`2`
[1] "1 - 3" "2 - 12" "2 - 2"

$`4`
[1] "1 - 5" "2 - 13" "2 - 3"

$`6`
[1] "1 - 7" "2 - 14" "2 - 4"

$`8`
[1] "1 - 9" "2 - 15" "2 - 5"

$`10`
[1] "1 - 11" "2 - 16" "2 - 6"

$`12`
[1] "1 - 13" "2 - 17" "2 - 7"

$`14`
[1] "1 - 15" "2 - 8" "2 - 18"

$`16`
[1] "1 - 17" "2 - 19" "2 - 9"

$`18`
[1] "1 - 19" "2 - 20" "2 - 10"


On Tue, Aug 30, 2011 at 2:15 PM, Justin Haynes <jto...@gmail.com> wrote:
> Hiya,
>
> maybe there is a native R function for this and if so please let me know!
>
> I have 2 data.frames with start and end dates, they read in as strings and I
> am converting to POSIXct.  How can I check for overlap?
>
> The end result ideally will be a single data.frame containing all the
> columns of the other two with rows where there were date overlaps.
>
>
> df1<-data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
>    end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))
> df2<-data.frame(start=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',sample(1:19,20,replace=T),sep='')),
>    end=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',sample(20:50,20),sep='')))
>
> I tried:
> library(lubridate)
>
> df1$interval<-new_interval(df1$start,df1$end)
>
>> df1$interval[1]
> [1] 2011-06-01 01:00:00 -- 2011-06-01 01:30:00
>> df2$start[1]
> [1] "2011-06-01 01:17:00 PDT"
>
> but
>
>> df2$start[1] %in% df1$interval[1]
> [1] FALSE
>>
>
> This must be fairly straight forward and I just don't know where to look!
>
>
> Thanks,
> Justin
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
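
The original question asked for a single data.frame holding the rows that overlap. One possible way to get that is a brute-force pairwise check, sketched below. This is not the method used in the reply above, just a minimal illustration. The data.frames `a` and `b` are small hypothetical stand-ins for df1 and df2 with fixed times so the result is reproducible, and the column names in `overlaps` are made up for the example. It assumes two intervals overlap when each one starts no later than the other ends.

```r
# Hypothetical fixed data (not the random df1/df2 from the thread)
tz <- "UTC"
a <- data.frame(
  start = as.POSIXct(c("2011-06-01 01:00", "2011-06-01 02:00", "2011-06-01 03:00"), tz = tz),
  end   = as.POSIXct(c("2011-06-01 01:30", "2011-06-01 02:30", "2011-06-01 03:30"), tz = tz)
)
b <- data.frame(
  start = as.POSIXct(c("2011-06-01 01:17", "2011-06-01 02:45", "2011-06-01 03:20"), tz = tz),
  end   = as.POSIXct(c("2011-06-01 01:40", "2011-06-01 02:50", "2011-06-01 04:00"), tz = tz)
)

# Every (row of a, row of b) pair; i varies fastest
idx <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))

# Overlap test: each interval starts no later than the other one ends
hit <- a$start[idx$i] <= b$end[idx$j] & b$start[idx$j] <= a$end[idx$i]

# One row per overlapping pair, carrying the columns of both inputs
overlaps <- data.frame(
  row1   = idx$i[hit],
  row2   = idx$j[hit],
  start1 = a$start[idx$i[hit]],
  end1   = a$end[idx$i[hit]],
  start2 = b$start[idx$j[hit]],
  end2   = b$end[idx$j[hit]]
)
print(overlaps)
```

With these inputs only rows 1 and 3 of `a` overlap anything in `b`, so `overlaps` has two rows. The check compares every pair, so the work grows with nrow(a) * nrow(b); for large inputs the sorted start/end count shown earlier in the thread should scale better.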