Using base R you can solve this by sorting and then comparing the first and last dates in each id-value group. Computing the first and last dates can be vectorized.
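To show why no loop is needed, here is a minimal sketch on a made-up toy vector (the names `key`, `date`, `isBreak`, `isFirst`, `isLast` and `span` are only for illustration and are not part of the original question). Once equal keys are adjacent, comparing each element with its successor marks the group boundaries, and the first and last entries of every group fall out of two shifted logical vectors.

```r
# Toy data (hypothetical): keys are already sorted so equal keys are adjacent,
# and dates are sorted within each key.
key  <- c("a", "a", "b", "c", "c", "c")
date <- as.Date(c("2000-01-01", "2000-03-01", "2000-11-11",
                  "2000-09-10", "2000-10-01", "2000-11-11"))

i <- seq_len(length(key) - 1)
# TRUE where element i and element i+1 belong to different groups
isBreak <- key[i] != key[i + 1]
isFirst <- c(TRUE, isBreak)   # a row starts a group if the previous row differs
isLast  <- c(isBreak, TRUE)   # a row ends a group if the next row differs

# One first-to-last span per group, in days, computed without a loop
span <- as.numeric(date[isLast] - date[isFirst])
data.frame(key = key[isFirst], first = date[isFirst], days = span)
```

Each group contributes exactly one TRUE to `isFirst` and one to `isLast`, in the same order, so the two subsets line up element by element. A group with a single row is both first and last and gets a span of 0.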
f1 <- function(data) {
    # sort by id, break ties with value, break remaining ties with date
    sortedData <- data[with(data, order(id, value, date)), ]
    i <- seq_len(NROW(sortedData)-1)
    # a 'group' has same id and value, entries in group are sorted by date
    isBreakPoint <- with(sortedData, id[i]!=id[i+1] | value[i]!=value[i+1])
    isFirstInGroup <- c(TRUE, isBreakPoint)
    isLastInGroup <- c(isBreakPoint, TRUE)
    sortedData[isFirstInGroup,][sortedData[isLastInGroup,"date"] - sortedData[isFirstInGroup,"date"] >= 31,]
}

dat <- read.table(colClasses=c("character", "Date", "character"), header=TRUE, text=
"id       date value
a  2000-01-01 x
a  2000-03-01 x
b  2000-11-11 w
c  2000-11-11 y
c  2000-10-01 y
c  2000-09-10 y
c  2000-12-12 z
c  2000-10-11 z
d  2000-11-11 w
d  2000-11-10 w")

> f1(dat)
  id       date value
1  a 2000-01-01     x
6  c 2000-09-10     y
8  c 2000-10-11     z

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jul 16, 2014 at 7:49 AM, arun <smartpink...@yahoo.com> wrote:
> Hi,
> If `dat` is the dataset
>
> library(dplyr)
> dat%>%
> group_by(id,value)%>%
> arrange(date=as.Date(date))%>%
> filter(any(c(abs(diff(as.Date(date))),NA)>31)& date == min(date))
> #Source: local data frame [3 x 3]
> #Groups: id, value
> #
> #  id       date value
> #1  a 2000-01-01     x
> #2  c 2000-09-10     y
> #3  c 2000-10-11     z
> A.K.
>
> On Wednesday, July 16, 2014 9:10 AM, Williams Scott
> <scott.willi...@petermac.org> wrote:
> Hi R experts,
>
> I have a dataset as sampled below. Values are only regarded as 'confirmed'
> in an individual ('id') if they occur
> more than once at least 30 days apart.
>
> id       date value
> a  2000-01-01 x
> a  2000-03-01 x
> b  2000-11-11 w
> c  2000-11-11 y
> c  2000-10-01 y
> c  2000-09-10 y
> c  2000-12-12 z
> c  2000-10-11 z
> d  2000-11-11 w
> d  2000-11-10 w
>
> I wish to subset the data to retain rows where the value for the
> individual is confirmed more than 30 days apart.
> So, after deleting all
> rows with just one occurrence of id and value, the rest would be the
> earliest occurrence of each value in each case id, provided 31 or more
> days exist between the dates. If >1 value is present per id, each value
> level needs to be assessed independently. This example would then reduce
> to:
>
> id       date value
> a  2000-01-01 x
> c  2000-09-10 y
> c  2000-10-11 z
>
> I can do this via some crude loops and subsetting, but I am looking for as
> much efficiency as possible
> as the dataset has around 50 million rows to assess. Any suggestions
> welcomed.
>
> Thanks in advance
>
> Scott Williams MD
> Melbourne, Australia
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.