On Wed, Jul 16, 2014 at 8:07 AM, Williams Scott <scott.willi...@petermac.org> wrote: > Hi R experts, > > I have a dataset as sampled below. Values are only regarded as Œconfirmed¹ > in an individual (Œid¹) if they occur > more than once at least 30 days apart. > > > id date value > a 2000-01-01 x > a 2000-03-01 x > b 2000-11-11 w > c 2000-11-11 y > c 2000-10-01 y > c 2000-09-10 y > c 2000-12-12 z > c 2000-10-11 z > d 2000-11-11 w > d 2000-11-10 w > > > I wish to subset the data to retain rows where the value for the > individual is confirmed more than 30 days apart. So, after deleting all > rows with just one occurrence of id and value, the rest would be the > earliest occurrence of each value in each case id, provided 31 or more > days exist between the dates. If >1 value is present per id, each value > level needs to be assessed independently. This example would then reduce > to: > > > id date value > a 2000-01-01 x > c 2000-09-10 y > c 2000-10-11 z
Question: the c-y id-value pair occurs 3 times. In two cases (2000-11-11 vs. 2000-10-01 & 2000-11-11 vs 2000-09-01) the difference is >30 days. Why isn't c 2000-10-01 y also part of the result? Is it because you only want a single id-value pair in which the date is the minimal? Or you want the one in which the date difference is maximal? Or you overlooked that particular match? I can't figure it out from your description. > > > > I can do this via some crude loops and subsetting, but I am looking for as > much efficiency as possible > as the dataset has around 50 million rows to assess. Any suggestions > welcomed. Hum, is the source of this data in a relational database such as Oracle, PostgreSQL, MySQL, MS-SQL, or SQLite (or "other")? I ask because some of this processing might be easier do to in the data base using a "self join", instead of reading the entire relational table into a data.frame and doing it in R. > > Thanks in advance > > Scott Williams MD > Melbourne, Australia > -- There is nothing more pleasant than traveling and meeting new people! Genghis Khan Maranatha! <>< John McKown ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.