Hi,

Try this:

dat1 <- read.table(text="
PT_ID IDX_DT OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
13 4549 2002-08-21 2002-08-20 -1 183 2
14 4549 2002-08-21 2002-11-14 85 91 1
15 4549 2002-08-21 2003-02-18 181 89 1
16 4549 2002-08-21 2003-05-15 267 109 2
17 4549 2002-08-21 2003-12-16 482 96 1
128 4839 2006-11-28 2006-11-28 0 179 2
", header=TRUE)
# smallest (signed) DAYS_DIFF for each PT_ID
dat3<-aggregate(DAYS_DIFF~PT_ID,data=dat1,min)
# keep the rows of dat1 that match both PT_ID and that DAYS_DIFF
merge(dat1,dat3)
#  PT_ID DAYS_DIFF     IDX_DT   OBS_DATE OBS_VALUE CATEGORY
#1  4549        -1 2002-08-21 2002-08-20       183        2
#2  4839         0 2006-11-28 2006-11-28       179        2
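Note that min(DAYS_DIFF) returns the most negative difference, not the one closest to zero; it gives the right row here only because -1 is also the smallest in absolute value. A minimal sketch that takes abs() first, using base R's ave() to compute the per-patient minimum (this swaps aggregate()/merge() for ave() and is not A.K.'s method):

```r
dat1 <- read.table(text="
PT_ID IDX_DT OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
13 4549 2002-08-21 2002-08-20 -1 183 2
14 4549 2002-08-21 2002-11-14 85 91 1
15 4549 2002-08-21 2003-02-18 181 89 1
16 4549 2002-08-21 2003-05-15 267 109 2
17 4549 2002-08-21 2003-12-16 482 96 1
128 4839 2006-11-28 2006-11-28 0 179 2
", header=TRUE)

# distance in days between OBS_DATE and IDX_DT, ignoring direction
absdiff <- abs(dat1$DAYS_DIFF)
# ave() replaces each value with the minimum of its PT_ID group,
# so this keeps the row(s) with the smallest |DAYS_DIFF| per patient
closest <- dat1[absdiff == ave(absdiff, dat1$PT_ID, FUN = min), ]
closest
```

If two observations for the same patient tie (e.g. -3 and 3), both rows are kept.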
#or,
dat2<- tapply(dat1$DAYS_DIFF,dat1$PT_ID,min)
dat4<-data.frame(PT_ID=row.names(data.frame(dat2)),DAYS_DIFF=dat2)
row.names(dat4)<-1:nrow(dat4)
merge(dat1,dat4)
#  PT_ID DAYS_DIFF     IDX_DT   OBS_DATE OBS_VALUE CATEGORY
#1  4549        -1 2002-08-21 2002-08-20       183        2
#2  4839         0 2006-11-28 2006-11-28       179        2

A.K.



----- Original Message -----
From: WANG WEIJIA <wwang....@gmail.com>
To: "r-help@R-project.org" <r-help@r-project.org>
Cc:
Sent: Saturday, September 1, 2012 1:10 PM
Subject: [R] R_closest date

Hi,

I have encountered an issue with finding the date closest to another date.

So this is what the data frame looks like:

    PT_ID     IDX_DT   OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
13   4549 2002-08-21 2002-08-20        -1       183        2
14   4549 2002-08-21 2002-11-14        85        91        1
15   4549 2002-08-21 2003-02-18       181        89        1
16   4549 2002-08-21 2003-05-15       267       109        2
17   4549 2002-08-21 2003-12-16       482        96        1
128  4839 2006-11-28 2006-11-28         0       179        2

I need to find the single observation whose OBS_DATE is closest to IDX_DT. For example, for PT_ID 4549, I need row 13, whose OBS_DATE is just one day away from IDX_DT.
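Since exactly one observation per patient is wanted, one option is which.min() on the absolute difference within each PT_ID group. The sketch below adds a hypothetical patient 9999 (not in the original data) only to show a case where the signed minimum (-100) and the closest date (5 days) disagree:

```r
# PT_ID 4549 and 4839 come from the question; 9999 is HYPOTHETICAL,
# added only to illustrate the signed-vs-absolute difference
dat1 <- data.frame(
  PT_ID     = c(4549, 4549, 4549, 4549, 4549, 4839, 9999, 9999),
  DAYS_DIFF = c(-1, 85, 181, 267, 482, 0, -100, 5)
)

# one row per patient: which.min() returns the position of the first
# smallest |DAYS_DIFF|, so ties resolve to the earlier row
closest1 <- do.call(rbind, lapply(split(dat1, dat1$PT_ID), function(d) {
  d[which.min(abs(d$DAYS_DIFF)), ]
}))
closest1

# for comparison, the signed minimum picks -100 for patient 9999
signed <- aggregate(DAYS_DIFF ~ PT_ID, data = dat1, FUN = min)
signed
```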
I was thinking about using abs(), and I got this:

baseline<- function(x){
+
+   #remove all unnecessary variables
+   baseline<- x[,c("PT_ID","DAYS_DIFF")]
+
+   #get a list of every unique ID
+   uniqueID <- unique(baseline$PT_ID)
+
+   #make a vector that will contain the smallest DAYS_DIFF
+   first <- rep(-99,length(uniqueID))
+
+   i = 1
+   #loop through each unique ID
+   for (PT_ID in uniqueID){
+
+     #for each iteration get the smallest DAYS_DIFF for that ID
+     first[i] <- min(baseline[which(baseline$PT_ID==PT_ID),abs(baseline$DAYS_DIFF)])
+
+     #up the iteration counter
+     i = i + 1
+
+   }
+   #make a data frame with the lowest DAYS_DIFF and ID
+   newdata <- data.frame(uniqueID,first)
+   names(newdata) <- c("PT_ID","DAYS_DIFF")
+
+   #return the data frame containing the lowest GPI for each ID
+   return(newdata)
+ }
> ldl.b<-baseline(ldl) #get all baseline ldl patient ID, total 11368 obs, all unique
Error in `[.data.frame`(baseline, which(baseline$PT_ID == PT_ID), abs(baseline$DAYS_DIFF)) :
  undefined columns selected

Can anyone help me figure out how to get the minimum absolute value of DAYS_DIFF for each unique ID?

Thanks a lot

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
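The error comes from the column index: in baseline[rows, abs(baseline$DAYS_DIFF)], R reads the absolute differences (1, 85, 181, ...) as column numbers, and the two-column data frame has no column 85. A minimal corrected sketch of the loop, which subsets the DAYS_DIFF vector by ID instead. As a design choice it returns the signed DAYS_DIFF whose absolute value is smallest, so that merge() with the original data can still find the row; the small dat1 below stands in for the poster's ldl data, which is not shown:

```r
baseline <- function(x) {
  # get a list of every unique ID
  uniqueID <- unique(x$PT_ID)
  # vector that will hold the closest (signed) DAYS_DIFF for each ID
  first <- numeric(length(uniqueID))
  for (i in seq_along(uniqueID)) {
    # DAYS_DIFF values for this ID only (a plain vector, not columns)
    d <- x$DAYS_DIFF[x$PT_ID == uniqueID[i]]
    # keep the sign, but choose by smallest absolute value
    first[i] <- d[which.min(abs(d))]
  }
  data.frame(PT_ID = uniqueID, DAYS_DIFF = first)
}

# stand-in for the poster's data, using the rows from the question
dat1 <- data.frame(
  PT_ID     = c(4549, 4549, 4549, 4549, 4549, 4839),
  DAYS_DIFF = c(-1, 85, 181, 267, 482, 0),
  OBS_VALUE = c(183, 91, 89, 109, 96, 179)
)
ldl.b <- baseline(dat1)
# recover the full closest observation for each patient
merge(dat1, ldl.b)
```

If only the minimum absolute value is wanted, replace d[which.min(abs(d))] with min(abs(d)).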