Thank you for your prompt assistance, cruz and Bart. Bart set me on the right track, and I modified his proposal to this:
f <- function(data){
  # for each admission, the row (if any) whose start equals this stop
  m <- match(data$stop,data$start)
  # first admission not followed by a transfer (or the last row)
  n <- min(length(m),which(is.na(m)))
  data$stop[n]
}
by(data,data$id,f)

It also handles some special cases outside my small example dataset.

Thank you again!
Peter.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of bartjoosen
Sent: 6. november 2008 11:31
To: r-help@r-project.org
Subject: Re: [R] Data manipulation question


How about:

id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
data <- data.frame(id,start,stop)

f <- function(data){
  m <- match(data$start,data$stop) + 1
  if (length(m)==1 && is.na(m)) m <- 1
  if (length(m) > 1 && is.na(m[2])) m <- 1
  data$stop[min(m,na.rm=TRUE)]
}

by(data,data$id,f)

The if statements in the function are for some special cases; in all the other cases the first line will do the trick.

I would like to add that using data as a variable name is somewhat bad practice, as it masks the built-in data function of R.
And I changed the way you made up the data.frame, as your method would convert everything to factors.

Good luck

Bart



Peter Jepsen wrote:
>
> Dear R-listers,
>
> I am a relatively inexperienced R-user currently migrating from Stata. I
> am deeply frustrated by this data manipulation question: I know how I
> could do it in Stata, but I cannot make it work in R.
>
> I have a data frame of hospitalization data where each row represents an
> admission. I need to know when patients were first discharged, but the
> problem is that patients were sometimes transferred between hospital
> departments. In my data a transfer looks like a new admission, except
> that it has a 'start' date equal to the previous admission's 'stop'
> date.
>
> Here is an example:
>
> id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
> start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
> stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
> data <- as.data.frame(cbind(id,start,stop))
> data
> #    id start stop
> # 1   a     0    6
> # 2   a     6   12
> # 3   a    17   20
> # 4   a    20   30
> # 5   b     0    1
> # 6   b     1   10
> # 7   c     0    3
> # 8   c     5   10
> # 9   c    10   11
> # 10  c    11   30
> # 11  c    50   55
> # 12  d     0    6
>
> So, what I want to end up with is this:
>
> id start stop
> a      0   12  # This patient was transferred at time 6 and discharged at
> time 12. The admission starting at time 17 is therefore irrelevant.
> b      0   10
> c      0    3
> d      0    6
>
> I have tried tons of variations over lapply, sapply, split, for etc.,
> all to no avail.
>
> Thank you in advance for any assistance.
>
> Best regards,
> Peter Jepsen, MD.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
View this message in context: http://www.nabble.com/Data-manipulation-question-tp20356835p20358624.html
Sent from the R help mailing list archive at Nabble.com.
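
For anyone reading this thread later: the rule discussed here (a transfer is an admission whose 'start' equals the previous admission's 'stop') can also be written as an explicit walk over consecutive rows. What follows is only a rough sketch, not the code posted by Peter or Bart. The names admissions and first_discharge are made up for illustration, and it assumes the rows are already sorted by start within each id.

```r
# Example data from the thread, built with data.frame() so that
# start and stop stay numeric. "admissions" is a hypothetical name,
# chosen to avoid masking R's data() function.
admissions <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1)),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
)

# Hypothetical helper: for one patient's rows (assumed sorted by start),
# keep moving to the next row while it starts exactly where the
# current row stops, i.e. while it is a transfer.
first_discharge <- function(d) {
  i <- 1
  while (i < nrow(d) && d$start[i + 1] == d$stop[i]) {
    i <- i + 1
  }
  data.frame(id = d$id[1], start = d$start[1], stop = d$stop[i])
}

# Apply per patient and stack the one-row results.
result <- do.call(rbind, lapply(split(admissions, admissions$id), first_discharge))
result
```

On the example data this gives stop values 12, 10, 3 and 6 for patients a, b, c and d, which matches the table in the original question. Unlike the match() approach, it only compares each row with the row directly after it, so it depends on the sort order.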