I am trying to set up a data set for a survival analysis with time-varying covariates. The data is already in a long format, but does not have a variable to signify the stopping point for the interval. The variable DaysEnrolled is the variable I would like to use to form this interval. This is what I have now:

       ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp
  1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
  2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
  3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
  4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
  5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
  6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
  7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
  8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0
  9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0

This is what I would like to have:

       ID   Age DaysEnrolled HAZ WAZ WHZ Food onARV                         HIVStatus LTFUp Start Stop
  1 71622 0.008            0  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0     0   28
  2 71622 0.085           28  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    28   42
  3 71622 0.123           42  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    42   98
  4 71622 0.277           98  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0    98  158
  5 71622 0.441          158  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   158  186
  6 71622 0.517          186  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   186  214
  7 71622 0.594          214  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   214  258
  8 71622 0.715          258  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   258  286
  9 71622 0.791          286  NA  NA  NA   NA     0 HIV exposed, status indeterminate     0   286   NA

That is, Start is the current DaysEnrolled and Stop is the DaysEnrolled of the next row for the same ID, with NA on each subject's last row.

I am not sure how to put this in a function. I thought of using embed() inside tapply():

    astop <- tapply(sample1$DaysEnrolled, sample1$ID, function(x) {
      ifelse(length(x) == 1, embed(x, 1),
             ifelse(length(x) > 1, embed(x, 2), NA))
    })

This doesn't do what I thought it would. I know that I could write a double loop over each subject and that subject's varying number of observations, but I would like to avoid that if at all possible.
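
To make the intended shift concrete, here is a minimal sketch of the per-subject operation written with ave(). It runs on a small made-up data frame (the name toy is hypothetical and stands in for sample1) and assumes the rows are already sorted by DaysEnrolled within each ID. It is only meant to illustrate the result I am after, not a vetted solution:

```r
# Hypothetical toy data: only the two columns needed for the illustration
toy <- data.frame(
  ID           = c(71622L, 71622L, 71622L, 1436L),
  DaysEnrolled = c(0L, 28L, 42L, 0L)
)

# Start is simply the current visit day
toy$Start <- toy$DaysEnrolled

# Stop is the next visit day within the same ID;
# the last row of each subject has no next visit, so it gets NA
toy$Stop <- ave(toy$DaysEnrolled, toy$ID,
                FUN = function(x) c(x[-1], NA))

print(toy)
```

A subject with a single observation, like ID 1436 here, ends up with Stop = NA, which matches the last row of the desired output above.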

Sample of 2 subjects:

    sample1 <- structure(list(
      ID = c(71622L, 71622L, 71622L, 71622L, 71622L, 71622L, 71622L,
             71622L, 71622L, 1436L),
      Age = c(0.008, 0.085, 0.123, 0.277, 0.441, 0.517, 0.594, 0.715,
              0.791, 6.968),
      DaysEnrolled = c(0L, 28L, 42L, 98L, 158L, 186L, 214L, 258L, 286L, 0L),
      HAZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
              NA_real_, NA_real_, NA_real_, NA_real_),
      WAZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
              NA_real_, NA_real_, NA_real_, NA_real_),
      WHZ = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
              NA_real_, NA_real_, NA_real_, NA_real_),
      Food = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
               NA_integer_, NA_integer_, NA_integer_, NA_integer_,
               NA_integer_, NA_integer_),
      onARV = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
      HIVStatus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
        .Label = c("", "HIV exposed, status indeterminate",
                   "HIV infected", "HIV negative"),
        class = "factor"),
      LTFUp = c(0, 0, 0, 0, 0, 0, 0, 0, 0, NA),
      Start = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
      Stop = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)),
      .Names = c("ID", "Age", "DaysEnrolled", "HAZ", "WAZ", "WHZ", "Food",
                 "onARV", "HIVStatus", "LTFUp", "Start", "Stop"),
      row.names = c(NA, 10L),
      class = "data.frame")

Adrian Katschke
Biostatistician
IU Department of Medicine
Division of Biostatistics
akats...@iupui.edu
317-278-6665

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.