Wacek Kusnierczyk wrote:
gallon li wrote:
Suppose I have longitudinal data in long format:
id time x
1 1 10
1 2 11
1 3 23
1 4 23
2 2 12
2 3 13
2 4 14
3 1 11
3 3 15
3 4 18
3 5 21
4 2 22
4 3 27
4 6 29
I want to select the x value for each id at time equal to 3. When that
observation is missing, I want to replace it with the observation
at time equal to 4. If that is missing too, just use NA.
with this dummy data:
data = read.table(header=TRUE, textConnection(open='r', '
id time x
2 2 2
2 3 3
2 4 4
2 5 5
3 3 3
3 4 4
3 5 5
4 4 4
4 5 5
5 5 5'))
you seem to expect the result to be like
# id time x
# 2 3 3
# 3 3 3
# 4 4 4
# 5 NA NA
one way to hack this is:
# the time points you'd like to use, in order of preference
times = 3:4
# split the data by id,
# for each subset, find values of x for the first time found, or use NA
# combine the subsets back into a single data frame
do.call(rbind, by(data, data$id, function(data)
   with(data, {
      rows = (time == times[which(times %in% time)[1]])
      # id[1]: return a single NA row even when the id has several rows
      if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA) else
         data[rows,] })))
# id time x
# 2 2 3 3
# 3 3 3 3
# 4 4 4 4
# 5 5 NA NA
with your original data:
data = read.table(header=TRUE, textConnection(open='r', '
id time x
1 1 10
1 2 11
1 3 23
1 4 23
2 2 12
2 3 13
2 4 14
3 1 11
3 3 15
3 4 18
3 5 21
4 2 22
4 3 27
4 6 29'))
times = 3:4
do.call(rbind, by(data, data$id, function(data)
   with(data, {
      rows = (time == times[which(times %in% time)[1]])
      # id[1]: return a single NA row even when the id has several rows
      if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA) else
         data[rows,] })))
# id time x
# 1 1 3 23
# 2 2 3 13
# 3 3 3 15
# 4 4 3 27
is this what you wanted?
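For comparison, the same preference rule can probably also be written without a per-id function, by sorting the candidate rows and keeping the first one for each id. This is only a sketch and not code from the posts above: the helper name `first_available` is made up for illustration, and it uses the `text=` argument of `read.table` instead of `textConnection`.

```r
# Hypothetical helper, not from the original posts.
first_available <- function(data, times) {
  # keep only the rows observed at one of the preferred times
  cand <- data[data$time %in% times, ]
  # within each id, sort so that the most preferred time comes first
  cand <- cand[order(cand$id, match(cand$time, times)), ]
  # keep the first (most preferred) row per id
  best <- cand[!duplicated(cand$id), ]
  # re-attach ids that had none of the preferred times; they get NA
  merge(data.frame(id = unique(data$id)), best, by = "id", all.x = TRUE)
}

# the dummy data from above
dummy <- read.table(header = TRUE, text = "
id time x
2 2 2
2 3 3
2 4 4
2 5 5
3 3 3
3 4 4
3 5 5
4 4 4
4 5 5
5 5 5")

res <- first_available(dummy, times = 3:4)
print(res)
```

With the dummy data, id 5 has neither time 3 nor time 4, so its time and x come back as NA. The order of `times` sets the preference, so `times = 4:3` would prefer time 4 over time 3.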
There's also the straightforward answer:
> sapply(split(data,data$id), function(d) { r <- d$x[d$time==3]
+ if(!length(r)) r <- d$x[d$time==4]
+ if(!length(r)) r <- NA
+ r})
1 2 3 4
23 13 15 27
or, just to check the case where time==3 is actually missing:
> sapply(split(data[-c(6,13),],data$id[-c(6,13)]), function(d) {
+ r <- d$x[d$time==3]
+ if(!length(r)) r <- d$x[d$time==4]
+ if(!length(r)) r <- NA
+ r})
1 2 3 4
23 14 15 NA
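The two `if` steps above could be folded into one lookup over an ordered vector of preferred times. The following is a sketch under that assumption, not part of the posts: `pick_x` is a hypothetical name, and `match()` is used to find the first preferred time that is present for each id.

```r
# Hypothetical helper, not from the original posts: one x value per id,
# taken at the first time in `times` that is observed, else NA.
pick_x <- function(data, times = 3:4) {
  sapply(split(data, data$id), function(d) {
    # position within d of each preferred time, NA where not observed
    pos <- match(times, d$time)
    pos <- pos[!is.na(pos)]
    if (length(pos)) d$x[pos[1]] else NA
  })
}

# the original data from the question
long <- read.table(header = TRUE, text = "
id time x
1 1 10
1 2 11
1 3 23
1 4 23
2 2 12
2 3 13
2 4 14
3 1 11
3 3 15
3 4 18
3 5 21
4 2 22
4 3 27
4 6 29")

full    <- pick_x(long)               # every id has time 3
dropped <- pick_x(long[-c(6, 13), ])  # time 3 removed for ids 2 and 4
print(full)
print(dropped)
```

Adding a further fallback time then only means extending `times`, e.g. `pick_x(long, times = c(3, 4, 5))`.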
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907