Wacek Kusnierczyk wrote:
gallon li wrote:
Suppose I have a long format for a longitudinal data

id time x
1 1 10
1 2 11
1 3 23
1 4 23
2 2 12
2 3 13
2 4 14
3 1 11
3 3 15
3 4 18
3 5 21
4 2 22
4 3 27
4 6 29

I want to select the x values for each ID when time is equal to 3. When that
observation is missing, I want to replace it with the observation
at time equal to 4; otherwise just use NA.

with this dummy data:

    data = read.table(header=TRUE, textConnection(open='r', '
        id time x
        2 2 2
        2 3 3
        2 4 4
        2 5 5
        3 3 3
        3 4 4
        3 5 5
        4 4 4
        4 5 5
        5 5 5'))

you seem to expect the result to be like

    # id time x
    # 2 3 3
    # 3 3 3
    # 4 4 4
    # 5 NA NA

one way to hack this is:

    # the time points you'd like to use, in order of preference
    times = 3:4

    # split the data by id,
    # for each subset, find values of x for the first time found, or use NA
    # combine the subsets back into a single data frame
    do.call(rbind, by(data, data$id, function(data)
        with(data, {
            rows = (time == times[which(times %in% time)[1]])
            # id[1] keeps the NA case to a single row per id
            if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA)
            else data[rows,] })))
    #   id time  x
    # 2  2    3  3
    # 3  3    3  3
    # 4  4    4  4
    # 5  5   NA NA
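
The indexing expression in the middle does most of the work, so here is a small sketch of it in isolation. The helper name first_available is made up for illustration only; it is not part of the code above.

```r
times <- 3:4                      # preferred time points, best first

# hypothetical helper, only for illustration: the first preferred time
# that occurs in the observed times, or NA if none of them occurs
first_available <- function(time) times[which(times %in% time)[1]]

first_available(c(2, 3, 4, 5))    # 3: the first preference is present
first_available(c(4, 5))          # 4: falls back to the second preference
first_available(5)                # NA: neither 3 nor 4 was observed
```

When the result is NA, the comparison time == NA is NA for every row, which is what the is.na(rows[1]) test picks up.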

with your original data:

    data = read.table(header=TRUE, textConnection(open='r', '
       id time x
       1 1 10
       1 2 11
       1 3 23
       1 4 23
       2 2 12
       2 3 13
       2 4 14
       3 1 11
       3 3 15
       3 4 18
       3 5 21
       4 2 22
       4 3 27
       4 6 29'))
    times = 3:4
    do.call(rbind, by(data, data$id, function(data)
        with(data, {
            rows = (time == times[which(times %in% time)[1]])
            # id[1] keeps the NA case to a single row per id
            if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA)
            else data[rows,] })))

    #   id time  x
    # 1  1    3 23
    # 2  2    3 13
    # 3  3    3 15
    # 4  4    3 27

is this what you wanted?

There's also the straightforward answer:

> sapply(split(data,data$id), function(d) { r <- d$x[d$time==3]
+    if(!length(r)) r <- d$x[d$time==4]
+    if(!length(r)) r <- NA
+    r})
 1  2  3  4
23 13 15 27

or, just to check the case where time==3 is actually missing:

> sapply(split(data[-c(6,13),],data$id[-c(6,13)]), function(d) {
+    r <- d$x[d$time==3]
+    if(!length(r)) r <- d$x[d$time==4]
+    if(!length(r)) r <- NA
+    r})
 1  2  3  4
23 14 15 NA
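
If more than two fallback times are ever needed, the same idea can be sketched with match() and a preference vector. This is only a sketch: the name pick_x and its times argument are hypothetical, and it assumes at most one row per id and time (match() takes the first if there are several).

```r
# hypothetical helper: x at the first preferred time observed for each id
pick_x <- function(data, times = 3:4) {
    sapply(split(data, data$id), function(d) {
        hit <- match(times, d$time)   # row of each preferred time, NA if absent
        hit <- hit[!is.na(hit)]       # keep only the times actually observed
        if (length(hit)) d$x[hit[1]] else NA
    })
}

# the original data from the question
data <- data.frame(
    id   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4),
    time = c(1, 2, 3, 4, 2, 3, 4, 1, 3, 4, 5, 2, 3, 6),
    x    = c(10, 11, 23, 23, 12, 13, 14, 11, 15, 18, 21, 22, 27, 29))

pick_x(data)                 # all four ids have time 3
pick_x(data[-c(6, 13), ])    # time 3 dropped for ids 2 and 4
```

The two calls should reproduce the results shown above: 23 13 15 27 for the full data, and 23 14 15 NA once the time==3 rows for ids 2 and 4 are removed.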


--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
