On 17/09/2015 7:06 AM, John Sorkin wrote:
> I have a long (rather than wide) file, i.e. the data for each subject is on
> multiple lines rather than one line. Each line has the following layout:
> subject group time value
> I have two groups and multiple subjects; each subject can be seen up to three
> times at time 0, and at most once at times 4 and 8.
> An example of the data follows:
>
> 1 control 0 100
> 1 control 0 NA
> 1 control 0 55
> 1 control 4 100
> 1 control 8 100
>
> 2 exp 0 99
> 2 exp 0 67
> 2 exp 0 66
> 2 exp 4 110
> 2 exp 8 200
>
> I need to get means by group (control vs. exp) within time (0, 4, 8). The means
> should include only those subjects who have at least one observation at each
> time point (0, 4, 8). I also need to determine the number of subjects who
> contribute data at each time point by group. Any suggestion on how to get
> these means would be appreciated. Sad to say, I worked on this for four hours
> last night without coming to any understanding of how this can be done. UGH!
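The example data in the question can be turned into a reproducible data frame by pasting it into read.table(text = ...). This is a minimal sketch; the column names subject, group, time and value are an assumption taken from the stated layout, and the data frame is called x only for illustration. The table at the end shows how many rows each subject has at each time point.

```r
# Example data from the question, pasted as text. The column names are an
# assumption based on the stated layout "subject group time value".
# Blank lines are skipped and the string "NA" is read as a missing value.
x <- read.table(text = "
1 control 0 100
1 control 0 NA
1 control 0 55
1 control 4 100
1 control 8 100
2 exp 0 99
2 exp 0 67
2 exp 0 66
2 exp 4 110
2 exp 8 200
", col.names = c("subject", "group", "time", "value"))

# One row per observation, with several rows per subject at time 0
str(x)

# Number of rows (including the NA row) for each subject at each time point
counts <- with(x, table(subject, time))
print(counts)
```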
Do it in two stages. First, group the data by subject ID and delete any subjects that are not observed at all three time points. Then group by treatment and time and take means. The tapply() or by() functions will be useful for both of these steps. For example,

  do.call(rbind, by(x, x$subject, function(sub) if (length(unique(sub$time)) == 3) sub else NULL))

will keep only the subjects observed at 3 distinct times and drop the rest. (It doesn't take NA into account; if you need to do that, you'll need to make that function(sub) more complicated. "sub" will be a data frame containing the data for just one subject.) The "do.call(rbind" part puts the list output from by() back together as a single data frame.

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
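The two stages described in the reply can be put together in a short end-to-end sketch. It assumes the column names subject, group, time and value from the question's layout. The filter is one possible NA-aware version of function(sub): a subject is kept only if it has a non-missing value at each of the times 0, 4 and 8. The helper has_all_times() and the third subject (whose only time-8 value is missing) are hypothetical illustrations, not part of the original thread. The means are taken over observations, so a subject with three time-0 readings contributes three values; if each subject should count once, average within subject first.

```r
# Example data in the question's long layout (column names assumed).
# Subject 3 is a hypothetical addition whose time-8 value is missing,
# so that the NA-aware filter has something to remove.
x <- data.frame(
  subject = c(rep(1:2, each = 5), 3, 3, 3),
  group   = c(rep(c("control", "exp"), each = 5), rep("control", 3)),
  time    = c(rep(c(0, 0, 0, 4, 8), times = 2), 0, 4, 8),
  value   = c(100, NA, 55, 100, 100, 99, 67, 66, 110, 200, 80, 90, NA)
)

# Hypothetical helper: TRUE if the subject has at least one non-missing
# value at every required time point.
has_all_times <- function(sub, times = c(0, 4, 8)) {
  observed <- unique(sub$time[!is.na(sub$value)])
  all(times %in% observed)
}

# Stage 1: split by subject, keep complete subjects, rebind into one data frame
complete <- do.call(rbind, by(x, x$subject, function(sub)
  if (has_all_times(sub)) sub else NULL))

# Stage 2: mean of the observations by group (rows) within time (columns)
means <- with(complete, tapply(value, list(group, time), mean, na.rm = TRUE))

# Number of distinct subjects contributing a non-missing value, by group and time
obs <- complete[!is.na(complete$value), ]
n_subj <- with(obs, tapply(subject, list(group, time),
                           function(s) length(unique(s))))

print(means)
print(n_subj)
```

Note that the simpler test length(unique(sub$time)) == 3 would have kept subject 3, because its time-8 row exists even though the value is NA.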