Re: [R] getting means by group within time point for data on multiple lines (long rather than wide file)

Duncan Murdoch Thu, 17 Sep 2015 04:38:20 -0700

On 17/09/2015 7:06 AM, John Sorkin wrote:
> I have a long (rather than wide) file, i.e. the data for each subject is on 
> multiple lines rather than one line. Each line has the following layout:
> subject group time value
> I have two groups, multiple subjects, each subject can be seen up to three 
> times at time 0, and at most once at times 4 and 8.
> An example of the data follows:
> 
> 1 control 0 100
> 1 control 0 NA
> 1 control 0 55
> 1 control 4 100
> 1 control 8 100
> 
> 2 exp 0 99
> 2 exp 0 67
> 2 exp 0 66
> 2 exp 4 110
> 2 exp 8 200
> 
> I need to get means by group (control vs. exp) within time (0,4,8). The means 
> should include only those subjects who have at least one observation at each 
> time point (0, 4, 8). I also need to determine the number of subjects who 
> contribute data at each time-point by group. Any suggestion on how to get 
> these means would be appreciated. Sad to say I worked on this for four hours 
> last night without coming to any understanding how this can be done. UGG!


Do it in two stages.  First, group the data by subject id, and delete
any subjects that don't have an observation at each of the three time
points.  Then group by treatment group and time and take means.

The tapply() or by() functions will be useful for both of these steps.
For example,

# x is the long-format data frame with columns subject, group, time, value
do.call(rbind,
  by(x, x$subject,
     function(sub)
       if (length(unique(sub$time)) == 3) sub
       else NULL))

will remove subjects observed at other than 3 distinct times.  (It doesn't
take NA into account: a row whose value is NA still counts as an observed
time.  If you need to exclude those, you'll need to make that
function(sub) more complicated.  "sub" will be a dataframe containing
data for just one subject.)
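
One way the function could be made NA-aware is sketched below.  This is
only a sketch under assumptions: the column names subject, group, time and
value are taken from the layout in the question, the helper name
keep_complete is made up here, and subject 3 is invented for illustration
(the time-8 value is missing, so that subject should be dropped).

```r
# Example data: subject 1 from the question, plus a hypothetical subject 3
# whose only time-8 value is NA.
x <- data.frame(
  subject = c(1, 1, 1, 1, 1, 3, 3, 3),
  group   = "control",
  time    = c(0, 0, 0, 4, 8, 0, 4, 8),
  value   = c(100, NA, 55, 100, 100, 80, 90, NA)
)

# Hypothetical helper: keep a subject only if it has at least one
# non-missing value at each of three distinct times.
keep_complete <- function(sub) {
  observed <- sub[!is.na(sub$value), ]
  if (length(unique(observed$time)) == 3) sub else NULL
}

# by() splits x by subject; rbind drops the NULL results.
kept <- do.call(rbind, by(x, x$subject, keep_complete))
```

The kept subjects come back with all of their rows, including any rows
whose value is NA, so na.rm = TRUE is still needed when taking means.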

The "do.call(rbind" puts the list output from by() back together as a
single dataframe.
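
For the second stage, something along these lines might work.  It is a
sketch, not a tested solution for the real data: the column names subject,
group, time and value are assumed from the layout in the question, and
subject 3 (seen only at times 0 and 4) is made up to show a subject being
dropped.

```r
# Example data from the question, plus a hypothetical subject 3 with no
# observation at time 8.
x <- data.frame(
  subject = c(rep(1, 5), rep(2, 5), rep(3, 2)),
  group   = c(rep("control", 5), rep("exp", 5), rep("control", 2)),
  time    = c(0, 0, 0, 4, 8,  0, 0, 0, 4, 8,  0, 4),
  value   = c(100, NA, 55, 100, 100,  99, 67, 66, 110, 200,  80, 90)
)

# Stage 1: keep only subjects seen at all three times.
complete <- do.call(rbind,
  by(x, x$subject,
     function(sub) if (length(unique(sub$time)) == 3) sub else NULL))

# Stage 2: mean of value for each group by time cell, ignoring NA values.
means <- tapply(complete$value,
                list(complete$group, complete$time),
                mean, na.rm = TRUE)

# Number of distinct subjects with rows in each group by time cell.
n_subjects <- tapply(complete$subject,
                     list(complete$group, complete$time),
                     function(s) length(unique(s)))
```

Both results are matrices with groups as rows and times as columns.  Note
that the means pool all rows in a cell, so a subject with three time-0
values contributes three values; if each subject should count once, average
within subject and time first.  The subject count includes subjects whose
rows in that cell are all NA.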

Duncan Murdoch

