I have a population of subjects each with a variable which has been captured at 
a baseline date.  Then for many subjects (but not all) an intervention has 
occurred and the variable has changed at one or more time points after the 
baseline date.  So my dataset consists of a subject ID (x), which may appear 
several times or just once, a measure (y), and a date of observation (z).  I 
would like to be able to have some sort of animated plot with a slider 
representing time so I can show how the distribution of the variable has 
altered (say in a histogram or a box plot) from baseline up to the end of a 
period. I need each subject to be counted only once in this distribution, using 
the most recent measure recorded up to and including the current date on the slider.

I have created a synthetic data set using the code below, which roughly 
replicates the problem for just a few data points over a few months.  
My real data set has about 30,000 subjects with multiple measures captured over 
10 years. What I need for each date point is a summary chart, such as a 
histogram, which shows me the distribution of my variable (in this case y) 
with just one observation per subject, that observation being the most up to 
date one at the date selected on the slider.

I have tried to use the manipulate package, which I've used successfully for 
other simple applications, but hit two problems. Firstly, it doesn't like dates 
as a slider variable. I can work around this by making them numeric, but would 
like to work with dates if possible. Secondly, I don't know how to restrict 
observations to the date on the slider, e.g. Subject 100 has a baseline of 
45.26, but on or after 26th April it becomes 56.96. So where the slider is set 
on or beyond this date I would want the earlier value for this subject to be 
excluded from the distribution. I'm not sure that manipulate was really made 
for this problem and perhaps I should be looking elsewhere.
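
The two problems above can be sketched together as follows. This is only a rough illustration under assumptions, not a tested solution for the real data: the toy data frame obs and the helper names latest_as_of and plot_as_of are made up for the example, the slider carries a numeric day offset that is converted back to a Date inside the plotting function, and the manipulate call only runs in an interactive session where that package is available.

```r
# Toy data, made up for illustration: subject 100 has a baseline value and
# one later value, subject 99 has a baseline only.
obs <- data.frame(
  x = c(99, 100, 100),
  y = c(51.00, 45.26, 56.96),
  z = as.Date(c("2013-04-01", "2013-04-01", "2013-04-26"))
)

# Hypothetical helper: keep one row per subject, namely the most recent
# observation dated on or before `as_of`.
latest_as_of <- function(d, as_of) {
  d <- d[d$z <= as_of, ]                     # drop observations after the slider date
  d <- d[order(d$x, d$z), ]                  # oldest to newest within each subject
  d[!duplicated(d$x, fromLast = TRUE), ]     # keep the last (newest) row per subject
}

start  <- min(obs$z)
n_days <- as.numeric(max(obs$z) - start)     # slider range in whole days

# Hypothetical plotting function: the slider supplies a number of days,
# and Date + number gives back a Date for the selection and the title.
plot_as_of <- function(offset) {
  as_of <- start + offset
  cur <- latest_as_of(obs, as_of)
  hist(cur$y, main = format(as_of, "%d %b %Y"), xlab = "y")
  invisible(cur)
}

# Only attempted in an interactive session with manipulate installed.
if (interactive() && requireNamespace("manipulate", quietly = TRUE)) {
  manipulate::manipulate(
    plot_as_of(offset),
    offset = manipulate::slider(0, n_days, initial = 0, step = 1)
  )
}
```

With this toy data, latest_as_of(obs, as.Date("2013-04-25")) still returns the baseline 45.26 for subject 100, while any date from 26th April onwards returns 56.96 instead.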

I guess a solution involves using the aggregate command to get the unique 
values at various time points and then using something like the TeachingDemos 
package? I'm not sure I'm on the right path with this though, and as an R beginner 
I can't get to first base with it.  I can't figure out how to use aggregate to 
give me the value of y corresponding to the latest date z for each subject.  I 
know this is basic stuff but I really can't see how to do it.
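
A minimal sketch of the aggregate idea, under assumptions: the small data frame obs, the cut-off as_of and the helper column zn are invented for illustration. aggregate is used only to find each subject's latest date up to the cut-off (as a plain number, to avoid relying on how date classes are handled), and merge then pulls back the matching value of y.

```r
# Toy data, made up for illustration: three subjects with 2, 1 and 3 observations.
obs <- data.frame(
  x = c(1, 1, 2, 3, 3, 3),
  y = c(50, 55, 48, 61, 59, 63),
  z = as.Date(c("2013-04-01", "2013-05-10", "2013-04-01",
                "2013-04-01", "2013-04-20", "2013-06-15"))
)

as_of <- as.Date("2013-05-01")        # hypothetical slider position

# Step 1: ignore anything recorded after the cut-off date.
upto <- obs[obs$z <= as_of, ]

# Step 2: latest date per subject, held as days since 1970-01-01.
upto$zn <- as.numeric(upto$z)
last_date <- aggregate(zn ~ x, data = upto, FUN = max)

# Step 3: merge back on subject and date to recover the matching y.
# If a subject had two observations on the same latest date, both would be kept.
current <- merge(upto, last_date, by = c("x", "zn"))
current <- current[order(current$x), c("x", "y", "z")]
print(current)
```

Here subject 1 keeps its baseline value of 50 because its second observation falls after the cut-off, while subject 3 contributes 59 from 20th April rather than its baseline or its June value.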

My synthetic data can be generated like this (clunky I know, but I can't do 
this any slicker)

# make my baseline observations for 100 subjects on 1st April 2013
set.seed(1)
a <- data.frame(x = 1:100, y = rnorm(100, mean = 50, sd = 10), z = as.Date("2013-04-01"))
# simulate 50 subsequent observations between 2nd April and 30th June 2013,
# resulting in some subjects having different future measurements of y
Start <- as.Date("2013-04-02")
End <- as.Date("2013-06-30")
dates <- seq(from = Start, to = End, by = 1)
set.seed(1)
# note: sampling from 0:100 can draw subject 0, which has no baseline row
b <- data.frame(x = sample(0:100, 50, replace = TRUE), y = rnorm(50, mean = 50, sd = 10), z = sample(dates, 50, replace = FALSE))
# make one table of observations
c <- merge(a, b, all = TRUE)

Any suggestions much appreciated.

Gavin.




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
