Hi, very new to R here...

I have a data frame called 'set' with 100k+ rows in it that looks like this:

  subject           timestamp  yvalue traceabs subjtrace
1       1 1992-07-12 06:05:00      12        1       1-1
2       1 1992-07-12 06:10:00      15        1       1-1
3       1 1992-07-12 06:15:00      17        1       1-1
4       1 1992-07-12 06:20:00      20        1       1-1
5       1 1992-07-12 06:25:00      24        1       1-1
....
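
For anyone who wants to reproduce this, the five rows shown above can be 
rebuilt roughly as follows. The column types are my assumption here (timestamp 
as POSIXct in UTC, subjtrace as character); the real data may be stored 
differently.

```r
# Rebuild the five rows shown above.
# Column types are assumed for illustration, not confirmed from the real data.
set <- data.frame(
  subject   = rep(1L, 5),
  # start time plus 0, 5, 10, 15, 20 minutes (300 seconds per step)
  timestamp = as.POSIXct("1992-07-12 06:05:00", tz = "UTC") + (0:4) * 300,
  yvalue    = c(12, 15, 17, 20, 24),
  traceabs  = rep(1L, 5),
  subjtrace = rep("1-1", 5),
  stringsAsFactors = FALSE
)
str(set)
```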

There are 89 subjects, each of which has a different number of traces (it's 
time series data). In total there are around 180 traces. The "subjtrace" 
variable is just a concatenation of the subject number, a hyphen, and the 
relative trace number. For instance, the first trace for subject 46 is "46-1", 
but the traceabs value for that same trace is 71.
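
To make the relationship between the columns concrete, here is a rough sketch 
of how a subjtrace-style label can be derived from the subject and traceabs 
columns. The data frame `toy` and the column `reltrace` are made up for 
illustration; this is not necessarily how my actual column was built.

```r
# Hypothetical toy data: two subjects, traceabs numbered across all subjects.
toy <- data.frame(
  subject  = c(1L, 1L, 2L, 2L, 2L),
  traceabs = c(1L, 2L, 3L, 3L, 4L)
)

# Relative trace number: position of each traceabs value among the
# distinct traceabs values seen for that subject.
toy$reltrace <- ave(toy$traceabs, toy$subject,
                    FUN = function(x) match(x, unique(x)))

# Concatenate subject number, a hyphen, and the relative trace number.
toy$subjtrace <- paste(toy$subject, toy$reltrace, sep = "-")

print(toy$subjtrace)
# "1-1" "1-2" "2-1" "2-1" "2-2"
```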

I need to perform simple statistics on each subject and on each trace. I also 
need to graph each trace.

It seems like the easy approach to forming the groups would be to use the 
split() function:

> temp <- split(set, set$subject)

When I then try, for example:

> summary(temp[1])

all I get as a result is:
  Length Class      Mode
1 5      data.frame list

So I went with:

> lapply(temp[1], summary)

That works, but I'm unable to do something like:

> lapply(temp[1]$yvalue, mean)

because the result returned is:
list()

Ultimately, I'm trying to run the exact same code on each group, as defined by 
the subject number, and each trace. I would like to display something like the 
following:

Subject # and Summary Statistics
-- Graph of a trace belonging to the subject
-- Summary statistics for the trace
-- Graph of the next trace belonging to the subject
-- Summary statistics for the trace
-- etc...

My intention is to dump this all into a .pdf file with Sweave and LaTeX.

Questions:
- Is split() the best function to use to create the proper groups? Or should I 
create a separate variable for each group using subset(), like:
temp.46 <- subset(set, subject == 46,
                  select = c(subject, timestamp, yvalue, subjtrace))

- How do I call functions on data within the groups created by split()? Like...
lapply(temp[1]$yvalue, sd)

- Since I'm trying to learn the proper way to approach this, what would be the 
best practice for iterating through the data and pushing it to a .pdf?

Thanks! 
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.