To access an element of a list (the object returned by split()), you need to use "[[": temp[1] returns a one-element list, while temp[[1]] returns the data frame stored in it.
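Here is a small illustration of the difference. It uses a made-up five-row data frame in place of the real 'set' (the values and the reduced set of columns are hypothetical). Single brackets keep the list wrapper and double brackets extract the data frame. The same rule explains why temp[1]$yvalue came back empty.

```r
# Hypothetical stand-in for the poster's 'set' data frame (values made up)
set <- data.frame(
  subject   = c(1, 1, 1, 2, 2),
  yvalue    = c(12, 15, 17, 20, 24),
  subjtrace = c("1-1", "1-1", "1-2", "2-1", "2-1"),
  stringsAsFactors = FALSE
)

temp <- split(set, set$subject)   # a list of data frames, one per subject

class(temp[1])     # "list": single brackets return a sub-list
class(temp[[1]])   # "data.frame": double brackets return the element itself

# temp[1] is a list whose only element is named "1", so $yvalue matches
# nothing and returns NULL; lapply(NULL, mean) is then an empty list()
is.null(temp[1]$yvalue)

# Working equivalents of what the question was after
summary(temp[[1]])                       # summary of subject 1's data frame
mean(temp[[1]]$yvalue)                   # one statistic for one subject
summ <- lapply(temp, summary)            # summaries for every subject
sapply(temp, function(d) sd(d$yvalue))   # one statistic per subject
```

The same rule answers the sd() question in the post: use sd(temp[[1]]$yvalue) rather than lapply(temp[1]$yvalue, sd).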
Therefore, summary(temp[[1]]) is what you meant to use (or even summ = lapply(temp, summary), which will give you the summaries for every subject).

About producing PDFs, I'd recommend you take a look at Sweave ( http://www.statistik.lmu.de/~leisch/Sweave/ ).

b

On Mon, Mar 22, 2010 at 1:27 PM, Clay Heaton <cchea...@gmail.com> wrote:
> Hi, very new to R here...
>
> I have a data frame called 'set' with 100k+ rows in it that looks like this:
>
>   subject           timestamp yvalue traceabs subjtrace
> 1       1 1992-07-12 06:05:00     12        1       1-1
> 2       1 1992-07-12 06:10:00     15        1       1-1
> 3       1 1992-07-12 06:15:00     17        1       1-1
> 4       1 1992-07-12 06:20:00     20        1       1-1
> 5       1 1992-07-12 06:25:00     24        1       1-1
> ....
>
> There are 89 subjects, each of which have a different number of traces --
> it's time series data. There are, in total, around 180 traces. The
> "subjtrace" variable is just a concatenation of the subject number, a hyphen,
> and the relative trace number. For instance, the first trace for subject 46
> is "46-1" but the traceabs value for the same trace is 71.
>
> I need to perform simple statistics on each subject and on each trace. I also
> need to graph each trace.
>
> It seems like the easy approach to identifying the variables would be to use
> the split() function to create groups:
>
>> temp <- split(set, set$subject)
>
> When I then try, for example:
>
>> summary(temp[1])
>
> all I get as a result is:
>   Length Class      Mode
> 1 5      data.frame list
>
> So I went with:
>
>> lapply(temp[1], summary)
>
> That works, but I'm unable to do something like:
>
>> lapply(temp[1]$yvalue, mean)
>
> because the result returned is:
> list()
>
> Ultimately, I'm trying to run the exact same code on each group, as defined
> by the subject number, and each trace.
> I would like to display something like the following:
>
> Subject # and Summary Statistics
> -- Graph of a trace belonging to the subject
> -- Summary statistics for the trace
> -- Graph of the next trace belonging to the subject
> -- Summary statistics for the trace
> -- etc...
>
> My intention is to dump this all into a .pdf file with Sweave and LaTeX.
>
> Questions:
> - Is split() the best function to use to create the proper groups? or should
> I look to create a separate variable for each group using subset, like:
> temp.46 <- subset(set, subject==46,select=c(subject, timestamp, yvalue,
> subjtrace))
>
> - How do I call functions on data within the groups created by split()?
> Like...
> lapply(temp[1]$yvalue, sd)
>
> - In an effort to try to learn the proper way to approach this, what would be
> the best practice for iterating through the data and pushing it to .pdf?
>
> Thanks!
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
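The question also asks how to loop over subjects and then over traces. One possible sketch uses nested split() calls. Note that this swaps in base R's pdf() graphics device instead of the Sweave route suggested above. It writes only the trace plots to the PDF and prints the summaries to the console, whereas Sweave would interleave text and figures in one LaTeX document. The toy data and the output path are assumptions made for illustration.

```r
# Hypothetical toy data shaped like the poster's 'set' (values are made up)
set <- data.frame(
  subject   = rep(c(1, 2), times = c(6, 4)),
  timestamp = as.POSIXct("1992-07-12 06:05:00", tz = "UTC") +
              300 * c(0:2, 0:2, 0:3),          # 5-minute steps within each trace
  yvalue    = c(12, 15, 17, 20, 18, 16, 30, 28, 25, 27),
  subjtrace = c(rep("1-1", 3), rep("1-2", 3), rep("2-1", 4)),
  stringsAsFactors = FALSE
)

# Quick per-trace means without an explicit loop
trace_means <- tapply(set$yvalue, set$subjtrace, mean)
print(trace_means)

out_file <- file.path(tempdir(), "traces.pdf")  # hypothetical output path
pdf(out_file)                                   # one page per plot() call

trace_stats <- list()
for (subj in split(set, set$subject)) {
  # Subject-level summary statistics
  cat("Subject", subj$subject[1], "\n")
  print(summary(subj$yvalue))

  # Split this subject's rows again, this time by trace
  for (tr in split(subj, subj$subjtrace, drop = TRUE)) {
    id <- as.character(tr$subjtrace[1])
    plot(tr$timestamp, tr$yvalue, type = "l",
         main = paste("Trace", id), xlab = "time", ylab = "yvalue")
    trace_stats[[id]] <- summary(tr$yvalue)
    print(trace_stats[[id]])
  }
}

invisible(dev.off())  # close the device so the file is complete
```

The drop = TRUE argument matters if subjtrace is stored as a factor. Without it, split() would also return empty data frames for the traces of every other subject.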