Dear colleagues,

Consider the following data frame:

    x <- data.frame(y=rnorm(100), order=rep(1:10,10), subject=rep(1:10,each=10))

My goal is to aggregate x to compute a linear effect of order for each subject. Ideally, result would be a vector containing a single number for each subject, representing the linear relationship between y and order.

I first tried this:

    result <- aggregate(x[,1:2], list(subject=x$subject),
                        function(z) { lm(y ~ order, data=z)$coefficients[2] })

because

    lm(y ~ order, data=x, subset=x$subject==1)$coefficients[2]

gives me the correct term for subject 1 (i.e., that is the number I am actually looking for). However, when used on data frames, aggregate() aggregates every COLUMN in x _separately_ using FUN, while lm needs both columns *together*.

I then turned to tapply, but that is useful only on "atomic objects," and not on data frames.

I have two solutions, which I find inelegant and slow:

1)

    result <- sapply(levels(factor(x$subject)), function(z) {
      lm(y ~ order, data=x, subset=subject==z)$coefficients[2]
    })

This gets the job done, but is very slow.

2)

    s <- factor(x$subject)
    result <- c()
    for (z in 1:nlevels(s)) {
      result[z] <- lm(y ~ order, data=x, subset=s==levels(s)[z])$coefficients[2]
    }
    result <- unlist(result)

This also does the job, but is also very slow.

Is there a better solution? I miss the speed of tapply and aggregate; the example has only 100 rows and 10 subjects, but the actual data has many more of each.

Cordially,
Adam D. I. Kramer
Ph.D. Candidate, Social and Personality Psychology
University of Oregon
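For reference, here is a minimal sketch of one possible split-and-apply approach to the question above. It is not a tested recommendation. It assumes the example data frame x with columns y, order, and subject; the set.seed() call and the names slopes_lm and slopes_cf are illustrative additions. The first version fits lm() on each per-subject piece produced by split(). The second uses the closed-form simple-regression slope, cov(order, y) / var(order), which skips the model-fitting overhead and so may be faster on large data, though no timings were taken here.

```r
# Assumption: the seed is added only so the sketch is reproducible.
set.seed(1)
x <- data.frame(y = rnorm(100),
                order = rep(1:10, 10),
                subject = rep(1:10, each = 10))

# split() yields one data frame per subject, so lm() sees both columns together.
pieces <- split(x, x$subject)

# Version 1: fit lm() per subject and keep the coefficient on 'order'.
slopes_lm <- sapply(pieces, function(d) coef(lm(y ~ order, data = d))[["order"]])

# Version 2: closed-form slope for a single predictor, without calling lm().
slopes_cf <- sapply(pieces, function(d) cov(d$order, d$y) / var(d$order))

# Both are named numeric vectors with one slope per subject.
print(slopes_lm)
print(all.equal(slopes_lm, slopes_cf))
```

Both versions return a vector named by subject, so slopes_lm["1"] corresponds to the coefficient from lm(y ~ order, data=x, subset=subject==1).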