[R] Aggregating multiple columns

Adam D. I. Kramer Thu, 19 Mar 2009 14:43:58 -0700

Dear colleagues,

        Consider the following data frame:


x <- data.frame(y=rnorm(100),order=rep(1:10,10),subject=rep(1:10,each=10))

        ...my goal is to aggregate x to compute the linear effect of order
for each subject. Ideally, result would be a vector containing a single
number for each subject: the slope of the linear relationship between y and
order for that subject.
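In symbols, the number wanted for each subject is the ordinary least-squares slope that lm(y ~ order) estimates on that subject's rows. The notation is introduced here for illustration: o stands for order, s is a subject, and i runs over that subject's rows, with \bar{o}_s and \bar{y}_s the subject's means.

```latex
\hat{\beta}_s = \frac{\sum_{i \in s} (o_i - \bar{o}_s)(y_i - \bar{y}_s)}{\sum_{i \in s} (o_i - \bar{o}_s)^2}
```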

        I first tried this:

result <- aggregate(x[, c("y", "order")], list(subject=x$subject),
            function (z) { lm(y ~ order, data=z)$coefficients[2] }
          )

...because lm(y ~ order, data=x, subset=x$subject==1)$coefficients[2] gives
me the correct term for subject 1 (that is the number I am actually looking
for).
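As a quick check of that claim, here is a minimal sketch. It recreates x with a fixed seed (set.seed(1) is an addition so the numbers are reproducible) and confirms that the lm() slope for subject 1 matches the closed form cov(order, y) / var(order).

```r
set.seed(1)  # added for reproducibility; not in the original post
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# Slope for subject 1 from lm(), exactly as described above
b_lm <- lm(y ~ order, data = x, subset = x$subject == 1)$coefficients[2]

# The same slope from the closed form cov(order, y) / var(order)
s1 <- x[x$subject == 1, ]
b_cf <- cov(s1$order, s1$y) / var(s1$order)

all.equal(unname(b_lm), b_cf)  # TRUE
```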

        However, when given a data frame, aggregate() applies FUN to every
COLUMN of x _separately_ (one vector at a time), while lm needs both columns
*together.*
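A small sketch that makes this visible: the FUN below only reports whether it received a data frame. aggregate() calls it once per column and per subject, and every call gets a plain vector, which is why lm(y ~ order, data = z) cannot work inside it.

```r
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# FUN just reports whether its argument is a data frame
seen <- aggregate(x[, c("y", "order")], list(subject = x$subject),
                  function(z) is.data.frame(z))

# One row per subject; the y and order columns are all FALSE,
# i.e. each call received a single column as a vector
seen
```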

        ...I then turned to tapply, but that is useful only on "atomic
objects," and not data frames.
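One common workaround, sketched here as an illustration rather than a tested recommendation: give tapply() an atomic vector it can handle (the row numbers) and do the data-frame subsetting inside FUN. set.seed(1) is again an addition for reproducibility.

```r
set.seed(1)  # added for reproducibility
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# Split the row indices (atomic) by subject; FUN rebuilds each subject's rows
slopes <- tapply(seq_len(nrow(x)), x$subject, function(i) {
  coef(lm(y ~ order, data = x[i, ]))[["order"]]
})
slopes  # one slope per subject, labelled by subject
```

This still fits one lm() per subject, so it mainly tidies the code rather than removing the per-fit cost.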

        I have two solutions, which I find inelegant and slow:

1) result <- sapply(levels(factor(x$subject)),
               function(z) { lm(y ~ order, data=x, subset=subject==z)$coefficients[2] }
             )

...this gets the job done, but is very slow.
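Part of the cost may come from each lm() call scanning all of x for its subset and building a model frame from the formula. The sketch below assumes every subject has at least two distinct order values. It splits x once and calls the lower-level lm.fit() on a hand-built design matrix. It has not been benchmarked here, so any speedup is an expectation, not a measurement.

```r
set.seed(1)  # added for reproducibility
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# Split the data frame by subject once instead of subsetting all of x per call
pieces <- split(x, x$subject)

# lm.fit() takes a design matrix directly: column 1 = intercept, column 2 = order
slopes <- vapply(pieces, function(d) {
  lm.fit(cbind(1, d$order), d$y)$coefficients[[2]]
}, numeric(1))
slopes
```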

2) s <- factor(x$subject)
   result <- numeric(nlevels(s))
   for (z in 1:nlevels(s)) { result[z] <- lm(y ~ order, data=x,
       subset=s==levels(s)[z])$coefficients[2] }

...also does the job, but is also very slow.
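Another option, sketched under the assumption that the number of subjects is moderate, is a single model in which each subject gets its own intercept and its own order slope. Because each subject's columns are nonzero only on that subject's rows, least squares gives the same slope estimates as separate per-subject fits; only the residual variance is pooled. The design matrix has two columns per subject, so this may not scale to very many subjects.

```r
set.seed(1)  # added for reproducibility
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))
x$subject <- factor(x$subject)

# 0 + subject: one intercept per subject; subject:order: one slope per subject
fit <- lm(y ~ 0 + subject + subject:order, data = x)

# Keep only the slope terms, named "subject1:order", "subject2:order", ...
slopes <- coef(fit)[grepl(":order$", names(coef(fit)))]
slopes
```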

Is there a better solution? I miss the speed of tapply and aggregate; the
example has only 100 rows and 10 subjects, but the actual data has many more
of each.
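If speed is the main concern, one approach skips model fitting entirely and computes the closed-form slope for all subjects at once with vectorized grouping functions. This is a minimal sketch, assuming no missing values and at least two distinct order values per subject (otherwise the denominator is zero). ave() centers each variable within subject, and rowsum() adds up the products per subject.

```r
set.seed(1)  # added for reproducibility
x <- data.frame(y = rnorm(100), order = rep(1:10, 10), subject = rep(1:10, each = 10))

# Center order and y within each subject; ave() returns the group mean for every row
oc <- x$order - ave(x$order, x$subject)
yc <- x$y - ave(x$y, x$subject)

# Slope per subject = sum(oc * yc) / sum(oc^2); rowsum() sums by group
num <- rowsum(oc * yc, x$subject)
den <- rowsum(oc^2, x$subject)
slopes <- num[, 1] / den[, 1]  # named by subject
slopes
```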

Cordially,
Adam D. I. Kramer
Ph.D. Candidate, Social and Personality Psychology
University of Oregon

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
