That's excellent. I somehow missed that variable names were case sensitive. I was using all lowercase names and it was throwing things off. The error I would get using fun instead of FUN was along the lines of:
function (x) x - mean(x) <environment: 0x84be554> Error in unique.default(x) : unique() applies only to vectors Execution halted Which was a bit misleading since it looked like the function was being used. This solution was exactly the sort of thing I was hoping was possible though. Thank you. I talked to my prof about the fact I like R and will never own a copy of SAS, and I get to do my work in R this semester, so y'all may well hear from me again. =) Will On Jan 31, 2008 12:52 PM, Greg Snow <[EMAIL PROTECTED]> wrote: > Here are a couple of ways: > > tmp.df <- data.frame( group=factor(sample(LETTERS[1:3], 25, > replace=TRUE)), > val = rnorm(25, 50, 5) ) > > tmp.df <- transform(tmp.df, ssq = ave(val, group, > FUN=function(x) x-mean(x))^2 ) > # another way > tmp.means <- tapply(tmp.df$val, tmp.df$group, FUN=mean) > tmp.df$ssq2 <- as.vector( ( tmp.df$val - tmp.means[tmp.df$group] ) ^2 ) > # check that they both worked > with( tmp.df, all.equal(ssq,ssq2) ) > > Hope this helps, > > ------------------------------ > *From:* [EMAIL PROTECTED] on behalf of Will Holcomb > *Sent:* Thu 1/31/2008 8:10 AM > *To:* Simon Blomberg > *Cc:* r-help > *Subject:* Re: [R] difficulties computing a simple anova > > Excellent. I was able to do the analysis with no problem. I just had to > do > some twiddling with the inputs to get things from the CSV to the format > you > described. > > When the data is in that format, I would like to be able to say, "create a > column which is each square of the difference of each response with the > mean > of all the responses for that drug." Is there a simple way to do that? > What > I am doing currently is: > > squares = c() > for(drug in unique(drugs)) { > responses = subset(spiderdata, Drug == drug, select = Response) > squares = append(squares, (responses - mean(responses)) ^ 2) > } > spiderdata <- cbind(spiderdata, SquaresWithin = squares) > > This works because the elements are sorted by drug and the drugs vector > has > the drug names in the same order. If my data were mixed up however, this > method would misalign results. Is there any way to say something like: > > spiderdata$SquaresWithin = (spiderdata$Response - mean(subset(spiderdata, > Drug == *spiderdata$Drug of current row*, select = Response))) ^ 2 > > Just using Drug == spiderdata$Drug doesn't seem to work and I think it is > because Drug is a factor? > > I suppose I could always sort on the element I was generating a new column > for and cause them to align, but I was wondering if the syntax would allow > me to refer to elements of the row a computation. I was trying it with > tapply which it sounds like should work, but this is not the method: > > sq_within = function(row) { (row$Response - mean(subset(spiderdata, row = > row$Drug))) ^ 2 } > tapply(spiderdata, spiderdata$Drug, fun = sq_within) > > It doesn't like the first and second arguments to be different lengths. > The > first argument is the data to be iterated over though and I think my > understanding of how the function works is somehow fundamentally off. I've > had a hard time finding clear examples where tapply isn't used with a > function that generates a different number of outputs than there are > inputs. > > Thank you again for the help. I'm liking R, just still getting the hang of > some of the conceptual stuff. > > Will > > P.S. You don't have to be vague for the sake of my homework. The > assignment > was to do this in SAS which I've already completed. I don't plan on owning > SAS though and I would like to be able to do statistics in the future, so > learning R is just for my benefit. > > On Jan 30, 2008 8:11 PM, Simon Blomberg <[EMAIL PROTECTED]> wrote: > > > Your data setup is wrong. You have one factor (Drug) with 3 levels > > (Zoloft, Naltrexone, Valium). So your data should be: > > > > spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"), > > each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12, > > 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9)) > > > > You should be able to take it from there. (Since this is a homework > > problem, I'm being intentionally vague.) > > > > cheers, > > > > Simon. > > > > On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote: > > > My grasp of R and statistics are both seriously lacking, so if this > > question > > > is completely naive, I apologize in advance. I've hunted for a couple > > hours > > > on the internet and none of the methods I've found have produced the > > result > > > I'm looking for. > > > > > > I'm currently a student in a Statistics class and we are learning the > > ANOVA. > > > We had to do one by hand and then reproduce our work in SAS. I really > > like > > > the idea of understanding R however and would like to reproduce the > > solution > > > in R if possible. > > > > > > Where I'm at now is this little program: > > > http://odin.himinbi.org/classes/psy304b/spider_analysis.r > > > > > > The program calculates an anova manually (correctly, I'm pretty sure, > it > > > agrees with the same numbers in excel). The answer that it comes up > with > > > doesn't agree with any of the numbers I can get either the aov or > anova > > > functions to produce. > > > > > > Can anyone help me with simply the method to compute a one-way anova? > > Well, > > > specifically how to replicate the sort of anova people learn in an > intro > > to > > > statistics class. All of the degrees of freedom are off from what I > > expect > > > them to be (they're all 1). > > > > > > (The original problem, should it help in understanding my question, is > > at: > > > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it > > will > > > likely look pretty funky if your browser doesn't support mathml > (firefox > > > does).) > > > > > > Will > > > > > > The program is as follows: > > > > > > library(foreign) > > > # spiderdata <- read.csv("spider_data.csv") > > > > > > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, > 6), > > > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17), > > > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9)) > > > > > > summary(spiderdata) > > > > > > # Compute a one-way ANOVA by hand > > > > > > J = length(spiderdata) > > > > > > sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2) > > > for(j in 2:J) { > > > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2) > > > } > > > sqdata > > > > > > N = 0 > > > for(j in 1:J) { > > > N = N + length(sqdata[[j]]) > > > } > > > > > > SSW = sum(sqdata) > > > MSW = SSW / (N - J) > > > SSB = 0 > > > for(j in 1:(length(spiderdata))) { > > > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] - > > > (sum(spiderdata) / N)) ^ 2) > > > } > > > MSB = SSB / (J - 1) > > > > > > F = MSB / MSW > > > f_prob = pf(F, J - 1, N - J) > > > reject_point = qf(.95, J - 1, N - J) > > > > > > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", > F, > > ", > > > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "") > > > > > > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata)) > > > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata) > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help@r-project.org mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > -- > > Simon Blomberg, BSc (Hons), PhD, MAppStat. > > Lecturer and Consultant Statistician > > Faculty of Biological and Chemical Sciences > > The University of Queensland > > St. Lucia Queensland 4072 > > Australia > > Room 320 Goddard Building (8) > > T: +61 7 3365 2506 > > http://www.uq.edu.au/~uqsblomb <http://www.uq.edu.au/%7Euqsblomb> < > http://www.uq.edu.au/%7Euqsblomb> > > email: S.Blomberg1_at_uq.edu.au > > > > Policies: > > 1. I will NOT analyse your data for you. > > 2. Your deadline is your problem. > > > > The combination of some data and an aching desire for > > an answer does not ensure that a reasonable answer can > > be extracted from a given body of data. - John Tukey. > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.