Re: [R] difficulties computing a simple anova

Will Holcomb Thu, 31 Jan 2008 07:22:07 -0800

Excellent. I was able to do the analysis with no problem. I just had to do
some twiddling with the inputs to get things from the CSV to the format you
described.


When the data is in that format, I would like to be able to say, "create a
column which is each square of the difference of each response with the mean
of all the responses for that drug." Is there a simple way to do that? What
I am doing currently is:

squares = c()
for(drug in unique(drugs)) {
   responses = subset(spiderdata, Drug == drug, select = Response)
   squares = append(squares, (responses - mean(responses)) ^ 2)
}
spiderdata <- cbind(spiderdata, SquaresWithin = squares)

This works because the elements are sorted by drug and the drugs vector has
the drug names in the same order. If my data were mixed up however, this
method would misalign results. Is there any way to say something like:

spiderdata$SquaresWithin = (spiderdata$Response - mean(subset(spiderdata,
Drug == *spiderdata$Drug of current row*, select = Response))) ^ 2

Just using Drug == spiderdata$Drug doesn't seem to work and I think it is
because Drug is a factor?

I suppose I could always sort on the element I was generating a new column
for and cause them to align, but I was wondering if the syntax would allow
me to refer to elements of the row a computation. I was trying it with
tapply which it sounds like should work, but this is not the method:

sq_within = function(row) { (row$Response - mean(subset(spiderdata, row =
row$Drug))) ^ 2 }
tapply(spiderdata, spiderdata$Drug, fun = sq_within)

It doesn't like the first and second arguments to be different lengths. The
first argument is the data to be iterated over though and I think my
understanding of how the function works is somehow fundamentally off. I've
had a hard time finding clear examples where tapply isn't used with a
function that generates a different number of outputs than there are inputs.

Thank you again for the help. I'm liking R, just still getting the hang of
some of the conceptual stuff.

Will

P.S. You don't have to be vague for the sake of my homework. The assignment
was to do this in SAS which I've already completed. I don't plan on owning
SAS though and I would like to be able to do statistics in the future, so
learning R is just for my benefit.

On Jan 30, 2008 8:11 PM, Simon Blomberg <[EMAIL PROTECTED]> wrote:

> Your data setup is wrong. You have one factor (Drug) with 3 levels
> (Zoloft, Naltrexone, Valium). So your data should be:
>
> spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"),
> each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12,
> 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
>
> You should be able to take it from there. (Since this is a homework
> problem, I'm being intentionally vague.)
>
> cheers,
>
> Simon.
>
> On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote:
> > My grasp of R and statistics are both seriously lacking, so if this
> question
> > is completely naive, I apologize in advance. I've hunted for a couple
> hours
> > on the internet and none of the methods I've found have produced the
> result
> > I'm looking for.
> >
> > I'm currently a student in a Statistics class and we are learning the
> ANOVA.
> > We had to do one by hand and then reproduce our work in SAS. I really
> like
> > the idea of understanding R however and would like to reproduce the
> solution
> > in R if possible.
> >
> > Where I'm at now is this little program:
> > http://odin.himinbi.org/classes/psy304b/spider_analysis.r
> >
> > The program calculates an anova manually (correctly, I'm pretty sure, it
> > agrees with the same numbers in excel). The answer that it comes up with
> > doesn't agree with any of the numbers I can get either the aov or anova
> > functions to produce.
> >
> > Can anyone help me with simply the method to compute a one-way anova?
> Well,
> > specifically how to replicate the sort of anova people learn in an intro
> to
> > statistics class. All of the degrees of freedom are off from what I
> expect
> > them to be (they're all 1).
> >
> > (The original problem, should it help in understanding my question, is
> at:
> > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it
> will
> > likely look pretty funky if your browser doesn't support mathml (firefox
> > does).)
> >
> > Will
> >
> > The program is as follows:
> >
> > library(foreign)
> > # spiderdata <- read.csv("spider_data.csv")
> >
> > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6),
> > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17),
> > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
> >
> > summary(spiderdata)
> >
> > # Compute a one-way ANOVA by hand
> >
> > J = length(spiderdata)
> >
> > sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2)
> > for(j in 2:J) {
> > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2)
> > }
> > sqdata
> >
> > N = 0
> > for(j in 1:J) {
> > N = N + length(sqdata[[j]])
> > }
> >
> > SSW = sum(sqdata)
> > MSW = SSW / (N - J)
> > SSB = 0
> > for(j in 1:(length(spiderdata))) {
> > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] -
> > (sum(spiderdata) / N)) ^ 2)
> > }
> > MSB = SSB / (J - 1)
> >
> > F = MSB / MSW
> > f_prob = pf(F, J - 1, N - J)
> > reject_point = qf(.95, J - 1, N - J)
> >
> > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F,
> ",
> > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "")
> >
> > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata))
> > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> --
> Simon Blomberg, BSc (Hons), PhD, MAppStat.
> Lecturer and Consultant Statistician
> Faculty of Biological and Chemical Sciences
> The University of Queensland
> St. Lucia Queensland 4072
> Australia
> Room 320 Goddard Building (8)
> T: +61 7 3365 2506
> http://www.uq.edu.au/~uqsblomb <http://www.uq.edu.au/%7Euqsblomb>
> email: S.Blomberg1_at_uq.edu.au
>
> Policies:
> 1.  I will NOT analyse your data for you.
> 2.  Your deadline is your problem.
>
> The combination of some data and an aching desire for
> an answer does not ensure that a reasonable answer can
> be extracted from a given body of data. - John Tukey.
>
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] difficulties computing a simple anova

Reply via email to