Keith, we seem to agree, more or less.
I take a simple approach to model fitting: absent any
compelling reason for a particular form (here: cubic),
I consider that form to be speculative, sometimes
reasonably well supported by the data, sometimes not.
For the OP's data, the evidence for a cubic model is
all but nonexistent. Based on those data, I would never
recommend a cubic model to a client. As to how one
chooses to assess the (in)adequacy of the model, I'm
well aware of the various more formal techniques, but
they're often not needed.

(The use of poly() in this example saves typing.)

 -Peter Ehlers


kMan wrote:
Dear Peter,

Ah, I see your point, Professor. The point at x=23.5 is carrying the model.
Allow me to clarify. I was making a similar point.

I was suggesting that the cubic term could be left in the model, as you did,
but that rather than dropping the data point, another model be fit with an
additional parameter added to control for the suspected outlier -
essentially a point discontinuity.

Note that the call to poly() is not necessary for the graphical
representation, so I'll continue the example without it.
# cubic fit on the full data, written out without poly()
fm3 <- lm(y ~ x + I(x^2) + I(x^3), data = dat)
fm3.c <- coef(fm3)

# drop-the-data-point alternative: subsetting with logical indexes
index <- dat$x != 23.5
x2 <- dat$x[index]
y2 <- dat$y[index]
fm4 <- lm(y2 ~ x2 + I(x2^2) + I(x2^3))
fm4.c <- coef(fm4)

# control for the potential outlier in the model, without throwing out
# the data: ctrl is nonzero only at the suspect point
ctrl <- rep(0, nrow(dat))
ctrl[dat$x == 23.5] <- resid(fm3)[dat$x == 23.5]
fm5 <- lm(y ~ x + I(x^2) + I(x^3) + ctrl, data = dat)
fm5.c <- coef(fm5)

# using curve() avoids the predict & lines calls, but requires
# knowledge of the model; plot the data first, then overlay the fits
plot(y ~ x, data = dat)
curve(fm3.c[1] + fm3.c[2]*x + fm3.c[3]*x^2 + fm3.c[4]*x^3,
      col = "green", add = TRUE)   # full-data fit
curve(fm4.c[1] + fm4.c[2]*x + fm4.c[3]*x^2 + fm4.c[4]*x^3,
      col = "blue", add = TRUE)    # fit with the suspect point dropped
curve(fm5.c[1] + fm5.c[2]*x + fm5.c[3]*x^2 + fm5.c[4]*x^3,
      col = "purple", add = TRUE)  # fm5 polynomial part, ctrl term omitted
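
Incidentally, the dropped-point fit and the controlled fit can be
compared directly (a quick check, assuming the fm4.c and fm5.c objects
defined above):

# the two approaches give the same polynomial coefficients (up to rounding)
cbind(dropped = fm4.c, controlled = fm5.c[1:4])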

anova(fm5)
This tells us how much difference one point made, sacrificing a df just to
ensure we're not throwing out information haphazardly; use F > 1 or whatever
cutoff you prefer. If the point turns out to be important, then at least its
existence and effect were documented.
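
To make the test for the single added parameter explicit, a nested-model
comparison works too (a sketch, assuming fm3 and fm5 from above):

# does adding ctrl (one parameter, one point) significantly improve the fit?
anova(fm3, fm5)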
There is some irony, I suppose, in the fact that the graphical
representations of dropping the point vs. adding a control variable are
equivalent for this example (in fact, a parameter that is nonzero at only
one observation reproduces the case-deletion fit exactly, so the agreement
is no accident). I have not decided how I feel about that yet, but I do
have a splitting headache.
Sincerely,
KeithC.

-----Original Message-----
From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
Sent: Sunday, February 14, 2010 10:04 PM
To: kMan
Cc: r-help@r-project.org
Subject: Re: [R] Plot different regression models on one graph

kMan wrote:
Peter wrote:
You like to live dangerously.
Clue me in, Professor.

Sincerely,
KeithC.


Okay, Keith, here goes:

dat <-
structure(list(x = c(62.5, 68.5, 0, 52, 0, 52, 0, 52, 23.5, 86, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), y = c(0.054, 0.055, 0.017, 0.021, 0.02, 0.028, 0.032,
0.073, 0.076, 0.087, 0.042, 0.042, 0.041, 0.045, 0.021, 0.018, 0.017, 0.018,
0.028, 0.022)), .Names = c("x", "y"), row.names = c(NA, -20L), class =
"data.frame")

fm1 <- lm(y ~ poly(x,3), data = dat)
fm2 <- lm(y ~ poly(x,3), data = dat, subset = -9)

xx <- 0:86
yhat1 <- predict(fm1, list(x = xx))
yhat2 <- predict(fm2, list(x = xx))

plot(y ~ x, data = dat)
lines(xx, yhat1, col="blue", lwd=2)
lines(xx, yhat2, col="red", lwd=2)


That's how much difference *one* point makes in a cubic fit!
I'm not much of a gambler, so I wouldn't base any important decisions on the
results of the fit.

Cheers,

  -Peter Ehlers

-----Original Message-----
From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
Sent: Sunday, February 14, 2010 6:49 PM
To: kMan
Cc: 'David Winsemius'; 'Rhonda Reidy'; r-help@r-project.org
Subject: Re: [R] Plot different regression models on one graph

kMan wrote:
I would use all of the data. If you want to "drop" one, control for it in the model & sacrifice a degree of freedom.
You like to live dangerously.

  -Peter Ehlers

Why the call to poly() by the way?

KeithC.

-----Original Message-----
From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
Sent: Saturday, February 13, 2010 1:35 PM
To: David Winsemius
Cc: Rhonda Reidy; r-help@r-project.org
Subject: Re: [R] Plot different regression models on one graph

Rhonda:

As David points out, a cubic fit is rather speculative. I think that one
needs to distinguish two situations: 1) theoretical justification for a
cubic model is available, or 2) you're dredging the data for the "best"
fit. Your case is the second. That puts you in the realm of EDA
(exploratory data analysis).

You're free to fit any model you wish, but you should assess the value of
the model. One convenient way is to use the influence.measures() function,
which will show that points 9 and 10 in your data have a strong influence
on your cubic fit (as, of course, your eyes would tell you). A good thing
to do at this point is to fit 3 more cubic models:
1) omitting point 9, 2) omitting point 10, 3) omitting both.

Try it and plot the resulting fits. You may be surprised.
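
A minimal sketch of that exercise, assuming the dat data frame defined
earlier in the thread:

im <- influence.measures(lm(y ~ poly(x, 3), data = dat))
summary(im)   # lists the observations flagged as influential

fits <- list(
  lm(y ~ poly(x, 3), data = dat),                     # all points
  lm(y ~ poly(x, 3), data = dat, subset = -9),        # omit point 9
  lm(y ~ poly(x, 3), data = dat, subset = -10),       # omit point 10
  lm(y ~ poly(x, 3), data = dat, subset = -c(9, 10))  # omit both
)
xx <- 0:86
plot(y ~ x, data = dat)
for (i in seq_along(fits))
  lines(xx, predict(fits[[i]], list(x = xx)), col = i + 1)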

Conclusion: Without more data, you can conclude nothing about a non-straightline fit.

(And this hasn't even addressed the relative abundance of x=0 data.)

  -Peter Ehlers

David Winsemius wrote:
On Feb 13, 2010, at 1:35 PM, Rhonda Reidy wrote:

The following variables have the following significant relationships (x is the explanatory variable): linear, cubic,
exponential, logistic.
The linear relationship plots without any trouble.

Cubic is the 'best' model, but it is not plotting as a smooth curve using the following code:

cubic.lm<- lm(y~poly(x,3))
Try:

lines(0:80, predict(cubic.lm, data.frame(x=0:80)),lwd=2)

But I really must say the superiority of that relationship over a linear
one is far from convincing to my eyes. Seems to be caused by one data
point. I hope the quotes around "best" mean that you have the same qualms.
lines(x, predict(cubic.lm), lwd=2)  # the OP's original call; x is unsorted, hence the jagged curve

How do I plot the data and the estimated curves for all of these regression models in the same plot?

x <- c(62.5,68.5,0,52,0,52,0,52,23.5,86,0,0,0,0,0,0,0,0,0,0)

y <- c(0.054, 0.055, 0.017, 0.021, 0.020, 0.028, 0.032, 0.073, 0.076,
       0.087, 0.042, 0.042, 0.041, 0.045, 0.021, 0.018, 0.017, 0.018,
       0.028, 0.022)
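
A minimal sketch for the linear and cubic overlays, assuming base graphics
(the exponential and logistic fits would be added the same way, each via
its own predict() call on a fine grid):

fit.lin <- lm(y ~ x)
fit.cub <- lm(y ~ poly(x, 3))
xx <- seq(0, 86, length.out = 200)
plot(x, y)
lines(xx, predict(fit.lin, data.frame(x = xx)), col = "blue", lwd = 2)
lines(xx, predict(fit.cub, data.frame(x = xx)), col = "red", lwd = 2)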


Thanks in advance.

Rhonda Reidy

--
Peter Ehlers
University of Calgary

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
