This was AFAICS a bug introduced in 2.7.1 by
o plot(<glm>, which=5) uses more correct Cook's distance contours;
(fix to fix to PR#9316).
Thanks to Greg for the diagnosis.
Will be fixed in R-patched later today.
On Wed, 12 Nov 2008, Greg Snow wrote:
Just a clarification on one of my statements below. I realize on rereading my
statement on R-core paying attention that it could be interpreted as a possible
criticism. That is not how it was intended, rather that I have seen cases in
the past where a discussion confirms that something is a bug and a member of
R-core already has the fix started or in place before a formal bug report could
be sent and so we should not send a bug report on this unless it was clear that
one was needed (which it is not, I received an e-mail off line that Prof.
Ripley has started looking into it).
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
project.org] On Behalf Of Greg Snow
Sent: Wednesday, November 12, 2008 1:07 PM
To: Effie Greathouse; r-help@r-project.org
Subject: Re: [R] different results with plot.lm vs. plot.lm(which=c(2))
The same thing is affecting plot 2 as well. Basically in the code
there is line early on:
r <- residuals(x)
which gets the residuals and by default for glm models those are the
deviance residuals.
A bit latter is the code (after some optional modifications to r):
if (any(show[2:3])) {
ylab23 <- if (isGlm)
"Std. deviance resid."
else "Standardized residuals"
r.w <- if (is.null(w))
r
else sqrt(w) * r
}
Which results in r.w being the residuals (or resids times the square
root of w) if either plot 2 or 3 is requested.
The next batch of code is:
if (show[5]) {
ylab5 <- if (isGlm)
"Std. Pearson resid."
else "Standardized residuals"
r.w <- residuals(x, "pearson")
if (!is.null(w))
r.w <- r.w[wind]
}
Which changes r.w to the pearson residuls if plot 5 is requested, it
also sets the label to use for plot 5, but does not change the label
(ylab23) to be used for plots 2 and 3 and so they still are labeled as
deviance residuals.
The r.w variable is used latter in the code for plots 2, 3, and 5 (and
maybe others). Unless there is something else in between the above
code and where r.w is used that fixes this (I did not see anything, but
did not do more than skim the code) then there is a clear bug in that
the residuals being used is inconsistent and mis-labeled in some cases.
If anyone on R-core is paying attention to this thread, then the
problem may already be in the process of being fixed (or it may already
be fixed, check r-patched, the above is based on R 2.8.0 (2008-10-20)),
I've had a couple of cases where the fix was in place before I could
submit a bug report. If we don't see any indication of it being fixed
in the next couple of days, then one of us should submit a bug report.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
project.org] On Behalf Of Effie Greathouse
Sent: Wednesday, November 12, 2008 12:15 PM
To: r-help@r-project.org
Subject: Re: [R] different results with plot.lm vs.
plot.lm(which=c(2))
There's also a big difference in plot 2 (Normal Q-Q) in my real data,
but I
don't see a real difference in plot 2 for the example I sent, except
that
some of the outlier labels are different. Would the residuals being
plotted
likely be the cause of the difference in plot 2 as well? Dr. Snow,
would
you want to be able to see the plot 2 graphs for my real data? I
don't
know
how to save plot 2 when I'm clicking through the plot(model) graphs,
so
I
can't just send it to you. Thanks for checking this out Dr. Snow!
On Wed, Nov 12, 2008 at 11:05 AM, Greg Snow <[EMAIL PROTECTED]>
wrote:
From a quick look at the code it looks like when you ask for plot
number 5
(included in default when 'which' is not specified), then the
deviance
residuals are replaced by the pearson residuals to be used in later
computations. So the difference that you are seeing is that one of
the
plots is based on deviance residuals and the other on pearson
residuls.
It seems that there is a bug here in that, at a minimum, the label
should
be changed to indicate which residuals were actually used, or the
code
changed to continue to use the deviance residuals for plot 3 even
when plot
5 is requested.
Does anyone else see something that I missed in how the residuals
are
replaced and used?
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
project.org] On Behalf Of Effie Greathouse
Sent: Wednesday, November 12, 2008 11:16 AM
To: r-help@r-project.org
Subject: Re: [R] different results with plot.lm vs.
plot.lm(which=c(2))
Hi Dr. Ripley--Sorry for the repost everybody. The original
message I
sent
never showed up in my inbox, so I thought it didn't get sent to
the
list.
I'm running R 2.8.0, installed from a pre-compiled version, on
Windows
XP.
When I type Sys.getlocale() at the R prompt, it returns:
"LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Here's an example:
bob <- seq(1:100)
bob2 <- rgamma(100, 2, 1)*10+bob
model<-glm(bob2 ~ bob, family=Gamma)
Then enter:
plot(model, which=c(3))
to get the Scale-Location graph
Then compare it to the Scale-Location graph when you run the
following
command and page through to the 3rd graph:
plot(model)
When I do this, I get different results -- some of the high
values
are
different on each plot. On my real data the difference is more
severe
than
in this randomly generated example. I'd be happy to supply my
real
data and
R code if this smaller example isn't sufficient. Thank you for
any
help!!
On Wed, Nov 12, 2008 at 9:43 AM, Prof Brian Ripley
<[EMAIL PROTECTED]>wrote:
Instead of re-posting the same message, please study the
posting
guide and
supply the information asked for, including a reproducible
example.
There is
no way we can help you unless you help us to help you.
On Wed, 12 Nov 2008, Effie Greathouse wrote:
I am running GLM models using the gamma family. For example:
model <-glm(y ~ x, family=Gamma(link="identity"))
I am getting different results for the normal Q-Q plot and the
Scale-Location plot if I run the diagnostic plots without
specifying
the
plot vs. if I specify the plot ... e.g., "plot(model)" gives
me
a
different
Normal Q-Q graph than "plot(model, which=c(2))". The former
gives
data
points distributed in a quadratic pattern, while the latter
gives
data
points more or less along the 1:1 line. Shouldn't these two
commands be
giving me the same exact graphs? I have read the
documentation
on
plot.lm
and searched the help archives, but I am still learning GLM's
and
I'm not
very familiar with understanding diagnostic plots for GLM's,
so
any
help
would be much appreciated!
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html<http://www.r-
project.org/posting-guide.html>
<http://www.r-
> project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible
code.
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics,
http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-
project.org/posting-
<http://www.r-project.org/posting->
guide.html
and provide commented, minimal, self-contained, reproducible
code.
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.