Re: [R] different results with plot.lm vs. plot.lm(which=c(2))

Prof Brian Ripley Thu, 13 Nov 2008 05:45:20 -0800

This was AFAICS a bug introduced in 2.7.1 by

    o   plot(<glm>, which=5) uses more correct Cook's distance contours;
        (fix to fix to PR#9316).


Thanks to Greg for the diagnosis.
Will be fixed in R-patched later today.


On Wed, 12 Nov 2008, Greg Snow wrote:

Just a clarification on one of my statements below. On rereading my statement
about R-core paying attention, I realize it could be interpreted as criticism.
That is not how it was intended. Rather, I have seen cases in the past where a
discussion confirms that something is a bug and a member of R-core already has
the fix started or in place before a formal bug report could be sent, so we
should not send a bug report on this unless it is clear that one is needed
(which it is not; I received an e-mail off-list saying that Prof. Ripley has
started looking into it).


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Greg Snow
Sent: Wednesday, November 12, 2008 1:07 PM
To: Effie Greathouse; r-help@r-project.org
Subject: Re: [R] different results with plot.lm vs. plot.lm(which=c(2))

The same thing is affecting plot 2 as well. Basically, there is a line early
in the code:

r <- residuals(x)

which gets the residuals; by default, for glm models these are the
deviance residuals.
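A minimal sketch of what residuals(x) returns for a glm, using the Gamma example posted later in this thread (bob, bob2 and model come from that example; the seed is an assumption added only so the random data are reproducible):

```r
set.seed(1)                                   # assumption: fixed seed for reproducibility
bob   <- 1:100
bob2  <- rgamma(100, shape = 2, rate = 1) * 10 + bob
model <- glm(bob2 ~ bob, family = Gamma)

r_default  <- residuals(model)                     # what plot.lm starts from
r_deviance <- residuals(model, type = "deviance")
r_pearson  <- residuals(model, type = "pearson")

all.equal(r_default, r_deviance)          # TRUE: "deviance" is the default type
isTRUE(all.equal(r_default, r_pearson))   # FALSE: Pearson residuals differ
```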

A bit later is this code (after some optional modifications to r):

    if (any(show[2:3])) {
        ylab23 <- if (isGlm)
            "Std. deviance resid."
        else "Standardized residuals"
        r.w <- if (is.null(w))
            r
        else sqrt(w) * r
    }

This sets r.w to the residuals (or the residuals times the square root
of w) if either plot 2 or plot 3 is requested.

The next batch of code is:

    if (show[5]) {
        ylab5 <- if (isGlm)
            "Std. Pearson resid."
        else "Standardized residuals"
        r.w <- residuals(x, "pearson")
        if (!is.null(w))
            r.w <- r.w[wind]
    }

This changes r.w to the Pearson residuals if plot 5 is requested. It
also sets the label to use for plot 5, but does not change the label
(ylab23) used for plots 2 and 3, so they are still labeled as deviance
residuals.
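To make the variable reuse concrete, here is a heavily simplified, hypothetical stand-in for the relevant part of plot.lm. The function pick_residuals and its fixed argument are made up for illustration and are not the real source; the sketch only returns the residuals that would feed plots 2/3 and plot 5, with and without a separate variable for plot 5:

```r
# Hypothetical, simplified stand-in for the residual bookkeeping described
# above; NOT the real plot.lm source. 'fixed = TRUE' gives plot 5 its own variable.
pick_residuals <- function(model, which, fixed = FALSE) {
  show <- seq_len(6) %in% which          # logical flags, like 'show' in plot.lm
  r.w  <- residuals(model)               # deviance residuals for a glm
  r.w5 <- NULL
  if (show[5]) {
    if (fixed) {
      r.w5 <- residuals(model, type = "pearson")   # plots 2/3 keep deviance
    } else {
      r.w  <- residuals(model, type = "pearson")   # overwrites plots 2/3 input
      r.w5 <- r.w
    }
  }
  list(plots23 = r.w, plot5 = r.w5)
}

set.seed(1)                                   # assumption: seed for reproducibility
bob   <- 1:100
bob2  <- rgamma(100, shape = 2, rate = 1) * 10 + bob
model <- glm(bob2 ~ bob, family = Gamma)
dev   <- residuals(model, type = "deviance")

isTRUE(all.equal(pick_residuals(model, 2)$plots23, dev))                      # TRUE
isTRUE(all.equal(pick_residuals(model, c(2, 5))$plots23, dev))                # FALSE
isTRUE(all.equal(pick_residuals(model, c(2, 5), fixed = TRUE)$plots23, dev))  # TRUE
```

Giving plot 5 its own variable is just one way to keep plots 2 and 3 consistent with their labels; this sketch does not claim to reproduce how the fix in R-patched was actually written.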

The r.w variable is used later in the code for plots 2, 3, and 5 (and
maybe others). Unless something between the above code and the place
where r.w is used fixes this (I did not see anything, but I did no more
than skim the code), there is a clear bug: the residuals used are
inconsistent between calls and mislabeled in some cases.
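As far as I can tell from the plot.lm source, r.w is then standardized as r.w / (s * sqrt(1 - h_ii)), with s the square root of the estimated dispersion and h_ii the leverages, before plots 2 and 3 are drawn. The sketch below (again the thread's Gamma example, with an assumed seed) computes both kinds of standardized residual by hand, checks them against rstandard(), and draws the two Normal Q-Q plots side by side so the effect of the mix-up is visible:

```r
set.seed(1)                                   # assumption: seed for reproducibility
bob   <- 1:100
bob2  <- rgamma(100, shape = 2, rate = 1) * 10 + bob
model <- glm(bob2 ~ bob, family = Gamma)

s   <- sqrt(summary(model)$dispersion)   # estimated scale
hii <- hatvalues(model)                  # leverages

# What plots 2 and 3 are labeled with ...
std_dev  <- residuals(model, type = "deviance") / (s * sqrt(1 - hii))
# ... and, per the analysis above, what they showed when plot 5 was also requested.
std_pear <- residuals(model, type = "pearson") / (s * sqrt(1 - hii))

# Cross-check against the built-in helper (names dropped before comparing);
# both comparisons should be TRUE.
all.equal(unname(std_dev), unname(rstandard(model)))
all.equal(unname(std_pear), unname(rstandard(model, type = "pearson")))

op <- par(mfrow = c(1, 2))
qqnorm(std_dev, main = "Std. deviance resid.")
qqline(std_dev)
qqnorm(std_pear, main = "Std. Pearson resid.")
qqline(std_pear)
par(op)
```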

If anyone on R-core is paying attention to this thread, the problem may
already be in the process of being fixed (or may already be fixed; check
R-patched, since the above is based on R 2.8.0 (2008-10-20)). I've had a
couple of cases where the fix was in place before I could submit a bug
report. If we don't see any indication of it being fixed in the next
couple of days, then one of us should submit a bug report.


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Effie Greathouse
Sent: Wednesday, November 12, 2008 12:15 PM
To: r-help@r-project.org
Subject: Re: [R] different results with plot.lm vs. plot.lm(which=c(2))


There's also a big difference in plot 2 (Normal Q-Q) in my real data, but I
don't see a real difference in plot 2 for the example I sent, except that
some of the outlier labels are different. Would the residuals being plotted
likely be the cause of the difference in plot 2 as well? Dr. Snow, would you
like to see the plot 2 graphs for my real data? I don't know how to save
plot 2 when I'm clicking through the plot(model) graphs, so I can't just
send it to you. Thanks for checking this out, Dr. Snow!
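One way to capture a single diagnostic plot without clicking through, sketched with the example model posted later in this thread (the file names are arbitrary placeholders, and the seed is an assumption): open a file graphics device, ask plot() for just the panel you want, then close the device so the file is written. png() can be used in place of pdf() if an image attachment is preferred.

```r
set.seed(1)                                  # assumption: seed only for reproducibility
bob   <- 1:100
bob2  <- rgamma(100, shape = 2, rate = 1) * 10 + bob
model <- glm(bob2 ~ bob, family = Gamma)     # substitute your own fitted model

# Plot 2 (Normal Q-Q) on its own.
pdf("plot2_alone.pdf", width = 6, height = 6)
plot(model, which = 2)
dev.off()

# The default set of plots, one per page; page 2 of this file is the
# Normal Q-Q plot as drawn by plot(model), for comparison with the file above.
pdf("plot_default.pdf", width = 6, height = 6)
plot(model)
dev.off()
```

Because a file device is not interactive, plot(model) writes all pages without prompting.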

On Wed, Nov 12, 2008 at 11:05 AM, Greg Snow <[EMAIL PROTECTED]> wrote:

From a quick look at the code, it looks like when you ask for plot number 5
(included in the default when 'which' is not specified), the deviance
residuals are replaced by the Pearson residuals for use in later
computations. So the difference you are seeing is that one of the plots is
based on deviance residuals and the other on Pearson residuals.

It seems that there is a bug here: at a minimum, the label should be changed
to indicate which residuals were actually used, or the code should be
changed to continue using the deviance residuals for plot 3 even when plot 5
is requested.

Does anyone else see something that I missed in how the residuals are
replaced and used?


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Effie Greathouse
Sent: Wednesday, November 12, 2008 11:16 AM
To: r-help@r-project.org
Subject: Re: [R] different results with plot.lm vs. plot.lm(which=c(2))


Hi Dr. Ripley--Sorry for the repost, everybody. The original message I sent
never showed up in my inbox, so I thought it hadn't been sent to the list.

I'm running R 2.8.0, installed from a pre-compiled binary, on Windows XP.
When I type Sys.getlocale() at the R prompt, it returns:
"LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Here's an example:
set.seed(1)  # optional: makes the random example reproducible
bob <- 1:100
bob2 <- rgamma(100, 2, 1)*10 + bob
model <- glm(bob2 ~ bob, family=Gamma)

Then enter:
plot(model, which=c(3))
to get the Scale-Location graph.

Then compare it to the Scale-Location graph you get when you run the
following command and page through to the 3rd graph:
plot(model)

When I do this, I get different results -- some of the high values are
different on each plot. On my real data the difference is more severe than
in this randomly generated example. I'd be happy to supply my real data and
R code if this smaller example isn't sufficient. Thank you for any help!!
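A sketch of the same comparison drawn on a single page, so no paging is needed (the seed is an assumption added for reproducibility). With par(mfrow = c(1, 3)) there is room for all three panels, so plot() does not prompt between them; in R 2.8.0 the two Scale-Location panels would be expected to differ, and in a release carrying the fix they should match.

```r
set.seed(1)                      # assumption: seed for reproducibility
bob   <- 1:100
bob2  <- rgamma(100, shape = 2, rate = 1) * 10 + bob
model <- glm(bob2 ~ bob, family = Gamma)

op <- par(mfrow = c(1, 3))       # three panels in one row
plot(model, which = 3)           # Scale-Location requested on its own
plot(model, which = c(3, 5))     # Scale-Location together with plot 5
par(op)                          # restore the previous layout
```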


On Wed, Nov 12, 2008 at 9:43 AM, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:

Instead of re-posting the same message, please study the posting guide and
supply the information asked for, including a reproducible example. There is
no way we can help you unless you help us to help you.


On Wed, 12 Nov 2008, Effie Greathouse wrote:

I am running GLM models using the Gamma family. For example:

model <- glm(y ~ x, family=Gamma(link="identity"))

I am getting different results for the Normal Q-Q plot and the
Scale-Location plot depending on whether I run the diagnostic plots without
specifying the plot or with a specific plot, e.g., "plot(model)" gives me a
different Normal Q-Q graph than "plot(model, which=c(2))". The former gives
data points distributed in a quadratic pattern, while the latter gives data
points more or less along the 1:1 line. Shouldn't these two commands be
giving me exactly the same graphs? I have read the documentation on plot.lm
and searched the help archives, but I am still learning GLMs and I'm not
very familiar with interpreting diagnostic plots for GLMs, so any help would
be much appreciated!


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/

University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




