Hi,

====================
Reproducible example:
====================

data(Animals, package="MASS") # interesting dataset

# Run model
lm1 <- lm(log10(body)~log10(brain), data=Animals)

# Setup 2x2 graphics device
par(mfrow=c(2,2))

# Plot diagnostics, label the two most "extreme" points based on magnitude of
# residuals
plot(lm1, id.n=2)

==============================
Explanation of resulting plots:
==============================
Notice that one of the two extreme points, which correspond to the two largest
dinosaurs, is labelled unintuitively, counter to what is stated in the
documentation for the "label.pos" argument:

?plot.lm
label.pos: positioning of labels, for the left half and right half of the graph
respectively, for plots 1-3.

The default value for this argument is c(4,2), where 4 means "to the right of"
and 2 means "to the left of", as stated in the help page for text (see the
'pos' argument).

The Q-Q plot positions the label for Dipliodocus "to the right", but it should
be placed "to the left" since the point is on the right half of the graph.
Similarly, in the Leverage plot, the label for Brachiosaurus is placed "to the
left" when it should be placed "to the right".
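
The mismatch can be checked numerically for the Q-Q plot. The following is only
a rough sketch: the names mid_extreme, mid_all and sides are mine, and the way
the two extreme points are selected is assumed to mirror what plot.lm does
internally. It compares the midpoint of the labelled points alone with the
midpoint of the whole x range.

```r
# Sketch only: mid_extreme, mid_all and sides are illustrative names,
# and the selection of extreme points is assumed to mirror plot.lm.
data(Animals, package = "MASS")
lm1 <- lm(log10(body) ~ log10(brain), data = Animals)

rs <- rstandard(lm1)                               # standardized residuals
show.rs <- order(abs(rs), decreasing = TRUE)[1:2]  # two most extreme points
qq <- qqnorm(rs, plot.it = FALSE)                  # Q-Q coordinates, no plot

mid_extreme <- mean(range(qq$x[show.rs]))  # midpoint of the labelled points only
mid_all     <- mean(range(qq$x))           # midpoint of the whole x range

sides <- data.frame(
  point            = names(rs)[show.rs],
  x                = qq$x[show.rs],
  right_of_extreme = qq$x[show.rs] > mid_extreme,
  right_of_all     = qq$x[show.rs] > mid_all
)
print(sides)
```

A point for which the last two columns disagree is one whose label side depends
on which midpoint is used.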

====================================
Reason for error and possible patch:
====================================
The fix is hard to explain, because changes are required in many places.

On line 85 (or thereabouts) of the plot.lm function, there is a function called
text.id which does the labelling:

text.id <- function(x, y, ind, adj.x = TRUE) {
            labpos <- if (adj.x)
                label.pos[1 + as.numeric(x > mean(range(x)))]
            else 3
            text(x, y, labels.id[ind], cex = cex.id, xpd = TRUE,
                pos = labpos, offset = 0.25)
        }
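
To illustrate what this does when only the extreme points are passed in, here is
a small sketch with made-up numbers (x_all, show, x_sub and mid_sub are
hypothetical names, not taken from plot.lm). Both labelled points lie on the
right half of the data, yet one of them is given position 4.

```r
# Made-up data: six x values, of which the last two are the "extreme" points.
label.pos <- c(4, 2)               # default: 4 = right of point, 2 = left of point
x_all <- c(-2, -1, 0, 1, 1.5, 2)
show  <- c(5, 6)

# The current calls pass only the extreme points, so the midpoint is
# computed from those points alone.
x_sub   <- x_all[show]
mid_sub <- mean(range(x_sub))      # 1.75, midpoint of the two extreme points
labpos  <- label.pos[1 + as.numeric(x_sub > mid_sub)]

print(labpos)                      # 4 2
print(mean(range(x_all)))          # 0, midpoint of all the data
```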

This text.id function is called for plots corresponding to which==1 (lines
126:128) and which==2 (line 145), for example:

      text.id(qq$x[show.rs], qq$y[show.rs], show.rs)

It is also called for which==3 (line 163), which==4 (line 180), which==5 (lines
270:272), and which==6 (line 312).

I believe the text.id function should be changed to:

text.id <- function(x, y, ind, adj.x = TRUE) {
      labpos <- if (adj.x)
        label.pos[1 + as.numeric(x[ind] > mean(range(x)))]
      else 3
      text(x[ind], y[ind], labels.id[ind], cex = cex.id, xpd = TRUE,
           pos = labpos, offset = 0.25)
    }

The repeated calls to this function should then be changed so that the choice of
position is based on whether the extreme points are greater than the mean of the
range of ALL the data points, not just the extreme ones as is currently done.
For example, at line 145 for the Q-Q plot (which==2), the [show.rs] index should
be removed from the first two arguments, so the code should be:

      text.id(qq$x, qq$y, show.rs)
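
For anyone who wants to try the proposed logic outside plot.lm, here is a
standalone sketch. The function name text_id_sketch and its extra arguments
(labels, label.pos, cex.id) are hypothetical: inside plot.lm these values come
from the enclosing function, so here they are passed explicitly. The data are
made up.

```r
# Hypothetical standalone variant of the proposed text.id. Full x and y
# vectors are passed in, and ind selects the points to label.
text_id_sketch <- function(x, y, ind, labels, label.pos = c(4, 2),
                           cex.id = 0.75, adj.x = TRUE) {
  labpos <- if (adj.x) {
    # Side is chosen relative to the midpoint of ALL x values.
    label.pos[1 + as.numeric(x[ind] > mean(range(x)))]
  } else {
    3
  }
  text(x[ind], y[ind], labels[ind], cex = cex.id, xpd = TRUE,
       pos = labpos, offset = 0.25)
  invisible(labpos)
}

# Made-up data: the last two points are the ones to label.
x <- c(-2, -1, 0, 1, 1.5, 2)
y <- c(-1.8, -1.1, 0.1, 0.9, 2.5, 3.0)
labs <- letters[seq_along(x)]

pdf(NULL)                          # off-screen device; drop this to draw on screen
plot(x, y)
pos <- text_id_sketch(x, y, ind = c(5, 6), labels = labs)
dev.off()

print(pos)                         # 2 2: both labels go to the left of their points
```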

and similar changes are required for plots 3, 4, and 6. For plots 1 and 5, the
following changes are needed:

Lines 126:128 (which==1)
      y.id <- r # delete [show.r]
      y.id[y.id < 0] <- y.id[y.id < 0] - strheight(" ")/3
      text.id(yh, y.id, show.r) # delete [show.r]

Lines 270:272 (which==5)
        y.id <- rsp # delete [show.rsp]
        y.id[y.id < 0] <- y.id[y.id < 0] - strheight(" ")/3
        text.id(xx, y.id, show.rsp) # delete [show.rsp]

I tested these changes and they seem to work without breaking anything. If you
want me to make a patch, I can try, but I thought these changes were quite
significant and better left to the experts.

Hope that all makes sense.
-- 
Edward McNeil
Assistant Professor,
Epidemiology Unit,
Prince of Songkla University,
Hat Yai,
Thailand

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
