Hi,

====================
Reproducible example:
====================
    data(Animals, package = "MASS")  # interesting dataset

    # Run model
    lm1 <- lm(log10(body) ~ log10(brain), data = Animals)

    # Set up 2x2 graphics device
    par(mfrow = c(2, 2))

    # Plot diagnostics, labelling the two most "extreme" points based on
    # the magnitude of the residuals
    plot(lm1, id.n = 2)

==============================
Explanation of resulting plots:
==============================

Notice that one of the two extreme points, which correspond to the two largest dinosaurs, is labelled unintuitively, contrary to what the documentation for the "label.pos" argument states (?plot.lm):

    label.pos: positioning of labels, for the left half and right half
    of the graph respectively, for plots 1-3.

The default value for this argument is c(4,2), where 4 means "to the right of" and 2 means "to the left of", as stated in the help page for text (see its 'pos' argument).

The Q-Q plot places the label for Dipliodocus "to the right", but it should be placed "to the left", since the point lies in the right half of the graph. Similarly, the Residuals vs Leverage plot places the label for Brachiosaurus "to the left" when it should be placed "to the right".

====================================
Reason for error and possible patch:
====================================

The fix is hard to explain because changes are required in several places. On line 85 (or thereabouts) of the plot.lm function, there is a helper function called text.id which does the labelling:

    text.id <- function(x, y, ind, adj.x = TRUE) {
        labpos <-
            if (adj.x) label.pos[1 + as.numeric(x > mean(range(x)))]
            else 3
        text(x, y, labels.id[ind], cex = cex.id, xpd = TRUE,
             pos = labpos, offset = 0.25)
    }

This text.id function is called for the plots corresponding to which==1 (lines 126:128), which==2 (line 145), which==3 (line 163), which==4 (line 180), which==5 (lines 270:272) and which==6 (line 312). For example, the call for the Q-Q plot (which==2) is:

    text.id(qq$x[show.rs], qq$y[show.rs], show.rs)

Because only the labelled points are passed in as x, mean(range(x)) is the midpoint of those points rather than of the whole graph. With id.n = 2 and two distinct x-values, one label is therefore always placed to the right and the other to the left, wherever the points actually lie on the plot.
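To make the difference concrete, here is a minimal R sketch that isolates only the position-choosing expression from text.id, outside of plot.lm. The helper names pos_current() and pos_all() and the x-coordinates are made up for illustration; label.pos = c(4, 2) is the plot.lm default.

```r
# Hypothetical helpers mirroring the label-position expression in text.id;
# they are illustrations only, not functions from R or plot.lm.

# Current behaviour: the labelled points are compared with the
# midpoint of their own range
pos_current <- function(x_all, ind, label.pos = c(4, 2)) {
  x <- x_all[ind]
  label.pos[1 + as.numeric(x > mean(range(x)))]
}

# Alternative: compare the labelled points with the midpoint of the
# range of all plotted points
pos_all <- function(x_all, ind, label.pos = c(4, 2)) {
  label.pos[1 + as.numeric(x_all[ind] > mean(range(x_all)))]
}

x_all <- c(-2, -1, 0, 1, 1.5, 2)  # made-up x-coordinates of all points
ind <- c(5, 6)                    # two labelled points, both in the right half

pos_current(x_all, ind)  # 4 2: point 5 gets its label to the right
pos_all(x_all, ind)      # 2 2: both labels go to the left
```

In both helpers, 4 means "to the right of" and 2 means "to the left of", as in text().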
I believe the text.id function should be changed to:

    text.id <- function(x, y, ind, adj.x = TRUE) {
        labpos <-
            if (adj.x) label.pos[1 + as.numeric(x[ind] > mean(range(x)))]
            else 3
        text(x[ind], y[ind], labels.id[ind], cex = cex.id, xpd = TRUE,
             pos = labpos, offset = 0.25)
    }

and the calls to this function changed accordingly, so that the choice of position is based on whether each extreme point is greater than the midpoint of the range of ALL the data points, not just of the extreme ones, as it currently does.

For example, at line 145 for the Q-Q plot (which==2), the [show.rs] index should be removed from the first two arguments, so the code becomes:

    text.id(qq$x, qq$y, show.rs)

Similar changes are required for plots 3, 4 and 6. For plots 1 and 5, the following changes are needed:

Lines 126:128 (which==1):

    y.id <- r  # delete [show.r]
    y.id[y.id < 0] <- y.id[y.id < 0] - strheight(" ")/3
    text.id(yh, y.id, show.r)  # delete [show.r]

Lines 270:272 (which==5):

    y.id <- rsp  # delete [show.rsp]
    y.id[y.id < 0] <- y.id[y.id < 0] - strheight(" ")/3
    text.id(xx, y.id, show.rsp)  # delete [show.rsp]

I tested these changes and they seem to work without breaking anything. If you want me to make a patch, I can try, but I thought these changes were significant enough to be better left to the experts.

Hope that all makes sense.

--
Edward McNeil
Assistant Professor, Epidemiology Unit,
Prince of Songkla University, Hat Yai, Thailand

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel