On Apr 14, 2012, at 14:40 , Berend Hasselman wrote: > > On 13-04-2012, at 22:20, Gene Leynes wrote: > >> I can't figure out why this is returning an NA for the slope in one case, >> but not in the other. >> >> I can tell that R thinks the first case is singular, but why isn't the >> second? >> >> ## Define X and Y >> ## There are two versions of x >> ## 1) "as is" >> ## 2) shifted to start at 0 >> y = c(58, 57, 57, 279, 252, 851, 45, 87, 47) >> x1 = c(1334009411.437, 1334009411.437, 1334009411.437, 1334009469.297, >> 1334009469.297, 1334009469.297, 1334009485.697, 1334009485.697, >> 1334009485.697) >> x2 = x1 - min(x1) >> >> ## Why doesn't the LM model work for the "as is" x? >> lm(y~x1) >> lm(y~x2) > > > With your data the matrix t(X)%*%X is extremely ill-conditioned for the case > with x1. > See > > http://en.wikipedia.org/wiki/Condition_number > http://mathworld.wolfram.com/ConditionNumber.html > http://en.wikipedia.org/wiki/Numerical_methods_for_linear_least_squares > > You can check it with > > makeXX <- function(x) { > matrix(data=c(x,rep(1,length(x))),nrow=length(x), byrow=FALSE) > } > > X1 <- makeXX(x1) > (XTX1 <- t(X1) %*% X1) > svd(XTX1) > > and similar for x2.
<lecture on numerical linear algebra> lm() is actually a bit smarter than to use the textbook formula (X'X)^{-1}X'Y, it is internally based on a QR decomposition which is numerically far more stable. What it does is effectively to orthogonalize the columns of X successively. > x <- cbind(1, x1) > qr(x) $qr x1 [1,] -3.0000000 -4.002028e+09 [2,] 0.3333333 9.555777e+01 [3,] 0.3333333 3.456548e-01 [4,] 0.3333333 -2.598428e-01 [5,] 0.3333333 -2.598428e-01 [6,] 0.3333333 -2.598428e-01 [7,] 0.3333333 -4.314668e-01 [8,] 0.3333333 -4.314668e-01 [9,] 0.3333333 -4.314668e-01 $rank [1] 1 $qraux [1] 1.333333 1.345655 $pivot [1] 1 2 attr(,"class") [1] "qr" However, notice the rank of 1. That's because it wants to protect the user against unwittingly plugging in a singular design matrix, so when the algorithm encounters a variable that looks like it is getting reduced to zero by after orthogonalization, it basically throws it out. You can defeat this mechanism by setting the tol= argument sufficiently low. In fact, you can do the same with lm itself: > lm(y ~ x1, tol = 0) ... (Intercept) x1 -2.474e+09 1.854e+00 Notice that the slope is consistent with the > lm(y ~ x2) ... (Intercept) x2 110.884 1.854 In contrast if you try the textbook way: > solve(crossprod(x), crossprod(x,y)) Error in solve.default(crossprod(x), crossprod(x, y)) : system is computationally singular: reciprocal condition number = 2.67813e-34 Ah, well, let's try and defeat that: > solve(crossprod(x), crossprod(x,y), tol=0) [,1] -2.959385e+09 x1 2.218414e+00 Which, as you'll notice is, er, somewhat off the mark. </lecture on numerical linear algebra> -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.