peter dalgaard <pdalgd <at> gmail.com> writes: > > > > On 28 Mar 2015, at 00:32 , RiGui <raluca.gui <at> business.uzh.ch> wrote: > > > > Hello everybody, > > > > I have encountered the following problem with lm(): > > > > When running lm() with a regressor close to zero - > of the order e-10, the > > value of the estimate is of huge absolute value , of order millions. > > > > However, if I write the formula of the OLS estimator, > in matrix notation: > > pseudoinverse(t(X)*X) * t(X) * y , the results are correct, meaning the > > estimate has value 0.
How do you know this answer is "correct"? > > here is the code: > > > > y <- rnorm(n_obs, 10,2.89) > > x1 <- rnorm(n_obs, 0.00000000000001235657,0.000000000000000045) > > x2 <- rnorm(n_obs, 10,3.21) > > X <- cbind(x1,x2) > > > > bFE <- lm(y ~ x1 + x2) > > bFE > > > > bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y > > bOLS > > > > > > Note: I am applying a deviation from the > mean projection to the data, that > > is why I have some regressors with such small values. > > > > Thank you for any help! > > > > Raluca Is there a reason you can't scale your regressors? > > > Example not reproducible: > I agree that the OP's question was not reproducible, but it's not too hard to make it reproducible. I bothered to use library("sos"); findFn("pseudoinverse") to find pseudoinverse() in corpcor: It is true that we get estimates with very large magnitudes, but their set.seed(101) n_obs <- 1000 y <- rnorm(n_obs, 10,2.89) x1 <- rnorm(n_obs, mean=1.235657e-14,sd=4.5e-17) x2 <- rnorm(n_obs, 10,3.21) X <- cbind(x1,x2) bFE <- lm(y ~ x1 + x2) bFE coef(summary(bFE)) Estimate Std. Error t value Pr(>|t|) (Intercept) 1.155959e+01 2.312956e+01 0.49977541 0.6173435 x1 -1.658420e+14 1.872598e+15 -0.08856254 0.9294474 x2 3.797342e-02 2.813000e-02 1.34992593 0.1773461 library("corpcor") bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y bOLS [,1] [1,] 9.807664e-16 [2,] 8.880273e-01 And if we scale the predictor: bFE2 <- lm(y ~ I(1e14*x1) + x2) coef(summary(bFE2)) Estimate Std. Error t value Pr(>|t|) (Intercept) 11.55958731 23.12956 0.49977541 0.6173435 I(1e+14 * x1) -1.65842037 18.72598 -0.08856254 0.9294474 x2 0.03797342 0.02813 1.34992593 0.1773461 bOLS stays constant. To be honest, I haven't thought about this enough to see which answer is actually correct, although I suspect the problem is in bOLS, since the numerical methods (unlike the brute-force pseudoinverse method given here) behind lm have been carefully considered for numerical stability. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.