Hello,

I hope I am posting to the right place. I was advised to try this list by Ben 
Bolker (https://twitter.com/bolkerb/status/859909918446497795). I also posted 
this question on Stack Overflow 
(http://stackoverflow.com/questions/43771269/lm-gives-different-results-from-lm-ridgelambda-0).
 I am a relative newcomer to R, but I wrote my first program in 1975 and have 
been paid to program in about 15 different languages, so I have some general 
background knowledge. 


I have a regression from which I extract the coefficients like this: 
lm(y ~ x1 * x2, data=ds)$coef 
That gives (rounded): x1=0.40, x2=0.37, x1:x2=0.09. 



When I do the same regression in SPSS, I get: 
beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. 
So the main effects are in agreement, but there is quite a difference in the 
coefficient for the interaction. 


x1 and x2 are correlated at about .75 (yes, yes, I know - this model wasn't my 
idea, but it got published), so quite possibly something is going on with 
collinearity. So I thought I'd try lm.ridge() from MASS to see if I can get an 
idea of where the problems are occurring. 


The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) 
and check that we get the same results as with lm(): 
library(MASS) 
lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef 
That gives: x1=0.40, x2=0.37, x1:x2=0.14. 
So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the 
default, so it can be omitted; I can alternate between adding and removing 
".ridge" in the function call and watch the coefficient for the interaction 
change.) 
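
To make the back-and-forth concrete, the two calls differ only in the function 
name (a sketch; ds is my real dataset, which I can't attach here): 

library(MASS)   # lm.ridge() lives in MASS 
lm(y ~ x1 * x2, data=ds)$coef                  # interaction comes out ~0.09 
lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef  # interaction comes out ~0.14 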



What seems slightly strange to me is that I had assumed lm.ridge() just 
piggybacks on lm(), so in the specific case where lambda=0 and there is no 
"ridging" to do, I'd expect exactly the same results. 


Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will 
not be easy to make, but I can share the data via Dropbox or something if that 
would help. 
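
In the meantime, here is a sketch of how a stand-in dataset with a similar 
correlation (about .75 between x1 and x2) could be simulated; the sample size, 
coefficients, and noise level below are placeholders, not estimates from my 
data: 

library(MASS)   # mvrnorm() for correlated predictors; also lm.ridge() 
set.seed(1) 
n <- 34000 
Sigma <- matrix(c(1, 0.75, 0.75, 1), 2, 2)   # cor(x1, x2) = .75 
X <- mvrnorm(n, mu=c(0, 0), Sigma=Sigma) 
ds <- data.frame(x1=X[, 1], x2=X[, 2]) 
ds$y <- 0.4*ds$x1 + 0.37*ds$x2 + 0.1*ds$x1*ds$x2 + rnorm(n) 
lm(y ~ x1 * x2, data=ds)$coef 
lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef 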



I appreciate that when there is strong collinearity, all bets are off in terms 
of what the betas mean, but I would really expect lm() and lm.ridge() to give 
the same results. (I would be happy to ignore SPSS, but for the moment it's 
part of the majority!) 



Thanks for reading, 
Nick 

