Hi,
Here is (I hope) all the relevant output from R.
> mean(s1$ZDEPRESSION, na.rm=T)
[1] -1.041546e-16
> mean(s1$ZDIVERSITY_PA, na.rm=T)
[1] -9.660583e-16
> mean(s1$ZMEAN_PA, na.rm=T)
[1] -5.430282e-15

> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
  ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
-0.3962254    -0.3636026             -0.1425772
## This is what I thought was the problem originally. :-)
> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
(Intercept)    ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
 0.07342198 -0.39650356   -0.36569488            -0.09435788

> coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
               ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
 0.07342198 -0.39650356   -0.36569488            -0.09435788

The equivalent from SPSS is attached. The unstandardized coefficients in SPSS
look nothing like those in R. The standardized coefficients in SPSS match the
lm.ridge()$coef numbers very closely indeed, suggesting that the same algorithm
may be in use.
I have put the dataset file, which is the untouched original I received from
the authors, in this Dropbox folder:
https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. You
can read it into R with this code (one variable needs to be standardized and
centered; everything else is already in the file):
s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE)
s1$ZDEPRESSION <- scale(s1$DEPRESSION)
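For anyone who wants to poke at this, here is a minimal sketch (assuming the
file loads as above; lm.ridge() is in MASS, and its help page says that $coef
is not on the original scale and that $scales holds the scalings used on the
X matrix; "fit" is just a local name for the sketch) of how the two sets of
numbers appear to be related:

library(MASS)

fit <- lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)

coefficients(fit)   # back-transformed to the original scale; matches lm()

## $coef is reported on lm.ridge()'s internal rescaled-predictor metric, so
## multiplying the original-scale slopes by the stored scalings should
## reproduce it:
coefficients(fit)[-1] * fit$scales
fit$coef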
Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM
will want to fix it quickly (ha ha ha).
Nick
----- Original Message -----
From: "peter dalgaard" <[email protected]>
To: "Nick Brown" <[email protected]>
Cc: "Simon Bonner" <[email protected]>, [email protected]
Sent: Friday, 5 May, 2017 10:02:10 AM
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
I asked you before, but in case you missed it: Are you looking at the right
place in SPSS output?
The UNstandardized coefficients should be comparable to R, i.e. the "B" column,
not "Beta".
-pd
> On 5 May 2017, at 01:58 , Nick Brown <[email protected]> wrote:
>
> Hi Simon,
>
> Yes, if I use coefficients() I get the same results for lm() and lm.ridge().
> So that's consistent, at least.
>
> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the
> value from SPSS to 5dp, which is an interesting coincidence if these numbers
> have no particular external meaning in lm.ridge().
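>
> Possibly relevant arithmetic: the two interaction estimates differ by a
> factor of about -0.143 / -0.094 = 1.51. If lm.ridge()$coef and the SPSS
> "Beta" are both computed on rescaled predictors, that factor should simply
> be the standard deviation of the product term, which is not itself z-scored
> even though ZMEAN_PA and ZDIVERSITY_PA are. A one-line check (hedged; it
> ignores lm.ridge()'s n rather than n-1 divisor) would be:
>
> sd(s1$ZMEAN_PA * s1$ZDIVERSITY_PA, na.rm=TRUE)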
>
> Kind regards,
> Nick
>
> ----- Original Message -----
>
> From: "Simon Bonner" <[email protected]>
> To: "Nick Brown" <[email protected]>, [email protected]
> Sent: Thursday, 4 May, 2017 7:07:33 PM
> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS
>
> Hi Nick,
>
> I think that the problem here is your use of $coef to extract the
> coefficients of the ridge regression. The help for lm.ridge states that coef
> is a "matrix of coefficients, one row for each value of lambda. Note that
> these are not on the original scale and are for use by the coef method."
>
> I ran a small test with simulated data, code is copied below, and indeed the
> output from lm.ridge differs depending on whether the coefficients are
> accessed via $coef or via the coefficients() function. The latter does
> produce results that match the output from lm.
>
> I hope that helps.
>
> Cheers,
>
> Simon
>
> ## Load packages
> library(MASS)
>
> ## Set seed
> set.seed(8888)
>
> ## Set parameters
> n <- 100
> beta <- c(1,0,1)
> sigma <- .5
> rho <- .75
>
> ## Simulate correlated covariates
> Sigma <- matrix(c(1,rho,rho,1),ncol=2)
> X <- mvrnorm(n,c(0,0),Sigma=Sigma)
>
> ## Simulate data
> mu <- beta[1] + X %*% beta[-1]
> y <- rnorm(n,mu,sigma)
>
> ## Fit model with lm()
> fit1 <- lm(y ~ X)
>
> ## Fit model with lm.ridge()
> fit2 <- lm.ridge(y ~ X)
>
> ## Compare coefficients
> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2))
>
> [,1] [,2] [,3]
> (Intercept) 0.99276001 NA 0.99276001
> X1 -0.03980772 -0.04282391 -0.03980772
> X2 1.11167179 1.06200476 1.11167179
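>
> As a quick check of the help-page statement, using fit2 from above: lm.ridge()
> also stores the scalings it applied in $scales, so the two columns below
> should coincide if $coef is just the original-scale slope multiplied by that
> scaling.
>
> ## hedged sketch: both columns should match (up to rounding)
> cbind(fit2$coef, coefficients(fit2)[-1] * fit2$scales)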
>
> --
>
> Simon Bonner
> Assistant Professor of Environmetrics/ Director MMASc
> Department of Statistical and Actuarial Sciences/Department of Biology
> University of Western Ontario
>
> Office: Western Science Centre rm 276
>
> Email: [email protected] | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813
> Twitter: @bonnerstatslab | Website:
> http://simon.bonners.ca/bonner-lab/wpblog/
>
>> -----Original Message-----
>> From: R-devel [mailto:[email protected]] On Behalf Of Nick
>> Brown
>> Sent: May 4, 2017 10:29 AM
>> To: [email protected]
>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS
>>
>> Hallo,
>>
>> I hope I am posting to the right place. I was advised to try this list by
>> Ben Bolker
>> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this
>> question to StackOverflow
>> (http://stackoverflow.com/questions/43771269/lm-gives-different-results-from-lm-ridgelambda-0).
>> I am a relative newcomer to R, but I wrote my first
>> program in 1975 and have been paid to program in about 15 different
>> languages, so I have some general background knowledge.
>>
>>
>> I have a regression from which I extract the coefficients like this:
>> lm(y ~ x1 * x2, data=ds)$coef
>> That gives: x1=0.40, x2=0.37, x1*x2=0.09
>>
>>
>>
>> When I do the same regression in SPSS, I get:
>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14.
>> So the main effects are in agreement, but there is quite a difference in the
>> coefficient for the interaction.
>>
>>
>> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my
>> idea, but it got published), so there is quite possibly something going on
>> with
>> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of
>> where
>> the problems are occurring.
>>
>>
>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge
>> penalty) and
>> check we get the same results as with lm():
>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef
>> x1=0.40, x2=0.37, x1*x2=0.14
>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is
>> the
>> default, so it can be omitted; I can alternate between including or deleting
>> ".ridge" in the function call, and watch the coefficient for the interaction
>> change.)
>>
>>
>>
>> What seems slightly strange to me here is that I assumed that lm.ridge()
>> just
>> piggybacks on lm() anyway, so in the specific case where lambda=0 and there
>> is no "ridging" to do, I'd expect exactly the same results.
>>
>>
>> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex
>> will
>> not be easy to make, but I can share the data via Dropbox or something if
>> that
>> would help.
>>
>>
>>
>> I appreciate that when there is strong collinearity then all bets are off in
>> terms
>> of what the betas mean, but I would really expect lm() and lm.ridge() to
>> give
>> the same results. (I would be happy to ignore SPSS, but for the moment it's
>> part of the majority!)
>>
>>
>>
>> Thanks for reading,
>> Nick
>>
>>
>
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [email protected] Priv: [email protected]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel