Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-04 Thread Simon Bonner
Hi Nick,

I think that the problem here is your use of $coef to extract the coefficients 
of the ridge regression. The help for lm.ridge states that coef is a "matrix of 
coefficients, one row for each value of lambda. Note that these are not on the 
original scale and are for use by the coef method."

I ran a small test with simulated data (code copied below), and indeed the output 
from lm.ridge differs depending on whether the coefficients are accessed via 
$coef or via the coefficients() function. The latter produces results that match 
the output from lm.

I hope that helps.

Cheers,

Simon

## Load packages
library(MASS)

## Set seed (arbitrary value assumed here; the exact numbers below depend on the seed)
set.seed(42)

## Set parameters
n <- 100
beta <- c(1,0,1)
sigma <- .5
rho <- .75

## Simulate correlated covariates
Sigma <- matrix(c(1,rho,rho,1),ncol=2)
X <- mvrnorm(n,c(0,0),Sigma=Sigma)

## Simulate data
mu <- beta[1] + X %*% beta[-1]
y <- rnorm(n,mu,sigma)

## Fit model with lm()
fit1 <- lm(y ~ X)

## Fit model with lm.ridge()
fit2 <- lm.ridge(y ~ X)

## Compare coefficients
cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2))

                   [,1]        [,2]        [,3]
(Intercept)  0.99276001          NA  0.99276001
X1          -0.03980772 -0.04282391 -0.03980772
X2           1.11167179  1.06200476  1.11167179
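
For completeness, the rescaling that coefficients() performs can be reproduced by 
hand from the components stored in the ridgelm object (a minimal sketch, assuming 
the usual MASS internals: $coef holds slopes for the centred and scaled 
predictors, $scales the scaling factors, and $xm/$ym the predictor and response 
means):

## Back-transform $coef to the original scale by hand
slopes <- fit2$coef / fit2$scales
intercept <- fit2$ym - sum(slopes * fit2$xm)
c(intercept, slopes)  ## should match coefficients(fit2) and fit1$coefficients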

--

Simon Bonner
Assistant Professor of Environmetrics/ Director MMASc 
Department of Statistical and Actuarial Sciences/Department of Biology 
University of Western Ontario

Office: Western Science Centre rm 276

Email: sbonn...@uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813
Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/

> -Original Message-
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick
> Brown
> Sent: May 4, 2017 10:29 AM
> To: r-devel@r-project.org
> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS
> 
> Hello,
> 
> I hope I am posting to the right place. I was advised to try this list by Ben
> Bolker (https://twitter.com/bolkerb/status/859909918446497795). I also posted
> this question to StackOverflow
> (http://stackoverflow.com/questions/43771269/lm-gives-different-results-from-lm-ridgelambda-0).
> I am a relative newcomer to R, but I wrote my first program in 1975 and have
> been paid to program in about 15 different languages, so I have some general
> background knowledge.
> 
> 
> I have a regression from which I extract the coefficients like this:
> lm(y ~ x1 * x2, data=ds)$coef
> That gives: x1=0.40, x2=0.37, x1*x2=0.09
> 
> 
> 
> When I do the same regression in SPSS, I get:
> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14.
> So the main effects are in agreement, but there is quite a difference in the
> coefficient for the interaction.
> 
> 
> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my
> idea, but it got published), so there is quite possibly something going on
> with collinearity. So I thought I'd try lm.ridge() to see if I can get an
> idea of where the problems are occurring.
> 
> 
> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge
> penalty) and check we get the same results as with lm():
> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef
> x1=0.40, x2=0.37, x1*x2=0.14
> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the
> default, so it can be omitted; I can alternate between including or deleting
> ".ridge" in the function call, and watch the coefficient for the interaction
> change.)
> 
> 
> 
> What seems slightly strange to me here is that I assumed that lm.ridge() just
> piggybacks on lm() anyway, so in the specific case where lambda=0 and there
> is no "ridging" to do, I'd expect exactly the same results.
> 
> 
> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex
> will not be easy to make, but I can share the data via Dropbox or something
> if that would help.
> 
> 
> 
> I appreciate that when there is strong collinearity then all bets are off in
> terms of what the betas mean, but I would really expect lm() and lm.ridge()
> to give the same results. (I would be happy to ignore SPSS, but for the
> moment it's part of the majority!)
> 
> 
> 
> Thanks for reading,
> Nick
> 
> 



Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Simon Bonner
In defence of Harm-Jan's original post, I would say that there is a difference 
between true convergence and satisfying a convergence criterion.

In my view the algorithm has not converged. This is a case of quasi-complete 
separation -- there are both successes and failures when x13=0, but only failures 
when x13=1. As a result, the likelihood has no maximum: it keeps increasing, 
albeit slightly, as the associated coefficient tends to infinity in absolute 
value while maximizing over the other parameters. The estimate given is not the 
MLE, and the standard error is not meaningful because the conditions for 
convergence of MLEs to their asymptotic normal distribution have been violated.
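
To see the phenomenon in isolation, here is a minimal sketch with made-up data 
(not the original rtestdata.txt) that has the same structure: both outcomes occur 
when x=0, but only failures occur when x=1.

## Toy example of quasi-complete separation
y <- c(0, 1, 0, 1, 0, 0)
x <- c(0, 0, 0, 0, 1, 1)
fit <- glm(y ~ x, family = binomial)
summary(fit)   ## large negative slope for x with a huge standard error
fit$converged  ## glm stops once the deviance stops changing appreciably

## A stricter criterion only pushes the slope further towards -Inf
coef(glm(y ~ x, family = binomial,
         control = glm.control(epsilon = 1e-14, maxit = 100)))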

I agree with Joris that someone familiar with logistic regression should be able 
to identify this situation -- though the solution is not as simple as throwing 
out the covariate. Suppose that there had been many failures when x13=1, not just 
one. The same problem would arise, but x13 would clearly be an important 
covariate, and removing it from the analysis would not be the right thing to do. 
A better solution is to penalize the likelihood or (and I'm showing my true 
colours here) to conduct a Bayesian analysis.
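
As an illustration of the penalized-likelihood route, Firth's bias-reduced 
logistic regression returns a finite estimate and usable intervals in this 
situation; a minimal sketch, assuming the logistf package is installed and using 
the toy x/y from the example above:

## Firth-penalized fit: finite slope despite the separation
library(logistf)
d <- data.frame(y = y, x = x)
fit_firth <- logistf(y ~ x, data = d)
summary(fit_firth)  ## profile penalized-likelihood confidence intervals by default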

Regarding the statement that the algorithm has converged: perhaps R should say, 
more truthfully, that the convergence criterion has been satisfied -- but that 
might lead to more confusion. In a case like this, any change-based convergence 
criterion will be satisfied eventually. If you increase the maximum number of 
iterations in the C implementation, then its convergence criterion will also be 
satisfied and that code will likewise report that the algorithm has converged.
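
For reference, the check that glm.fit applies after each IRLS step is a relative 
change in deviance, abs(dev - devold) / (abs(dev) + 0.1) < epsilon, with epsilon 
= 1e-8 by default (see glm.control). You can watch the sequence it is applied to 
with the trace option; in the toy example above the deviance changes shrink 
towards zero even though the slope itself keeps drifting.

## Print the deviance at each IRLS iteration
fit_tr <- glm(y ~ x, family = binomial, control = glm.control(trace = TRUE))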

In the end, it's up to the analyst to be aware of the pitfalls and how to 
address them.

Cheers,

Simon

> -Original Message-
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Joris Meys
> Sent: July 20, 2017 11:39 AM
> To: Harm-Jan Westra 
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Wrongly converging glm()
> 
> Allow me to chime in. That's an interesting case you present, but as far as
> I'm concerned the algorithm did converge. The estimate of -9.25 has an
> estimated standard error of 72.4, meaning that frequentists would claim the
> true value could lie anywhere between appx. -151 and 132 (the CI), and hence
> the estimate from the glm algorithm is perfectly compatible with the one from
> the C code. And as the glm algorithm uses a different convergence rule, it
> rightfully reported that it converged. The fact that another algorithm, based
> on another rule, does not converge does not mean that the one glm uses did
> not.
> 
> On top of that: in both cases the huge standard error on that estimate clearly
> tells you that the estimate should not be trusted and that the fit is
> unstable. That's to be expected, given the insane imbalance in your data,
> especially in the 13th column. If my students incorporated that variable in a
> generalized linear model and tried to formulate a conclusion based on that
> coefficient, they would fail the exam. So if somebody does this analysis and
> tries to draw any conclusion whatsoever from that estimate, maybe they should
> leave the analysis to somebody who does know what they're doing.
> 
> Cheers
> Joris
> 
> On Thu, Jul 20, 2017 at 5:02 PM, Harm-Jan Westra wrote:
> 
> > Dear R-core,
> >
> >
> > I have found an edge case where the glm function falsely concludes
> > that the model has converged. The issue is the following: my data
> > contains a number of covariates, and one of these covariates has a very
> > small variance. For most of the rows of this covariate, the value is 0,
> > except for one of the rows, where it is 1.
> >
> >
> > The glm function correctly determines the beta and standard error
> > estimates for all other covariates.
> >
> >
> > I've placed the data here: http://www.harmjanwestra.nl/rtestdata.txt
> >
> >
> > The model I'm using is very simple:
> >
> >
> > data <- read.table("rtestdata.txt")
> >
> > model <- glm(data[,1] ~ data[,2] + data[,3] + data[,4] + data[,5] +
> > data[,6] + data[,7] + data[,8] + data[,9] + data[,10] + data[,11] +
> > data[,12] + data[,13] + data[,14], family=binomial("logit"))
> >
> > summary(model)
> >
> >
> > You will see that for covariate data[,13], the beta/coefficient
> > estimate is around -9. The number of iterations performed is 8, and
> > model$converged returns TRUE.
> >
> >
> > I've used some alternate logistic regression code in C
> > (https://github.com/czep/mlelr/blob/master/src/mlelr.c), which produces
> > identical estimates for the other covariates and comparable deviance
> > values. However, using this C code, I'm seeing that the estimate for
> > data[,13] is around -100 (since I'm allowing a maximum of 100 MLE
> > iterations). There, the conclusion is that the model does not converge.
> >
> >
> > The difference between the two pieces of code is that in R, the glm()
> > function determines convergence of the whole model by measuring the
> > difference between devianc