On Jul 17, 2013, at 5:26 PM, Axel Urbiz wrote:
Dear List,
I'm running simulations using the glmnet package. I need to use an
'automated' method for model selection at each iteration of the
simulation. The cv.glmnet function in the same package is handy for
that purpose. However, in my simulation I have p >> N, and in some
cases the model selected by cv.glmnet shrinks essentially all
coefficients to zero. In this case, the prediction for each instance
equals the average of the response variable. A reproducible example
is shown below.
Is there a reasonable way to prevent this from happening in a
simulation setting with glmnet? That is, I'd like the selected model
to give me some useful predictions.
I'd like to expose the premise of the request to criticism. Reporting
the sample mean in cases where no predictors meet the criteria for
significance under penalization IS an informative response under
conditions of simulation. The simulated result is telling you that
some data situations of modest size, assessed under a penalized
process, will not deliver a "significant" result. Why does this
bother you? The number of such messages would seem to be one measure
of the power of the method, although other departures from the "true"
result would also be subtracted from the count of runs.
If you choose to ignore the "evidence", then I "predict" that you
are also one who chooses to throw out outliers. Both would have a
similar effect of inflating measures of significance at the expense of
fidelity to the data. If you want to vary the parameter, then vary
the penalization and determine the effect of that hyper-parameter.
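[Editor's sketch, not part of the original exchange: one way to examine the
effect of the penalization hyper-parameter is to look at how the
cross-validated deviance and the number of retained predictors change along
the lambda path. The fit.net and x objects are assumed to come from the
reproducible example later in this message.]
## Sketch only: effect of the penalty along the lambda path, assuming the
## cv.glmnet object fit.net and the matrix x from the example below.
path <- data.frame(lambda      = fit.net$lambda,  # penalty values tried by cv.glmnet
                   nonzero     = fit.net$nzero,   # predictors retained at each lambda
                   cv_deviance = fit.net$cvm)     # mean cross-validated deviance
head(path)
## Predictions at a less aggressive penalty than lambda.1se:
pred_min <- predict(fit.net, x, type = "response", s = "lambda.min")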
David Winsemius
I've tested using alternative loss measures (the type.measure
argument), but none is satisfactory in all cases.
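[Editor's sketch, not from the original post: one way to run that comparison
is to refit cv.glmnet under each loss measure and record how many predictors
the lambda.1se model keeps. The x and ly objects are assumed from the
reproducible example below.]
## Sketch: compare loss measures for the binomial family, assuming x and ly
## from the example below (valid choices include "deviance", "class", "auc",
## "mse", "mae").
measures <- c("deviance", "class", "auc")
fits <- lapply(measures, function(m)
  cv.glmnet(x, ly, family = "binomial", alpha = 1, type.measure = m))
## Nonzero coefficients in the lambda.1se model under each measure
## (for a like-for-like comparison, pass a common foldid vector to each call):
sapply(fits, function(f) f$nzero[which(f$lambda == f$lambda.1se)])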
This question is not necessarily R related (so sorry for that): when
comparing glmnet with other models in terms of predictive accuracy,
is it fair to make the comparison including those cases in which the
`best' cv.glmnet can do in an automated setting is pred = avg(response)?
library(glmnet)
set.seed(1010)

## Simulate p >> n data: n = 100 observations, p = 3000 predictors,
## the first nzc = 300 of which carry nonzero coefficients.
n <- 100
p <- 3000
nzc <- trunc(p/10)
x <- matrix(rnorm(n*p), n, p)
beta <- rnorm(nzc)
fx <- x[, seq(nzc)] %*% beta
eps <- rnorm(n)*5
y <- drop(fx + eps)   # continuous response (not used by the binomial fit below)

## Binary response generated from a logistic model on the linear predictor fx.
px <- exp(fx)
px <- px/(1 + px)
ly <- rbinom(n = length(px), prob = px, size = 1)

fit.net <- cv.glmnet(x, ly,
                     family = "binomial",
                     alpha = 1,                 # lasso penalty
                     type.measure = "deviance",
                     standardize = FALSE,
                     intercept = FALSE,
                     nfolds = 10,
                     keep = FALSE)
plot(fit.net)
log(fit.net$lambda.1se)

pred <- predict(fit.net, x, type = "response", s = "lambda.1se")
all(coef(fit.net) == 0)  # are all coefficients shrunk to zero at lambda.1se?
all(pred == 0.5)         # with no intercept and all-zero coefficients, every fitted probability is 0.5
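[Editor's sketch, building on the objects just created and offered only as one
possible automated workaround for the all-zero case raised above: fall back to
lambda.min whenever the lambda.1se model retains no predictors. Whether such a
fallback is appropriate is exactly the point David raises above.]
## Sketch of an automated fallback (uses fit.net and x from the example above):
## if the lambda.1se model keeps no predictors, predict at lambda.min instead.
nz.1se <- fit.net$nzero[which(fit.net$lambda == fit.net$lambda.1se)]
s.use  <- if (nz.1se == 0) "lambda.min" else "lambda.1se"
pred2  <- predict(fit.net, x, type = "response", s = s.use)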
Thanks in advance for your thoughts.
Regards,
Lars.
No problems with this posting for my mail client, but you should learn
to use the facilities in gmail to send plain text. They are easy to find.
--
David Winsemius, MD
Alameda, CA, USA
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.