Re: [R] LASSO: glmpath and cv.glmpath

Steve Lianoglou Fri, 21 Aug 2009 08:17:51 -0700

Hi,

On Aug 21, 2009, at 9:47 AM, Peter Schüffler wrote:

Hi,
perhaps you can help me find out how to choose the best lambda in a LASSO model.
I have a feature selection problem with 150 proteins potentially predicting cancer or non-cancer. With a lasso model
fit.glm <- glmpath(x=as.matrix(X), y=target, family="binomial")
(target is 0/1 for cancer/non-cancer, X holds the proteins' numerical expression values), I get the following path (PICTURE 1). One of these models is the best according to its cross-validation (PICTURE 2); the red line marks the best cross-validation error. It is produced by
cv <- cv.glmpath(x=as.matrix(X), y=unclass(T)-1, family="binomial", type="response", plot.it=TRUE, se=TRUE)
abline(v=cv$fraction[max(which(cv$cv.error==min(cv$cv.error)))], col="red", lty=2, lwd=3)
Does anyone know how to get from the norm fraction in PICTURE 2 to the corresponding model in PICTURE 1? What is the best model? Which coefficients does it have? I can only see the best model's cross-validation error, but not the actual model. How can I see it?

None of your pictures came through, so I'm not sure exactly what you're trying to point out, but in general cross-validation will help you find the best value of lambda for the lasso. I think that's the value of lambda you'll want to use for your downstream analysis.
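For glmpath specifically, the "norm fraction" on the CV plot can be mapped back to a model on the path by asking predict() for coefficients at that fraction. A minimal sketch, assuming predict.glmpath accepts s together with mode = "norm.fraction" and type = "coefficients" (worth checking ?predict.glmpath for your version). The data are simulated: X, target and the effect sizes are hypothetical stand-ins for the proteins and the 0/1 outcome, and target is used as y in both calls (the question's cv.glmpath call used unclass(T)-1 instead).

```r
# Hypothetical sketch on simulated data; assumes the glmpath package is installed.
library(glmpath)

set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("protein", 1:p)))
# 0/1 outcome driven by the first two "proteins" only (made-up effects)
target <- rbinom(n, 1, plogis(2 * X[, 1] - 2 * X[, 2]))

# Full regularization path
fit.glm <- glmpath(x = X, y = target, family = binomial)

# CV along the path; with the default mode = "norm", cv$fraction is the norm fraction
cv <- cv.glmpath(x = X, y = target, family = binomial,
                 type = "response", plot.it = FALSE, se = TRUE)

# Largest norm fraction attaining the minimum CV error (same rule as the abline() call)
best.frac <- cv$fraction[max(which(cv$cv.error == min(cv$cv.error)))]

# Coefficients of the path at that norm fraction (one row, intercept first)
coefs <- predict(fit.glm, s = best.frac, mode = "norm.fraction",
                 type = "coefficients")
beta <- coefs[1, ]
print(beta[beta != 0])   # intercept plus the selected proteins
```

The non-zero entries of beta are the model the red line points at.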

I haven't used the glmpath package, but I have been using the glmnet package, which is also by Hastie, is newer, and I believe covers the same use cases as glmpath (though, to be honest, I'm not quite familiar w/ the Cox proportional hazards model). Perhaps you might want to look into it.

Anyway, speaking from my experience w/ the glmnet package, you might try this:

1. Determine the best value of lambda using CV. I guess you can use MSE or R^2 (or, for a binomial model like yours, deviance or misclassification error) as your yardstick of "best."

2. Train a model over all of your data and ask it for the coefficients at the value of lambda from step 1.


3. See which proteins have non-zero coefficients.

<tongue-in-cheek>

4. Divine a biological story that is explained by your statistical findings.


5. Publish.
</tongue-in-cheek>
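Steps 1-3 might look roughly like this with glmnet. This is a hypothetical sketch on simulated data (the protein names, sample size and effect sizes are made up), using binomial deviance as the CV yardstick since the outcome is 0/1.

```r
# Hypothetical illustration of steps 1-3 with glmnet on simulated data.
library(glmnet)

set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("protein", 1:p)))
# 0/1 outcome driven by the first two "proteins" only
target <- rbinom(n, 1, plogis(2 * X[, 1] - 2 * X[, 2]))

# 1. Pick lambda by 10-fold CV, scoring with binomial deviance
cvfit <- cv.glmnet(X, target, family = "binomial", type.measure = "deviance")
best.lambda <- cvfit$lambda.min

# 2. Fit on all of the data and ask for the coefficients at that lambda
fit <- glmnet(X, target, family = "binomial")
beta <- as.matrix(coef(fit, s = best.lambda))[, 1]

# 3. Proteins with non-zero coefficients (intercept dropped)
selected <- setdiff(names(beta)[beta != 0], "(Intercept)")
print(best.lambda)
print(selected)
```

cvfit$lambda.1se (the largest lambda whose CV error is within one standard error of the minimum) is a more conservative alternative to lambda.min and usually keeps fewer proteins.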

I guess there are many ways to do model selection, and I'm not sure it's clear how effective they are (which isn't to say that you shouldn't do them)[1] ... you might want to further divide your data into training/tuning/test sets (somewhere between steps 1 and 2) as another means of scoring models.
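One way to add a held-out test set is sketched below, again on hypothetical simulated data. Here CV on the training portion plays the role of the tuning step, and the roughly one-third test split and the 0.5 probability cutoff are arbitrary choices made for illustration.

```r
# Hypothetical sketch: hold out a test set, tune lambda by CV on the rest,
# then score the tuned model once on the untouched test set.
library(glmnet)

set.seed(2)
n <- 150; p <- 20
X <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("protein", 1:p)))
target <- rbinom(n, 1, plogis(2 * X[, 1] - 2 * X[, 2]))

# Set aside about one third of the samples as the test set
test.idx <- sample(n, size = n %/% 3)
x.train <- X[-test.idx, ]; y.train <- target[-test.idx]
x.test  <- X[test.idx, ];  y.test  <- target[test.idx]

# Tuning: CV over lambda on the training part only
cvfit <- cv.glmnet(x.train, y.train, family = "binomial",
                   type.measure = "deviance")

# Test: predicted probabilities at lambda.min, then misclassification rate
prob <- predict(cvfit, newx = x.test, s = "lambda.min", type = "response")
test.error <- mean((prob[, 1] > 0.5) != y.test)
print(test.error)
```

Because the test samples never influence the choice of lambda, test.error is a less optimistic estimate than the minimum CV error itself.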


HTH,
-steve

[1] http://hunch.net/?p=29

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
