The train() function in the caret package can help automate this process.
There are three package vignettes and a JSS paper with documentation. See
http://cran.r-project.org/web/packages/caret/index.html and
http://www.jstatsoft.org/v28/i05/

If I remember correctly, one of the earlier papers on the lasso by Efron
argued that cross-validation was not the best way of tuning these models
(the details escape me).

Max

2009/8/21 Steve Lianoglou <mailinglist.honey...@gmail.com>:
> Hi,
>
> On Aug 21, 2009, at 9:47 AM, Peter Schüffler wrote:
>
>> Hi,
>>
>> perhaps you can help me find out how to find the best lambda in a
>> lasso model.
>>
>> I have a feature-selection problem with 150 proteins potentially
>> predicting cancer or non-cancer. With a lasso model
>>
>> fit.glm <- glmpath(x = as.matrix(X), y = target, family = "binomial")
>>
>> (target is 0/1 for cancer/non-cancer, X the proteins as numerical
>> expression values), I get the following path (PICTURE 1).
>> One of these models is the best according to its cross-validation
>> (PICTURE 2); the red line corresponds to the best cross-validation
>> error. It is produced by
>>
>> cv <- cv.glmpath(x = as.matrix(X), y = unclass(T) - 1,
>>                  family = "binomial", type = "response",
>>                  plot.it = TRUE, se = TRUE)
>> abline(v = cv$fraction[max(which(cv$cv.error == min(cv$cv.error)))],
>>        col = "red", lty = 2, lwd = 3)
>>
>> Does anyone know how to get from the norm fraction in PICTURE 2 to
>> the corresponding model in PICTURE 1? What is the best model? Which
>> coefficients does it have? I can only see the best model's
>> cross-validation error, but not the actual model. How can I see it?
>
> None of your pictures came through, so I'm not sure exactly what
> you're trying to point out, but in general the cross-validation will
> help you find the best value of lambda for the lasso. That's the value
> of lambda you'll use for your downstream analysis.
>
> I haven't used the glmpath package, but I have been using the glmnet
> package, which is also by Hastie, is newer, and I believe covers the
> same use cases as the glmpath library (though, to be honest, I'm not
> quite familiar with the Cox proportional hazards model).
> Perhaps you might want to look into it.
>
> Anyway, speaking from my experience with the glmnet package, you might
> try this:
>
> 1. Determine the best value of lambda using CV. I guess you can use
> MSE or R^2 as you see fit as your yardstick of "best."
>
> 2. Train a model over all of your data and ask it for the coefficients
> at the value of lambda from step 1.
>
> 3. See which proteins have non-zero coefficients.
>
> <tongue-in-cheek>
> 4. Divine a biological story that is explained by your statistical
> findings.
>
> 5. Publish.
> </tongue-in-cheek>
>
> I guess there are many ways to do model selection, and I'm not sure
> it's clear how effective they are (which isn't to say that you
> shouldn't do them) [1] ... you might want to further divide your data
> into training/tuning/test sets (somewhere between steps 1 and 2) as
> another means of scoring models.
>
> HTH,
> -steve
>
> [1] http://hunch.net/?p=29
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Max
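[Editor's note: Steve's steps 1-3 can be sketched with glmnet as below. This is a minimal sketch, assuming the glmnet package is installed; `X`, `target`, and the `protein*` column names are simulated stand-ins for the poster's 150-protein matrix and 0/1 outcome, not the original data.]

```r
## A sketch of steps 1-3 using glmnet (assumes glmnet is installed).
## X, target, and the protein names are simulated stand-ins for the
## poster's data.
library(glmnet)

set.seed(1)
X <- matrix(rnorm(100 * 150), nrow = 100)       # 100 samples x 150 proteins
colnames(X) <- paste0("protein", seq_len(150))
target <- rbinom(100, 1, 0.5)                   # 0/1 outcome

## Step 1: choose lambda by cross-validation.
cv <- cv.glmnet(X, target, family = "binomial")
best.lambda <- cv$lambda.min

## Step 2: fit on all the data and extract coefficients at that lambda.
fit <- glmnet(X, target, family = "binomial")
beta <- as.matrix(coef(fit, s = best.lambda))

## Step 3: the proteins with non-zero coefficients.
selected <- setdiff(rownames(beta)[beta[, 1] != 0], "(Intercept)")
print(selected)
```

[For the original glmpath question, a similar idea should apply: if memory serves, `predict.glmpath` accepts `type = "coefficients"` together with `mode = "norm.fraction"`, so the norm fraction at the minimum of `cv$cv.error` can be passed as `s` to recover the coefficients of the best model; check `?predict.glmpath` to confirm the argument names.]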