Hi,

On Tue, May 24, 2011 at 12:50 PM, Vishal Thapar <vishaltha...@gmail.com> wrote:
> Hi,
>
> I am writing to seek some guidance regarding using Lasso regression with the
> R package LARS. I have introductory statistics background but I am trying to
> learn more. Right now I am trying to duplicate the results in a paper for
> shRNA prediction "An accurate and interpretable model for siRNA efficacy
> prediction, Jean-Philippe Vert et. al, Bioinformatics" for a Bioinformatics
> project that we are working on. I know that the authors of the paper are
> using Lasso regression and so far looking at their paper this is what I have
> gotten to.

I'm not going to comment on your code -- I'll just give you a bird's-eye view.

First off, use glmnet. You get the lasso by setting alpha to 1. You might get better results, though, if you use it as an "elastic net" and set alpha to something like 0.95 -- you'll have to play with it. This gives a mix of the L1 and L2 penalties. What you get is still a sparse model (from the L1 penalty), but it now also tends to give similar coefficients to correlated features instead of just dropping one of them and putting full weight on the other. That's a good thing.

Now that you are using the glmnet package, use `cv.glmnet`. Say you did:

R> cvg <- cv.glmnet(... whatever ..., alpha=0.95)

The idea is that you want to use the value of lambda that performs best under cross validation. Look at cvg$cvm, which holds the mean cross-validated error for each value of lambda that was tried.

Now -- people like "sparser models," which you get by cranking up the lambda value, but you don't want lambda to be so high that it makes your model perform worse under CV. cvg$lambda.min has the value of lambda that gives the minimum CV error. But maybe you want the model sparser than that ... maybe it's OK if its error is within 1 standard error of the best performance seen across your CV. That's why you have cvg$lambda.1se, the largest lambda that still meets that criterion.

So: cvg$glmnet.fit will have the model fit on all the data. You have to extract the coefficients from that model at the value of lambda you deem appropriate; choosing that value is what the cross validation is for. This might be the value of lambda in cvg$lambda.min or cvg$lambda.1se -- you make the call.

Make sense?
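
To make the steps above concrete, here is a rough sketch of the whole workflow. It runs on simulated data, since I don't have yours: the names x, y, beta, coef_min and coef_1se, the sizes n and p, and the seed are all made up for illustration. Swap in your own feature matrix and response.

```r
library(glmnet)

set.seed(1)                           # make the CV folds reproducible
n <- 100; p <- 20                     # hypothetical problem size
x <- matrix(rnorm(n * p), n, p)       # simulated features (stand-in for real data)
beta <- c(2, -1.5, 1, rep(0, p - 3))  # only the first three features matter
y <- as.numeric(x %*% beta + rnorm(n))

# Cross-validate over the lambda path. alpha = 1 is the lasso;
# alpha = 0.95 mixes in a little L2 penalty (elastic net).
cvg <- cv.glmnet(x, y, alpha = 0.95)

# Mean CV error for each lambda tried, and the two usual choices of lambda
head(cbind(lambda = cvg$lambda, cvm = cvg$cvm))
cvg$lambda.min
cvg$lambda.1se

# Coefficients of the full-data fit at each choice of lambda
coef_min <- coef(cvg, s = "lambda.min")
coef_1se <- coef(cvg, s = "lambda.1se")

# Number of nonzero coefficients, excluding the intercept in row 1
sum(as.matrix(coef_min)[-1, 1] != 0)
sum(as.matrix(coef_1se)[-1, 1] != 0)
```

Calling plot(cvg) draws the CV error curve with both lambda choices marked, which is usually the quickest way to see how much performance you give up for the sparser model.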

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.