Hi,

I am writing to seek some guidance on using Lasso regression with the R package LARS. I have an introductory statistics background, but I am trying to learn more. Right now I am trying to duplicate the results of a paper on shRNA prediction, "An accurate and interpretable model for siRNA efficacy prediction", Jean-Philippe Vert et al., Bioinformatics, for a bioinformatics project that we are working on. I know that the authors of the paper use Lasso regression, and this is what I have so far from reading their paper.

xtrain <- trainData
> dim(trainData)
[1] 18520    88
ytrain <- trainScore
> length(ytrain)
[1] 18520

nfolds <- 100
epsilon <- exp(-10)

# code from JP Vert
object1 <- cv.lars(xtrain, ytrain, K=nfolds, fraction = seq(from = 0, to = 1, length = 1000), type='lasso', eps=epsilon, plot.it=TRUE)
bestfraction <- object1$fraction[min(which(object1$cv <= min(object1$cv) + 0.01*(max(object1$cv)-min(object1$cv))))]
bestcoef <- coef(object1, s=bestfraction, mode='fraction')
# this gives bestcoef as NULL, I don't know why.
# End code from JP Vert

# Code by me: not so sure about the correctness of this
predictor <- lars(trainData, trainScore, type='lasso', eps=epsilon, trace=TRUE)
reslt <- predict.lars(predictor, trainData, s=bestfraction, mode="lambda", type="coefficients")

My aim is to extract the coefficients of the m variables from the regression for the best case, i.e. the best fraction. I am a bit confused by the last two lines of code above, which I worked out from the LARS manual. Here is where I have trouble following:

1) cv.lars() returns a list. How can I get from that to a lars object which I can then use in the predict.lars() function with the "bestfraction"?

2) In the above code, I have used the "s=bestfraction" option in the predict.lars() function, but when I called the lars() function, it only used 69 iterations, whereas in cv.lars() there are 1000 fractions. How do I tell lars() to do 1000 fractions like in cv.lars() and pick the best one?

Again, I could be completely wrong here because of my lack of understanding of LARS. Sorry :( I apologize in advance for my ignorance about the statistics in this. I am reading the paper on LARS, but it will take me some time to figure this out from it, so I am seeking your help.

Thank you so much in anticipation of your reply.
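
A minimal sketch of the intended workflow on simulated data, offered as an illustration rather than as the method used in the paper. It assumes a recent release of the lars package, where the grid argument of cv.lars() is named `index` (in the version used above it is `fraction`). The data, the variable names `grid`, `cvfit`, `fit` and `pred`, and the sizes `n` and `p` are all made up for the example. The idea is that cv.lars() is used only to choose the fraction, while a separate lars() fit on all the data supplies the coefficients.

```r
library(lars)

set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p)         # simulated stand-in for trainData
beta <- c(3, -2, 1.5, rep(0, p - 3))    # only the first three variables matter
y <- drop(x %*% beta + rnorm(n))        # simulated stand-in for trainScore

# Grid of fractions of the final L1 norm at which CV error is evaluated
grid <- seq(0, 1, length.out = 100)

# Cross-validation returns a plain list of CV errors, not a fitted model
# (assumption: recent lars, grid argument named `index`)
cvfit <- cv.lars(x, y, K = 10, index = grid, type = "lasso", plot.it = FALSE)

# Smallest fraction whose CV error is within 1% of the CV range of the minimum
tol <- 0.01 * (max(cvfit$cv) - min(cvfit$cv))
bestfraction <- grid[min(which(cvfit$cv <= min(cvfit$cv) + tol))]

# Fit the whole lasso path once on all the data
fit <- lars(x, y, type = "lasso")

# Read coefficients and fitted values off the path at the chosen fraction;
# the value was chosen on the fraction scale, so mode must be "fraction"
bestcoef <- coef(fit, s = bestfraction, mode = "fraction")
pred <- predict(fit, newx = x, s = bestfraction, mode = "fraction", type = "fit")$fit
```

The number of steps taken by lars() does not need to match the length of the grid: the lasso path is piecewise linear, so predict() and coef() interpolate between the breakpoints for any requested fraction. Passing a fraction with mode="lambda" would instead treat the same number as a penalty value, which is a different scale.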

Sincerely,
Vishal

### Summary of objects generated in R ###

> summary(reslt)
             Length Class  Mode
s             1     -none- numeric
fraction      1     -none- numeric
mode          1     -none- character
coefficients 88     -none- numeric

> summary(object1)
         Length Class  Mode
fraction 1000   -none- numeric
cv       1000   -none- numeric
cv.error 1000   -none- numeric

> bestfraction
[1] 0.7687688

> summary(predictor)
LARS/LASSO
Call: lars(x = trainData, y = trainScore, type = "lasso", trace = TRUE,
Call:     eps = epsilon)
   Df     Rss      Cp
0   1 2139108 849.989
1   2 2113344 618.713
2   3 2108060 572.873
3   4 2107447 569.322
4   5 2106816 565.603
5   6 2099451 500.922
6   7 2098693 496.061
7   8 2098506 496.366
8   9 2096066 476.274
9  10 2095918 476.929
10 11 2092055 443.957
11 12 2089956 426.950
12 13 2085501 388.615
13 14 2084952 385.642
14 15 2084392 382.573
15 16 2081925 362.239
16 17 2080698 353.128
17 18 2080401 352.438
18 19 2079909 349.988
19 20 2078895 342.801
20 21 2077178 329.261
21 22 2076617 326.181
22 23 2076388 326.108
23 24 2072939 296.872
24 25 2072099 291.270
25 26 2071479 287.659
26 27 2070436 280.211
27 28 2069626 274.876
28 29 2069571 276.384
29 30 2068567 269.293
30 31 2068424 269.994
31 32 2063186 224.574
32 33 2063144 226.192
33 34 2062767 224.774
34 35 2061369 214.117
35 36 2060113 204.742
36 37 2059941 205.190
37 38 2058845 197.266
38 39 2056762 180.409
39 40 2054715 163.869
40 39 2054413 159.141
41 40 2052346 142.426
42 41 2052231 143.384
43 42 2051759 141.107
44 43 2051520 140.945
45 44 2051438 142.197
46 45 2051072 140.890
47 46 2049811 131.467
48 47 2049294 128.787
49 48 2048110 120.069
50 49 2045617  99.494
51 50 2044817  94.255
52 51 2044323  91.780
53 52 2044285  93.434
54 53 2043975  92.625
55 54 2043726  92.373
56 55 2042600  84.175
57 56 2042513  85.395
58 57 2040895  72.745
59 58 2040614  72.196
60 59 2040234  70.755
61 60 2039718  68.081
62 61 2039042  63.966
63 62 2039037  65.919
64 63 2038989  67.481
65 64 2038891  68.594
66 65 2038460  66.692
67 66 2038198  66.325
68 67 2038052  67.000