Hi! When I apply the lars (least angle regression) method to my data (3655 features, only 355 data points; no, I did not mistype), I observe some strange behaviour:
1) The beta values tend to grow to very high values quite fast, up to a point where they overflow and turn negative. The overflow itself is not a problem, I don't need the last part of the analysis anyway, but why do they shoot up to such high values in the first place? Any explanation?

2) The Cp values start at about -360 and grow linearly with increasing steps. This is totally strange, since Cp ought to be an "overly optimistic estimation of the generalization error" according to Hastie's book.

3) Lastly, the curve of r^2 correlation values grows up to a plateau at 1 (until the betas overflow, at which point it turns negative, but forget about that). This is classic overfitting. The calculation itself is right, though: using the components and betas from one of those r^2 = 1 steps with nu-SVR also gives a correlation of about 0.96. The generalization is pretty bad, though.

The funny thing: I observe qualitatively the same behaviour when I start with only 359 of these features and run lars on them.

So, the questions I have:

* Regarding points 1 and 2, does anybody have an explanation for the described behaviour? I can't seem to find one myself.
* Has anybody tried lars on data with such a bad feature-to-data-point ratio before? What were the experiences?
* Why does it overfit so badly?

I have also tried cross-validated selection (cv.lars), but it does not give me the selected features or betas, just the r^2 and RSS values from its runs.

Thanks for any thoughts on this!

Ciao!
Wiebke
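P.S. In case a reproducible sketch helps: below is roughly what I am doing, on simulated data of the same shape (pure noise, so it will not reproduce my exact numbers). The last few lines are my untested guess at a workaround for getting the selected features and betas out of a cv.lars run, by refitting with lars() and taking the coefficients at the fraction that minimises the cross-validated error.

library(lars)

## simulated data with the same shape as mine (n = 355, p = 3655);
## pure noise, so the exact curves will differ from my real data
set.seed(1)
n <- 355
p <- 3655
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

## use.Gram = FALSE avoids building the p x p Gram matrix for large p
fit <- lars(x, y, type = "lar", use.Gram = FALSE)
plot(fit$Cp)   ## Cp per step (on my real data this starts near -360 and grows)
plot(fit$R2)   ## r^2 per step (on my real data this plateaus at 1)
## fit$beta holds the full coefficient path (one row per step)

## my guess at a workaround for cv.lars not returning betas:
cvfit <- cv.lars(x, y, K = 10, type = "lar", use.Gram = FALSE)
best  <- cvfit$index[which.min(cvfit$cv)]           ## fraction with lowest CV error
betas <- coef(fit, s = best, mode = "fraction")     ## betas at that fraction
selected <- which(betas != 0)                       ## indices of the selected features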