> If you actually want to find the best subsets, you can get a good
> approximation by using leaps on the weighted least squares fit that
> is the last iteration of the IWLS algorithm for fitting the glm.
>
> Running regsubsets with a reasonably large value of nbest and then
> refitting the top models as glms afterwards will fairly reliably
> give the best glms.
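In R terms, I read that suggestion as roughly the following (an untested
sketch: the data frame `dat`, the response `y`, the binomial family, and
`nbest = 10` are placeholders of mine, not from the original post):

    library(leaps)

    fit <- glm(y ~ ., data = dat, family = binomial)   # the full glm

    ## working response and working weights from the final IWLS iteration
    z <- fit$linear.predictors + residuals(fit, type = "working")
    w <- fit$weights
    X <- model.matrix(fit)[, -1]                       # drop intercept column

    ## best subsets of the weighted least squares fit, several models per size
    best <- regsubsets(x = X, y = z, weights = w,
                       nbest = 10, nvmax = ncol(X))

    ## refit, say, the first candidate subset as a glm and compare by AIC
    ## (assumes numeric predictors, so model.matrix column names match dat)
    keep  <- summary(best)$which[1, -1]
    refit <- glm(reformulate(colnames(X)[keep], response = "y"),
                 data = dat, family = binomial)
    AIC(refit)
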
Thanks, that sounds interesting. I am as yet clueless about the workings of
IWLS, so maybe this is nonsense: the result of running glm on the full model
(all variables) is a crass example of overfitting, i.e. zero residuals, all
R_i^2 close to 1, large coefficients. Would the weighted least squares fit
of the last IWLS iteration not then be pretty meaningless?

> Whether this is better than lasso depends on what you are trying to
> do - IMO the only point of all-subsets regression is to get many best
> models rather than a single one, and lasso doesn't do at all well at
> that.

Yes, I am trying to get a number of best models, since the final model
selection will be based on interpretability and expert knowledge. So far I
have bootstrapped the lasso (using glmpath) to generate such a set, but the
resulting models are very similar, and I suspect there is a larger variety
of "best models".

Harald
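P.S. For reference, my bootstrap looks roughly like this (a sketch only:
`x`, `y`, the 200 replicates, and picking the BIC-best point on each path
are my own choices, and the glmpath components `bic` and `b.corrector` are
used as I understand them from the package documentation):

    library(glmpath)

    set.seed(1)
    sets <- replicate(200, {
      i   <- sample(nrow(x), replace = TRUE)            # bootstrap resample
      fit <- glmpath(x[i, ], y[i], family = binomial)
      ## coefficients at the BIC-best corrector step (column 1 = intercept)
      b <- fit$b.corrector[which.min(fit$bic), -1]
      paste(colnames(x)[b != 0], collapse = "+")        # record selected set
    })
    sort(table(sets), decreasing = TRUE)   # how varied are the "best" models?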