Andrew Robinson wrote:
On Wed, May 28, 2008 at 03:47:49PM -0700, Xiaohui Chen wrote:
Frank E Harrell Jr wrote:
Xiaohui Chen wrote:
The step or stepAIC functions do the job. You can opt to use BIC by changing the multiplier of the penalty term (the k argument; k = log(n) gives BIC instead of the default k = 2 for AIC).
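To make the role of the penalty multiplier concrete, here is a minimal sketch in plain Python (not R). It assumes a Gaussian linear model, for which the criterion can be written as n*log(RSS/n) + k*p up to an additive constant. The function name, the two candidate fits, and their RSS values are hypothetical and chosen only for illustration.

```python
import math

def gaussian_ic(rss, n, n_params, k=2.0):
    """Penalized criterion n*log(RSS/n) + k*n_params for a Gaussian linear model.

    Additive constants shared by all models fitted to the same data are dropped.
    k = 2 corresponds to AIC, k = log(n) to BIC.
    """
    return n * math.log(rss / n) + k * n_params

n = 100  # hypothetical sample size

# Two hypothetical candidate fits: residual sum of squares and parameter count.
candidates = {
    "small": {"rss": 120.0, "n_params": 3},
    "large": {"rss": 112.0, "n_params": 5},
}

def preferred(k):
    """Name of the candidate with the lowest criterion for penalty multiplier k."""
    return min(
        candidates,
        key=lambda name: gaussian_ic(candidates[name]["rss"], n,
                                     candidates[name]["n_params"], k),
    )

print("AIC prefers:", preferred(2.0))
print("BIC prefers:", preferred(math.log(n)))
```

With these made-up numbers the heavier BIC penalty favours the smaller model while AIC favours the larger one, which is the only thing that changing k does.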

I think AIC and BIC are not limited to comparing two pre-defined models; they can also be used as model search criteria. You could enumerate the information criteria for all possible models if the full model is relatively small, but this does not scale to practical high-dimensional applications. Hence it is often only possible to find a 'best' model that is a local optimum, e.g. as measured by AIC/BIC.
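The enumeration described above can be sketched as follows. This is an illustrative plain-Python example, not the step/stepAIC algorithm: it fits every subset of p candidate predictors by least squares and scores each with the Gaussian criterion n*log(RSS/n) + k*(number of coefficients). The simulated data, the helper names, and the choice of p = 4 are assumptions made for the example. The number of models is 2^p, which is why the approach breaks down quickly as p grows.

```python
import itertools
import math
import random

def ols_rss(columns, y):
    """Residual sum of squares from least squares of y on an intercept plus columns."""
    n = len(y)
    cols = [[1.0] * n] + list(columns)
    p = len(cols)
    # Augmented normal equations [X'X | X'y].
    M = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols]
         + [sum(ci[t] * y[t] for t in range(n))] for ci in cols]
    # Gaussian elimination with partial pivoting.
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, p):
            f = M[r][i] / M[i][i]
            for j in range(i, p + 1):
                M[r][j] -= f * M[i][j]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return sum((y[t] - sum(b[j] * cols[j][t] for j in range(p))) ** 2
               for t in range(n))

# Simulated data (hypothetical): only predictors 0 and 2 have real effects.
random.seed(1)
n, p = 200, 4
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * X[0][t] - 1.5 * X[2][t] + random.gauss(0, 1) for t in range(n)]

def search(k):
    """Score all 2^p subsets with penalty multiplier k; return (scores, best subset)."""
    scores = {}
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            rss = ols_rss([X[j] for j in subset], y)
            scores[subset] = n * math.log(rss / n) + k * (len(subset) + 1)
    return scores, min(scores, key=scores.get)

scores, best_bic = search(math.log(n))
print(len(scores), "models scored; BIC picks predictors", best_bic)
```

With p = 4 there are only 16 models, but p = 30 already gives about 10^9, so stepwise or other local searches are what is feasible in practice.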
Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not designed to be used in this fashion originally. Optimizing AIC or BIC will not produce well-calibrated models as does penalizing a large model.

Sure, I agree with this point. AIC is used to correct the bias of the estimates that minimize the KL distance to the true model, provided the assumed model family contains the true model. BIC is designed to approximate the marginal likelihood of the model. Both are post-selection estimation methods. For simultaneous variable selection and estimation, there are better penalties, such as the L1 penalty, which is much better than AIC/BIC in terms of consistency.

Xiaohui,
Tibshirani (1996) suggests that the quality of the L1 penalty depends
on the structure of the dataset.  As I recall, subset selection was
preferred for finding a small number of large effects, lasso (L1) for
finding a small to moderate number of moderate-sized effects, and
ridge (L2) for many small effects.
I agree with you. Higher correlation between covariates makes it harder for the LASSO to choose the correct model asymptotically; see Zhao and Yu (2006). Subset selection based on prediction error tends to inflate the estimated variance of the coefficients in linear models. L2, as is well known, does not do variable selection. But a (convex) mixture of the L1 and L2 penalties gives the elastic net proposed by Zou and Hastie (2005), which encourages a grouping effect. More recently, many other priors/penalties have been proposed, as you will see if you go through the literature.

Zhao P. and Yu B. (2006) On Model Selection Consistency of Lasso. JMLR
Zou H. and Hastie T. (2005) Regularization and variable selection via the elastic net. JRSSB
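As a rough illustration of why L1 selects variables and L2 does not, here is a small plain-Python sketch of the three shrinkage rules in the special case of an orthonormal design (X'X = I). It assumes the objective (1/2)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2, under which each penalized coefficient is a simple function of the corresponding least-squares coefficient. The function names and the example coefficients are hypothetical, and the elastic net shown is the "naive" form without the rescaling that Zou and Hastie apply afterwards.

```python
def soft_threshold(b, lam):
    """Lasso (L1) rule: shrink toward zero by lam, and set to exactly zero if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    """Ridge (L2) rule: proportional shrinkage, never exactly zero for nonzero b."""
    return b / (1.0 + lam)

def naive_elastic_net(b, lam1, lam2):
    """Naive elastic net rule: soft-threshold (L1 part), then proportional shrinkage (L2 part)."""
    return soft_threshold(b, lam1) / (1.0 + lam2)

# Hypothetical least-squares coefficients: two large effects, two small ones.
ols = [3.0, 0.8, -0.3, -2.0]

lasso = [soft_threshold(b, 1.0) for b in ols]
ridge = [ridge_shrink(b, 1.0) for b in ols]
enet = [naive_elastic_net(b, 1.0, 1.0) for b in ols]

print("lasso:", lasso)   # small effects are set exactly to zero
print("ridge:", ridge)   # every coefficient shrinks, none is dropped
print("enet: ", enet)    # zeros from the L1 part, extra shrinkage from the L2 part
```

With correlated covariates these closed forms no longer hold and the solutions have to be computed iteratively, which is the setting where the Zhao and Yu result on selection consistency applies.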
Can you provide any references to more up-to-date simulations that you
would recommend?

Cheers,

Andrew


