Andrew Robinson wrote:
On Wed, May 28, 2008 at 03:47:49PM -0700, Xiaohui Chen wrote:
Frank E Harrell Jr wrote:
Xiaohui Chen wrote:
The step or stepAIC functions do the job. You can opt for BIC by
changing the multiplicative penalty.
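For example, base R's step() takes the penalty multiplier through its k argument (MASS::stepAIC accepts the same argument); a sketch on the built-in mtcars data:

```r
# Backward selection from the full linear model on mtcars.
# step() optimizes AIC by default (k = 2); setting k = log(n)
# charges log(n) per parameter, i.e. the BIC penalty.
full <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)
aic_fit <- step(full, trace = 0)               # AIC-based search
bic_fit <- step(full, k = log(n), trace = 0)   # BIC-based search
```

Because log(n) > 2 for n > 7, the BIC search can only keep the same number of terms or fewer.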
I think AIC and BIC are not limited to comparing two pre-defined
models; they can also serve as model-search criteria. You could
enumerate the information criterion for every possible model when the
full model is relatively small, but that does not scale to practical
high-dimensional applications. Hence it is often only possible to find
a 'best' model at a local optimum, e.g. as measured by AIC/BIC.
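For a small candidate set the exhaustive enumeration is easy to sketch in base R (the mtcars data and the four candidate predictors here are purely illustrative):

```r
# Score every subset of a small candidate set by AIC: 2^4 = 16 models.
vars <- c("wt", "hp", "qsec", "drat")
subsets <- c(list(character(0)),   # intercept-only model
             unlist(lapply(seq_along(vars),
                           function(k) combn(vars, k, simplify = FALSE)),
                    recursive = FALSE))
aics <- sapply(subsets, function(v) {
  f <- if (length(v)) reformulate(v, response = "mpg") else mpg ~ 1
  AIC(lm(f, data = mtcars))
})
best <- subsets[[which.min(aics)]]  # predictor set with the lowest AIC
```

With p candidate variables the loop visits 2^p models, which is exactly why this brute-force approach breaks down in high dimensions.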
Sure you can use them that way, and they may perform better than other
measures, but the resulting model will be highly biased (regression
coefficients biased away from zero). AIC and BIC were not originally
designed to be used in this fashion. Optimizing AIC or BIC will not
produce well-calibrated models the way penalizing a large model does.
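Penalizing the full model can be sketched directly: for a ridge (L2) penalty the coefficients have the closed form (X'X + lambda*I)^{-1} X'y. A minimal illustration on simulated data (in practice one would use something like MASS::lm.ridge or rms::pentrace and choose lambda by cross-validation):

```r
# Ridge regression via its closed form: keep the full model but shrink
# all coefficients toward zero instead of dropping variables outright.
set.seed(1)
n <- 50; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))       # standardized predictors
y <- drop(X[, 1] + 0.5 * X[, 2] + rnorm(n))  # roughly centered response
ridge <- function(X, y, lambda)
  solve(crossprod(X) + diag(lambda, ncol(X)), crossprod(X, y))
b_ols   <- ridge(X, y, 0)    # lambda = 0 recovers least squares
b_ridge <- ridge(X, y, 10)   # lambda > 0 shrinks every coefficient
```

The L2 norm of the ridge solution is strictly smaller than that of the least-squares fit, which is the shrinkage that improves calibration.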
Sure, I agree with this point. AIC corrects the bias of estimates that
minimize the KL divergence from the true model, provided the assumed
model family contains the true model; BIC is designed to approximate
the model's marginal likelihood. Both are post-selection estimation
methods. For simultaneous variable selection and estimation there are
better penalties, such as the L1 penalty, which has much better
consistency properties than AIC/BIC.
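A minimal coordinate-descent sketch of the L1 (lasso) fit, assuming standardized predictors and a centered response; in practice one would use a package such as glmnet or lars:

```r
# Soft-thresholding operator: shrink toward zero and clip at zero.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for (1/(2n))*||y - X b||^2 + lambda*||b||_1.
lasso_cd <- function(X, y, lambda, iters = 200) {
  beta <- numeric(ncol(X))
  n <- nrow(X)
  for (it in seq_len(iters)) {
    for (j in seq_len(ncol(X))) {
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]   # partial residual
      beta[j] <- soft(crossprod(X[, j], r) / n, lambda) /
                 (crossprod(X[, j]) / n)
    }
  }
  beta
}

# One strong signal among noise variables: the L1 penalty keeps the
# signal and shrinks (often exactly to zero) the noise coefficients.
set.seed(2)
X <- scale(matrix(rnorm(100 * 4), 100, 4))
y <- drop(3 * X[, 1] + rnorm(100)); y <- y - mean(y)
b_l1 <- lasso_cd(X, y, lambda = 0.5)
```

Unlike a post-selection refit, the selection and the shrinkage happen in the same optimization, which is the "simultaneous" property mentioned above.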
Xiaohui,
Tibshirani (1996) suggests that the quality of the L1 penalty depends
on the structure of the dataset. As I recall, subset selection was
preferred for finding a small number of large effects, lasso (L1) for
finding a small to moderate number of moderate-sized effects, and
ridge (L2) for many small effects.
I agree with you. Higher correlation between covariates makes it
harder for the LASSO to select the correct model asymptotically; see
Zhao and Yu (2006). Subset selection based on prediction error tends
to inflate the estimated variance of the coefficients in linear
models. L2 does not perform variable selection, as is well known. But
a (convex) mixture of the L1 and L2 penalties gives the elastic net
proposed by Zou and Hastie (2005), which encourages a grouping effect.
More recently, many other priors/penalties have been proposed; see the
literature.
Zhao, P. and Yu, B. (2006) On model selection consistency of Lasso. JMLR.
Zou, H. and Hastie, T. (2005) Regularization and variable selection via
the elastic net. JRSSB.
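The elastic-net coordinate update is a small change to the lasso one: the denominator gains the ridge term, and that is what produces the grouping effect on correlated predictors. A base-R sketch under the same standardized-predictor assumptions (in practice use the glmnet package):

```r
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for
#   (1/(2n))*||y - X b||^2 + lambda*(alpha*||b||_1 + (1-alpha)/2*||b||^2).
# alpha = 1 gives the lasso, alpha = 0 gives ridge.
enet_cd <- function(X, y, lambda, alpha = 0.5, iters = 500) {
  beta <- numeric(ncol(X))
  n <- nrow(X)
  for (it in seq_len(iters)) {
    for (j in seq_len(ncol(X))) {
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]
      beta[j] <- soft(crossprod(X[, j], r) / n, lambda * alpha) /
                 (crossprod(X[, j]) / n + lambda * (1 - alpha))
    }
  }
  beta
}

# Grouping effect: feed the model two near-identical predictors.
set.seed(3)
x <- rnorm(100)
X <- scale(cbind(x, x + rnorm(100, sd = 1e-3), rnorm(100)))
y <- drop(2 * x + rnorm(100)); y <- y - mean(y)
b <- enet_cd(X, y, lambda = 0.3, alpha = 0.5)
```

The ridge component splits the signal roughly equally between the two correlated copies, whereas a pure lasso would tend to keep one and drop the other.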
Can you provide any references to more up-to-date simulations that you
would recommend?
Cheers,
Andrew
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.