Andrew Robinson wrote:
On Wed, May 28, 2008 at 03:47:49PM -0700, Xiaohui Chen wrote:
Frank E Harrell Jr wrote:
Xiaohui Chen wrote:
The step or stepAIC functions do the job. You can opt for BIC by
changing the multiplicative penalty.
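For example, base R's step() takes the penalty multiplier through its k argument (MASS::stepAIC accepts the same argument); a sketch on the built-in mtcars data:

```r
# Backward selection from the full linear model on mtcars.
# step() optimizes AIC by default (k = 2); setting k = log(n)
# charges log(n) per parameter, i.e. the BIC penalty.
full <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)
aic_fit <- step(full, trace = 0)               # AIC-based search
bic_fit <- step(full, k = log(n), trace = 0)   # BIC-based search
```

Because log(n) > 2 for n > 7, the BIC search can only keep the same number of terms or fewer.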
I think AIC and BIC are not limited to comparing two pre-defined
models; they can also serve as model-search criteria. You could
enumerate the information criterion for every possible model when the
full model is relatively small, but that does not scale to practical
high-dimensional applications. Hence it is often only possible to find
a 'best' model at a local optimum, e.g. as measured by AIC/BIC.
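For a small candidate set the exhaustive enumeration is easy to sketch in base R (the mtcars data and the four candidate predictors here are purely illustrative):

```r
# Score every subset of a small candidate set by AIC: 2^4 = 16 models.
vars <- c("wt", "hp", "qsec", "drat")
subsets <- c(list(character(0)),   # intercept-only model
             unlist(lapply(seq_along(vars),
                           function(k) combn(vars, k, simplify = FALSE)),
                    recursive = FALSE))
aics <- sapply(subsets, function(v) {
  f <- if (length(v)) reformulate(v, response = "mpg") else mpg ~ 1
  AIC(lm(f, data = mtcars))
})
best <- subsets[[which.min(aics)]]  # predictor set with the lowest AIC
```

With p candidate variables the loop visits 2^p models, which is exactly why this brute-force approach breaks down in high dimensions.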
Sure you can use them that way, and they may perform better than other
measures, but the resulting model will be highly biased (regression
coefficients biased away from zero). AIC and BIC were not originally
designed to be used in this fashion. Optimizing AIC or BIC will not
produce well-calibrated models the way penalizing a large model does.
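Penalizing the full model can be sketched directly: for a ridge (L2) penalty the coefficients have the closed form (X'X + lambda*I)^{-1} X'y. A minimal illustration on simulated data (in practice one would use something like MASS::lm.ridge or rms::pentrace and choose lambda by cross-validation):

```r
# Ridge regression via its closed form: keep the full model but shrink
# all coefficients toward zero instead of dropping variables outright.
set.seed(1)
n <- 50; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))       # standardized predictors
y <- drop(X[, 1] + 0.5 * X[, 2] + rnorm(n))  # roughly centered response
ridge <- function(X, y, lambda)
  solve(crossprod(X) + diag(lambda, ncol(X)), crossprod(X, y))
b_ols   <- ridge(X, y, 0)    # lambda = 0 recovers least squares
b_ridge <- ridge(X, y, 10)   # lambda > 0 shrinks every coefficient
```

The L2 norm of the ridge solution is strictly smaller than that of the least-squares fit, which is the shrinkage that improves calibration.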
Sure, I agree with this point. AIC corrects the bias of estimates that
minimize the KL divergence from the true model, provided the assumed
model family contains the true model; BIC is designed to approximate
the model's marginal likelihood. Both are post-selection estimation
methods. For simultaneous variable selection and estimation there are
better penalties, such as the L1 penalty, which has much better
consistency properties than AIC/BIC.
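A minimal coordinate-descent sketch of the L1 (lasso) fit, assuming standardized predictors and a centered response; in practice one would use a package such as glmnet or lars:

```r
# Soft-thresholding operator: shrink toward zero and clip at zero.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for (1/(2n))*||y - X b||^2 + lambda*||b||_1.
lasso_cd <- function(X, y, lambda, iters = 200) {
  beta <- numeric(ncol(X))
  n <- nrow(X)
  for (it in seq_len(iters)) {
    for (j in seq_len(ncol(X))) {
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]   # partial residual
      beta[j] <- soft(crossprod(X[, j], r) / n, lambda) /
                 (crossprod(X[, j]) / n)
    }
  }
  beta
}

# One strong signal among noise variables: the L1 penalty keeps the
# signal and shrinks (often exactly to zero) the noise coefficients.
set.seed(2)
X <- scale(matrix(rnorm(100 * 4), 100, 4))
y <- drop(3 * X[, 1] + rnorm(100)); y <- y - mean(y)
b_l1 <- lasso_cd(X, y, lambda = 0.5)
```

Unlike a post-selection refit, the selection and the shrinkage happen in the same optimization, which is the "simultaneous" property mentioned above.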
Xiaohui,
Tibshirani (1996) suggests that the quality of the L1 penalty depends
on the structure of the dataset. As I recall, subset selection was
preferred for finding a small number of large effects, lasso (L1) for
finding a small to moderate number of moderate-sized effects, and
ridge (L2) for many small effects.
I agree with you. Higher correlation between covariates makes it
harder for the LASSO to select the correct model asymptotically; see
Zhao and Yu (2006). Subset selection based on prediction error tends
to inflate the estimated variance of the coefficients in linear
models. L2 does not perform variable selection, as is well known. But
a (convex) mixture of the L1 and L2 penalties gives the elastic net
proposed by Zou and Hastie (2005), which encourages a grouping effect.
More recently, many other priors/penalties have been proposed; see the
literature.
Zhao, P. and Yu, B. (2006) On model selection consistency of Lasso. JMLR.
Zou, H. and Hastie, T. (2005) Regularization and variable selection via
the elastic net. JRSSB.
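The elastic-net coordinate update is a small change to the lasso one: the denominator gains the ridge term, and that is what produces the grouping effect on correlated predictors. A base-R sketch under the same standardized-predictor assumptions (in practice use the glmnet package):

```r
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for
#   (1/(2n))*||y - X b||^2 + lambda*(alpha*||b||_1 + (1-alpha)/2*||b||^2).
# alpha = 1 gives the lasso, alpha = 0 gives ridge.
enet_cd <- function(X, y, lambda, alpha = 0.5, iters = 500) {
  beta <- numeric(ncol(X))
  n <- nrow(X)
  for (it in seq_len(iters)) {
    for (j in seq_len(ncol(X))) {
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]
      beta[j] <- soft(crossprod(X[, j], r) / n, lambda * alpha) /
                 (crossprod(X[, j]) / n + lambda * (1 - alpha))
    }
  }
  beta
}

# Grouping effect: feed the model two near-identical predictors.
set.seed(3)
x <- rnorm(100)
X <- scale(cbind(x, x + rnorm(100, sd = 1e-3), rnorm(100)))
y <- drop(2 * x + rnorm(100)); y <- y - mean(y)
b <- enet_cd(X, y, lambda = 0.3, alpha = 0.5)
```

The ridge component splits the signal roughly equally between the two correlated copies, whereas a pure lasso would tend to keep one and drop the other.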
Can you provide any references to more up-to-date simulations that you
would recommend?
Cheers,
Andrew
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.