Smita Pakhale wrote:
I think using any 'significance level' is the main problem with the
stepwise variable selection method. Even in 'normal' circumstances the
interpretation of a p-value is topsy-turvy. You can only imagine what
happens to that interpretation during variable selection: you no longer
know what the significance level means, if it means anything at all.
smita
True, and AIC/BIC are just translations of P-values.
Frank
--- Frank E Harrell Jr <[EMAIL PROTECTED]>
wrote:
Xiaohui Chen wrote:
The step or stepAIC functions do the job. You can opt to use BIC by
changing the penalty multiplier.
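A minimal sketch of what this looks like for the original question (forward
stepwise logistic regression), assuming simulated data; the variable names
x1..x4 and the effect sizes are made up for illustration. In step(), the
penalty multiplier is the k argument: k = 2 (the default) gives AIC and
k = log(n) gives BIC.

```r
# Simulated data; variable names and effect sizes are illustrative only.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * d$x1 + 0.8 * d$x2))

null_fit <- glm(y ~ 1, family = binomial, data = d)  # starting model
scope <- list(lower = ~ 1, upper = ~ x1 + x2 + x3 + x4)

# Forward selection by AIC (k = 2 is the default penalty per parameter).
fwd_aic <- step(null_fit, scope = scope, direction = "forward", trace = 0)

# Same search with the BIC penalty, k = log(n).
fwd_bic <- step(null_fit, scope = scope, direction = "forward",
                k = log(n), trace = 0)

formula(fwd_aic)
formula(fwd_bic)
```

Because every candidate term here costs one parameter, the BIC search adds
terms in the same order as the AIC search and simply stops no later than it
does. This shows the mechanics only; it says nothing about whether the
selected model is trustworthy.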
I think AIC and BIC are not limited to comparing two pre-defined
models; they can also be used as model search criteria. You could
enumerate the information criteria for all possible models if the
full model is relatively small. But this does not generally scale to
practical high-dimensional applications. Hence, it is often only
possible to find a 'best' model at a local optimum, e.g. as measured
by AIC/BIC.
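For a small full model, the enumeration described here can be sketched in
base R as follows. The data are simulated and the names (preds, score, tab)
are hypothetical helpers, not part of any package. With p candidate
predictors there are 2^p models, which is why this stops being feasible
quickly.

```r
set.seed(2)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.9 * d$x1 - 0.7 * d$x2))

preds <- c("x1", "x2", "x3")

# Every subset of the predictors, including the empty one: 2^3 = 8 models.
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(preds),
                           function(m) combn(preds, m, simplify = FALSE)),
                    recursive = FALSE))

# Fit one logistic model per subset and record both criteria.
score <- function(vars) {
  rhs <- if (length(vars)) paste(vars, collapse = " + ") else "1"
  fit <- glm(as.formula(paste("y ~", rhs)), family = binomial, data = d)
  data.frame(model = rhs, AIC = AIC(fit), BIC = BIC(fit))
}
tab <- do.call(rbind, lapply(subsets, score))

tab[order(tab$AIC), ]  # models ranked by AIC
```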
Sure you can use them that way, and they may perform
better than other
measures, but the resulting model will be highly
biased (regression
coefficients biased away from zero). AIC and BIC
were not designed to
be used in this fashion originally. Optimizing AIC
or BIC will not
produce well-calibrated models as does penalizing a
large model.
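The bias away from zero can be seen in a toy simulation. This is only a
sketch under assumed settings (one predictor, a linear model, true slope
0.2, n = 50), not a reproduction of anyone's published results: the slope
is kept only when the model containing it has the lower AIC, and the kept
estimates are then averaged.

```r
set.seed(3)
beta <- 0.2    # true slope (assumed for illustration)
n    <- 50
reps <- 1000

est  <- numeric(reps)
kept <- logical(reps)
for (i in seq_len(reps)) {
  x <- rnorm(n)
  y <- beta * x + rnorm(n)
  fit1 <- lm(y ~ x)
  fit0 <- lm(y ~ 1)
  est[i]  <- coef(fit1)[["x"]]
  kept[i] <- AIC(fit1) < AIC(fit0)  # "select" x only if AIC improves
}

mean(est)        # all estimates: close to the true slope
mean(est[kept])  # estimates that survived selection: larger than the truth
mean(kept)       # fraction of data sets in which x was selected
```

The estimate is roughly unbiased when averaged over all data sets; it is
the conditioning on having been selected that inflates it.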
Conversely, I would not say that BIC over-penalizes. Instead, I think
AIC usually under-penalizes larger models, in the sense that it has a
positive probability of incorporating irrelevant variables in linear
models.
If you put some constraints on the process (e.g., if using AIC to find
the optimum penalty in penalized maximum likelihood estimation), AIC
works very well and BIC results in far too much shrinkage
(underfitting). If using a dangerous process such as stepwise variable
selection, the more conservative BIC may be better in some sense, worse
in others. The main problem with stepwise variable selection is the use
of significance levels for entry below 1.0 and especially below 0.1.
Frank
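A rough base-R sketch of using AIC to choose the penalty in penalized
maximum likelihood, under stated assumptions: a ridge (quadratic) penalty
on all slopes, simulated data, and an "effective AIC" defined as
-2 log-likelihood + 2 * effective d.f., with effective d.f. taken as the
trace of (X'WX + lambda P)^-1 X'WX. That is one common definition and may
differ in detail from what any particular package computes. The function
ridge_logit is a hypothetical helper written for this example. The rms
package provides pentrace for this kind of search; the code below does not
use it.

```r
set.seed(5)
n <- 100
p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(drop(X %*% rep(0.3, p))))

# Ridge-penalized logistic regression fitted by Newton-Raphson.
# Maximizes loglik(b) - (lambda / 2) * sum(b[-1]^2); intercept unpenalized.
ridge_logit <- function(X, y, lambda, iter = 50) {
  Xd <- cbind(1, X)
  P  <- diag(c(0, rep(1, ncol(X))))
  b  <- rep(0, ncol(Xd))
  for (i in seq_len(iter)) {
    pr    <- plogis(drop(Xd %*% b))
    W     <- pr * (1 - pr)
    H     <- crossprod(Xd, W * Xd)                    # X'WX
    g     <- crossprod(Xd, y - pr) - lambda * P %*% b # penalized score
    delta <- drop(solve(H + lambda * P, g))
    b     <- b + delta
    if (max(abs(delta)) < 1e-8) break
  }
  pr     <- plogis(drop(Xd %*% b))
  W      <- pr * (1 - pr)
  H      <- crossprod(Xd, W * Xd)
  edf    <- sum(diag(solve(H + lambda * P, H)))       # effective d.f.
  loglik <- sum(y * log(pr) + (1 - y) * log(1 - pr))
  list(coef = b, edf = edf, aic = -2 * loglik + 2 * edf)
}

lambdas <- c(0, 0.5, 1, 2, 5, 10, 20, 50)
fits <- lapply(lambdas, function(l) ridge_logit(X, y, l))
res  <- data.frame(lambda = lambdas,
                   edf    = sapply(fits, function(f) f$edf),
                   aic    = sapply(fits, function(f) f$aic))
res
best <- res$lambda[which.min(res$aic)]  # penalty with the lowest effective AIC
```

All predictors stay in the model; only the amount of shrinkage is tuned,
which is the constrained use of AIC described above.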
X
Frank E Harrell Jr wrote:
Smita Pakhale wrote:
Hi Maria,
But why do you want to use forward or backward methods? These are all
'backward' methods of modeling.
Try using AIC or BIC. BIC is much better than AIC.
And you do not have to believe me or anyone else on this.
How does that help? BIC gives too much
penalization in certain
contexts; both AIC and BIC were designed to
compare two pre-specified
models. They were not designed to fix problems of
stepwise variable
selection.
Frank
Just make a small data set with a few variables with known
relationships among them. With this simulated data set, use all your
modeling methods (backward, forward, AIC, BIC, etc.) and then see which
one gives you an answer closest to the truth. The beauty of using a
simulated data set is that you 'know' the truth, as you are the
'creator' of it!
smita
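One way the suggested experiment could be set up in R, as a sketch only:
the data-generating model (six candidate predictors, of which only x1 and
x2 matter) and the helper selected() are assumptions made for this
illustration. A single simulated data set like this shows the mechanics;
judging the methods fairly would need many replications and a look at the
coefficient estimates, not just at which variables were picked.

```r
set.seed(4)
n <- 300
preds <- paste0("x", 1:6)
d <- as.data.frame(matrix(rnorm(n * 6), n, 6))
names(d) <- preds
truth <- c("x1", "x2")  # the only variables that really affect y
d$y <- rbinom(n, 1, plogis(1.0 * d$x1 - 1.0 * d$x2))

null_fit <- glm(y ~ 1, family = binomial, data = d)
full_fit <- glm(reformulate(preds, response = "y"),
                family = binomial, data = d)
scope <- list(lower = ~ 1, upper = reformulate(preds))

# Names of the predictors retained in a fitted model.
selected <- function(fit) setdiff(names(coef(fit)), "(Intercept)")

runs <- list(
  forward_AIC  = step(null_fit, scope = scope, direction = "forward",
                      trace = 0),
  backward_AIC = step(full_fit, direction = "backward", trace = 0),
  forward_BIC  = step(null_fit, scope = scope, direction = "forward",
                      k = log(n), trace = 0),
  backward_BIC = step(full_fit, direction = "backward",
                      k = log(n), trace = 0)
)

lapply(runs, selected)  # compare each selection with `truth`
```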
--- Charilaos Skiadas <[EMAIL PROTECTED]>
wrote:
A google search for "logistic regression with
stepwise forward in r" returns the following
post:
https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html
Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
On May 28, 2008, at 7:01 AM, Maria wrote:
Hello,
I am just about to install R and was wondering
about a few things.
I have only worked in Matlab because I wanted to do a logistic
regression. However, Matlab does not do logistic regression with a
stepwise forward method. Therefore I thought about testing R. So my
question is: can I do logistic regression with stepwise forward in R?
Thanks /M
--
Frank E Harrell Jr Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.