On 12/02/2009, at 2:02 PM, Lars Bishop wrote:

Hi,
I was wondering if I can have some advice on the following problem.

Let's say that I have a problem in which I want to predict a binary outcome and I use logistic regression for that purpose. In addition, suppose that my model includes predictors that will not be used in scoring new observations but must be used during model training to absorb certain effects that could
bias the parameter estimates of the other variables.

Because one needs to have the same predictors in model development and
scoring, how is it usually done in practice to overcome this problem? I could exclude the variables that will not be available during scoring, but
that will bias the estimates for the other variables.

Surely if you only have x_1, x_2, and x_3 available for prediction,
then you should ``train'' using only x_1, x_2, and x_3.

If you also have x_4 and x_5 available for training, then not using them
will ``bias'' the coefficients of the other three predictors, but will
give the best (in some sense) values of these coefficients to use when
x_4 and x_5 are not available.
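
For concreteness, a minimal sketch of what I mean (the names y, x1, ..., x5
and the data frame 'train' are made up for illustration):

## Binary response y; only x1, x2, x3 will be available at scoring time,
## so fit the scoring model with just those predictors.
fit.reduced <- glm(y ~ x1 + x2 + x3, family = binomial, data = train)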

Note that not using x_4 and x_5 is equivalent to setting them equal to 0,
so if you *insist* on fitting the model with x_1, ..., x_5 and then
predicting with x_1, ..., x_3, you can accomplish this by setting x_4 and
x_5 equal to 0 in your ``newdata'' data frame.
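
In code that would look something like this (same made-up names as above,
with 'newdat' a hypothetical data frame holding the new observations to be
scored):

## Fit with all five predictors, then force x4 and x5 to 0 at scoring time
## so that their terms contribute nothing to the linear predictor.
fit.full <- glm(y ~ x1 + x2 + x3 + x4 + x5, family = binomial, data = train)
newdat$x4 <- 0
newdat$x5 <- 0
phat <- predict(fit.full, newdata = newdat, type = "response")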

This seems to me to be highly inadvisable, however.

        cheers,

                Rolf Turner

