On 12/02/2009, at 2:02 PM, Lars Bishop wrote:

Hi,
I was wondering if I can have some advice on the following problem.

Let's say that I have a problem in which I want to predict a binary outcome and I use logistic regression for that purpose. In addition, suppose that my model includes predictors that will not be used in scoring new observations but must be used during model training to absorb certain effects that could
bias the parameter estimates of the other variables.

Because one needs to have the same predictors in model development and
scoring, how is it usually done in practice to overcome this problem? I could exclude the variables that will not be available during scoring, but
that will bias the estimates for the other variables.

Surely if you only have x_1, x_2, and x_3 available for prediction,
then you should ``train'' using only x_1, x_2, and x_3.

If you also have x_4 and x_5 available for training, then not using them
will ``bias'' the coefficients of the other three predictors, but will
give the best (in some sense) values of these coefficients to use when
x_4 and x_5 are not available.
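
For concreteness, a minimal sketch of what I mean (the names y, x1, ..., x5
and the data frame 'train' are made up for illustration):

## Binary response y; only x1, x2, x3 will be available at scoring time,
## so fit the scoring model with just those predictors.
fit.reduced <- glm(y ~ x1 + x2 + x3, family = binomial, data = train)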

Note that not using x_4 and x_5 is equivalent to setting them equal to 0,
so if you *insist* on fitting the model with x_1, ..., x_5 and then
predicting with x_1, ..., x_3, you can accomplish this by setting x_4 and
x_5 equal to 0 in your ``newdata'' data frame.
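
In code that would look something like this (same made-up names as above,
with 'newdat' a hypothetical data frame holding the new observations to be
scored):

## Fit with all five predictors, then force x4 and x5 to 0 at scoring time
## so that their terms contribute nothing to the linear predictor.
fit.full <- glm(y ~ x1 + x2 + x3 + x4 + x5, family = binomial, data = train)
newdat$x4 <- 0
newdat$x5 <- 0
phat <- predict(fit.full, newdata = newdat, type = "response")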

This seems to me to be highly inadvisable, however.

        cheers,

                Rolf Turner

