On Jun 30, 2009, at 4:54 AM, Renzi Fabrizio wrote:

I would like to know how R computes the probability of an event in a
logistic model (P(y=1)) from the score s, linear combination of x and
beta.

I noticed that there are differences (small, less than e-16) between the fitting values automatically computed in the glm procedure by R, and the
values "manually" computed by me applying the reverse formula
p=e^s/(1+e^s); moreover I noticed that the minimum value of the fitting values in my estimation is 2.220446e-16, and there are many observation with this probability (instead the minimum value obtained by "manually"
estimation is 2.872636e-152).



It would be helpful to see at least a subset of the output from your model and your manual computations so that we can at least visually compare the results to see where the differences may be.

The model object returned from using glm() will contain both the linear predictors on the link scale (model$linear.predictors) and the fitted values (model$fitted.values). The latter can be accessed using the fitted() extractor function.

To use an example, let's create a simple LR model using the infert data set as referenced in ?glm.


model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

> model1

Call: glm(formula = case ~ spontaneous + induced, family = binomial(), data = infert)

Coefficients:
(Intercept)  spontaneous      induced
    -1.7079       1.1972       0.4181

Degrees of Freedom: 247 Total (i.e. Null);  245 Residual
Null Deviance:      316.2
Residual Deviance: 279.6        AIC: 285.6




# Get the coefficients
> coef(model1)
(Intercept) spontaneous     induced
 -1.7078601   1.1972050   0.4181294



# get the linear predictor values
# log odds scale for binomial glm
> head(model1$linear.predictors)
          1           2           3           4           5           6
 1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564  0.32560375



You can also get the above by using the coefficients and the model matrix for comparison:

# the full set of 248
# coef(model1) %*% t(model.matrix(model1))

> head(as.vector(coef(model1) %*% t(model.matrix(model1))))
[1] 1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564 0.32560375



# get fitted response values (predicted probs)
> head(fitted(model1))
        1         2         3         4         5         6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893



We can also get the fitted values from the linear predictor values by using:

LP <- model1$linear.predictors

> head(exp(LP) / (1 + exp(LP)))
        1         2         3         4         5         6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893



You can also get the above by using the predict.glm() function with type == "response". The default type of "link" will get you the linear predictor values as above. predict.glm() would typically be used to generate predictions using the model on new data.

> head(predict(model1, type = "response"))
        1         2         3         4         5         6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893



In glm.fit(), which is the workhorse function in glm(), the fitted values returned in the model object are actually computed by using the inverse link function for the family passed to glm():

> binomial()$linkinv
function (eta)
.Call("logit_linkinv", eta, PACKAGE = "stats")
<environment: 0x171c8b10>


Thus:

> head(binomial()$linkinv(model1$linear.predictors))
        1         2         3         4         5         6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893


So those are the same values as we saw above using the other methods. So, all is consistent across the various methods.

Perhaps the above provides some insights for you into how R does some of these things and also to point out, as is frequently the case with R, there is more than one way to get the same result.

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to