On 30-Jun-09 14:52:20, Marc Schwartz wrote:
> On Jun 30, 2009, at 4:54 AM, Renzi Fabrizio wrote:
>
>> I would like to know how R computes the probability of an event
>> in a logistic model (P(y=1)) from the score s, a linear combination
>> of x and beta.
>>
>> I noticed that there are differences (small, less than e-16) between
>> the fitted values automatically computed in the glm procedure by R
>> and the values computed "manually" by applying the inverse formula
>> p = e^s/(1+e^s); moreover, I noticed that the minimum of the fitted
>> values in my estimation is 2.220446e-16, and there are many
>> observations with this probability (whereas the minimum value
>> obtained by "manual" estimation is 2.872636e-152).
>
> It would be helpful to see at least a subset of the output from your
> model and your manual computations, so that we can at least visually
> compare the results to see where the differences may be.
>
> The model object returned by glm() will contain both the linear
> predictors on the link scale (model$linear.predictors) and the
> fitted values (model$fitted.values). The latter can be accessed
> using the fitted() extractor function.
>
> As an example, let's create a simple LR model using the infert
> data set as referenced in ?glm.
>
> model1 <- glm(case ~ spontaneous + induced, data = infert,
>               family = binomial())
>
> > model1
>
> Call: glm(formula = case ~ spontaneous + induced, family = binomial(), data = infert)
>
> Coefficients:
> (Intercept)  spontaneous      induced
>     -1.7079       1.1972       0.4181
>
> Degrees of Freedom: 247 Total (i.e. Null);  245 Residual
> Null Deviance:     316.2
> Residual Deviance: 279.6        AIC: 285.6
>
> # Get the coefficients
> > coef(model1)
> (Intercept) spontaneous     induced
>  -1.7078601   1.1972050   0.4181294
>
> # Get the linear predictor values
> # (log odds scale for a binomial glm)
> > head(model1$linear.predictors)
>           1           2           3           4           5           6
>  1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564  0.32560375
>
> You can also get the above by using the coefficients and the model
> matrix for comparison:
>
> # the full set of 248:
> # coef(model1) %*% t(model.matrix(model1))
> > head(as.vector(coef(model1) %*% t(model.matrix(model1))))
> [1]  1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564  0.32560375
>
> # Get the fitted response values (the predicted probabilities)
> > head(fitted(model1))
>         1         2         3         4         5         6
> 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893
>
> We can also get the fitted values from the linear predictor values
> by using:
>
> > LP <- model1$linear.predictors
> > head(exp(LP) / (1 + exp(LP)))
>         1         2         3         4         5         6
> 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893
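[A side note, not from Marc's post: base R also provides plogis(), the
logistic distribution function, which is the same inverse-logit map
computed at C level, so it can stand in for the hand-rolled formula.]

## plogis(q) evaluates 1/(1 + exp(-q)), the logistic CDF, so this
## should match the fitted values shown above for Marc's model1:
head(plogis(model1$linear.predictors))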
> You can also get the above by using the predict.glm() function with
> type = "response". The default type, "link", will get you the linear
> predictor values as above. predict.glm() would typically be used to
> generate predictions from the model on new data.
>
> > head(predict(model1, type = "response"))
>         1         2         3         4         5         6
> 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893
>
> In glm.fit(), which is the workhorse function behind glm(), the
> fitted values returned in the model object are actually computed by
> using the inverse link function for the family passed to glm():
>
> > binomial()$linkinv
> function (eta)
> .Call("logit_linkinv", eta, PACKAGE = "stats")
> <environment: 0x171c8b10>
>
> Thus:
>
> > head(binomial()$linkinv(model1$linear.predictors))
>         1         2         3         4         5         6
> 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893
>
> Those are the same values as we saw above using the other methods,
> so all is consistent across the various approaches.
>
> Perhaps the above provides some insight into how R does some of
> these things, and also points out, as is frequently the case with R,
> that there is more than one way to get the same result.
>
> HTH,
>
> Marc Schwartz
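A short aside on that last point: the C-level "logit_linkinv" is not
quite the bare formula exp(eta)/(1+exp(eta)). On my reading of the
stats C sources (an assumption worth checking against your own R
version), it clamps the result once the linear predictor goes beyond
roughly +/- 30, which would explain the floor of 2.220446e-16
(i.e. .Machine$double.eps) that Fabrizio reports as his minimum
fitted value. A minimal check:

## Compare the naive inverse logit with the linkinv that glm.fit()
## uses. (Assumption: clamping thresholds near +/- 30, per my reading
## of src/library/stats/src/family.c; verify on your installation.)
eta    <- c(-50, -40, -20, 0, 20, 40, 50)
naive  <- exp(eta) / (1 + exp(eta))   # keeps shrinking toward 0 or 1
stable <- binomial()$linkinv(eta)     # what glm.fit() actually calls
cbind(eta, naive, stable)
## For eta below about -30, 'naive' continues toward 0 while 'stable'
## bottoms out near .Machine$double.eps, i.e. 2.220446e-16.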
An interesting and informative reply, Marc; but I think it misses the
point of Fabrizio's query. I think Fabrizio's point is the following:

set.seed(54321)
X  <- sort(rnorm(100))
a0 <- 1.0 ; b0 <- 0.5
Y  <- 1*(runif(100) < a0 + b0*X)
GLM <- glm(Y ~ X, family = binomial)
C <- GLM$coef
C
# (Intercept)           X
#    2.726966    2.798429

a1 <- C[1] ; b1 <- C[2]
Fit0 <- GLM$fitted.values
S    <- a1 + b1*X
Fit1 <- exp(S)/(1 + exp(S))
max(abs(Fit1 - Fit0))
# [1] 1.110223e-16

This discrepancy is, of course, a typical rounding error in
magnitude. But the fact that it occurred at all indicates that when
glm() computed the fitted values, it did not do so by taking the
fitted coefficients GLM$coef, forming the fitted score (linear
predictor) S as above, and then applying the "inverse link"
exp(S)/(1+exp(S)), since doing that would replicate the above
calculation and should yield identical results.

In fact, a bit more probing of GLM shows that there is already a
discrepancy at the "score" level:

S0 <- GLM$linear.predictors
max(abs(S0 - S))
# [1] 8.881784e-16

so S0 has not been calculated by applying GLM$coef to X. Also, if we
apply the inverse link to S0, then:

max(abs(Fit0 - exp(S0)/(1 + exp(S0))))
# [1] 1.110223e-16

which is the same discrepancy as between Fit1 and Fit0. And:

max(abs(Fit1 - exp(S0)/(1 + exp(S0))))
# [1] 1.110223e-16

the same again! Hence, if I have understood him aright, Fabrizio's
question.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jun-09   Time: 16:44:56
------------------------------ XFMail ------------------------------
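A plausible resolution, sketched under two assumptions about
glm.fit()'s internals (that the final linear predictor is formed as
the matrix product x %*% beta at the last IRLS step, and that the
fitted values then come from the family's C-level linkinv rather than
a hand-rolled exp(S)/(1+exp(S))). If that reading is right, the
following should reproduce glm()'s stored values exactly, whereas
Ted's a1 + b1*X evaluates the same quantity in a different
floating-point order and so picks up an eps-sized residue:

## Continuing from Ted's example above.
## Assumption: glm.fit() ends each IRLS step with
##   eta <- drop(x %*% start); mu <- linkinv(eta)
## so we redo exactly that with the converged coefficients.
Xmat <- model.matrix(GLM)             # the design matrix glm() used
S2   <- drop(Xmat %*% coef(GLM))      # BLAS product, not a1 + b1*X
Fit2 <- binomial()$linkinv(S2)        # C-level inverse link
max(abs(S2 - GLM$linear.predictors))  # expected (not guaranteed): 0
max(abs(Fit2 - GLM$fitted.values))    # expected (not guaranteed): 0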