[R] Appropriate tests for logistic regression with a continuous predictor variable and Bernoulli response variable

Kiyoshi Sasaki Fri, 09 Jul 2010 00:33:47 -0700

I have data with a binary response variable, repcnd (pregnant or not), and one 
continuous predictor variable, svl (body size), as shown below. I ran a 
Hosmer-Lemeshow test as a goodness-of-fit check (as suggested previously by a 
kind "R-helper"). To test whether the predictor (svl, or body size) has a 
significant effect on predicting whether or not a female snake is pregnant, 
I used the difference between the null deviance and the residual deviance, 
with the following code:


1 - pchisq(mod.fit$null.deviance - mod.fit$deviance,
           mod.fit$df.null - mod.fit$df.residual)

Could anyone tell me whether I did the test properly? I did this test because 
I thought the Wald test (z score) listed in the output from "summary(mod.fit)" 
is not appropriate for the kind of data I have. Does R have an automated 
function to run the appropriate tests? I have pasted my R output below.

Thank you in advance for your time and help.

Kiyoshi

       repcnd   svl
1           1  51.5
2           1  52.5
<edited>
294         0  59.8
298         1  60.0
300         1  51.7
301         1  57.4
302         1  60.9
303         0  56.8
304         0  50.0
-------------------
> mod.fit <- glm(formula = gb.no.M$repcnd ~ gb.no.M$svl,
+     family = binomial(link = logit), data = gb.no.M, na.action = na.exclude,
+     control = list(epsilon = 0.0001, maxit = 50, trace = F))
> summary(mod.fit)

Call:
glm(formula = gb.no.M$repcnd ~ gb.no.M$svl, family = binomial(link = logit), 
    data = gb.no.M, na.action = na.exclude, control = list(epsilon = 1e-04, 
        maxit = 50, trace = F))

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.757  -1.109   0.734   1.113   1.632  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -7.08565    1.84106  -3.849 0.000119 ***
gb.no.M$svl  0.13529    0.03474   3.894 9.85e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 301.92  on 217  degrees of freedom
Residual deviance: 285.04  on 216  degrees of freedom
  (8 observations deleted due to missingness)
AIC: 289.04

Number of Fisher Scoring iterations: 3
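
As a cross-check on the coefficient table above, the Wald z value, its two-sided p-value and the odds ratio for svl can be recomputed by hand. This is only a sketch: the names est, se and odds_ratio are introduced here for illustration, and the inputs are the rounded figures printed by summary(), so the results agree with the table only up to rounding.

```r
# Rounded values copied from the summary(mod.fit) output above
est <- 0.13529   # slope estimate for svl (change in log-odds per unit of svl)
se  <- 0.03474   # standard error of the slope

z          <- est / se             # Wald z statistic
p          <- 2 * pnorm(-abs(z))   # two-sided p-value from the standard normal
odds_ratio <- exp(est)             # multiplicative change in odds per unit of svl

print(c(z = z, p = p, odds.ratio = odds_ratio))
```

The z value and p-value should reproduce the "z value" and "Pr(>|z|)" columns for the svl row; the odds ratio is simply the exponentiated slope.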
-------------------------------------------------------------------------------
Hosmer-Lemeshow test

> hosmerlem <- function (y, yhat, g = 10) 
+ {
+ cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), 
+     include.lowest = T)
+ obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
+ expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
+ chisq <- sum((obs - expect)^2/expect)
+ P <- 1 - pchisq(chisq, g - 2)
+ c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)
+ }

> mod.fit <- glm(formula = no.NA$repcnd ~ no.NA$svl,
+     family = binomial(link = logit), data = no.NA, na.action = na.exclude,
+     control = list(epsilon = 0.0001, maxit = 50, trace = F))

> hosmerlem(no.NA$repcnd, fitted(mod.fit))
      X^2        Df   P(>Chi) 
6.8742531 8.0000000 0.5502587
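
For readers who want to run the Hosmer-Lemeshow calculation without the original data, here is a self-contained sketch on simulated data. Everything about the data is an assumption made for illustration (sample size, svl range, and the "true" coefficients); it is not the snake data set. The statistic is built with tapply() rather than xtabs(), but sums the same Pearson-type terms over the non-pregnant and pregnant columns within each of the g groups of fitted probabilities.

```r
set.seed(1)                                   # simulated data, NOT the snake data
n   <- 300
svl <- runif(n, 45, 65)                       # hypothetical body sizes
y   <- rbinom(n, 1, plogis(-7 + 0.13 * svl))  # hypothetical true model

fit  <- glm(y ~ svl, family = binomial)
yhat <- fitted(fit)

# Split observations into g groups by quantiles of the fitted probabilities
g   <- 10
grp <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1 / g)),
           include.lowest = TRUE)

n_g  <- tapply(y, grp, length)   # group sizes
obs1 <- tapply(y, grp, sum)      # observed number of 1s per group
exp1 <- tapply(yhat, grp, sum)   # expected number of 1s per group

# Sum (observed - expected)^2 / expected over both outcomes in every group
chisq <- sum((obs1 - exp1)^2 / exp1 +
             ((n_g - obs1) - (n_g - exp1))^2 / (n_g - exp1))
p     <- pchisq(chisq, df = g - 2, lower.tail = FALSE)

print(c("X^2" = chisq, Df = g - 2, "P(>Chi)" = p))
```

Note that cut() will stop with an error if the quantile breaks are not unique, which can happen when many observations share the same fitted probability.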
---------------------------------------------------------------------------------------------------
> list(p.value = round(1 - pchisq(mod.fit$null.deviance - mod.fit$deviance,
+     mod.fit$df.null - mod.fit$df.residual), 6), 
+ df = mod.fit$df.null - mod.fit$df.residual,
+ change = mod.fit$null.deviance - mod.fit$deviance)

$p.value
[1] 4e-05

$df
[1] 1

$change
[1] 16.87895
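
The deviance-difference calculation above is a likelihood-ratio test, and as far as I know anova() with test = "Chisq" reports the same test for a fitted glm. The sketch below has two parts: part (1) recomputes the p-value from the rounded deviances printed by summary(), using lower.tail = FALSE instead of 1 - pchisq(); part (2) checks on simulated data that the manual calculation and anova() agree. The simulated data frame d and its coefficients are assumptions for illustration only, not the snake data. The formula in part (2) refers to column names and relies on data = rather than repeating the data frame name with $.

```r
# (1) Recompute from the rounded deviances printed by summary(mod.fit)
null_dev  <- 301.92; null_df  <- 217
resid_dev <- 285.04; resid_df <- 216

lr_stat <- null_dev - resid_dev     # drop in deviance due to svl
lr_df   <- null_df - resid_df       # one parameter added
lr_p    <- pchisq(lr_stat, lr_df, lower.tail = FALSE)
print(c(change = lr_stat, df = lr_df, p.value = lr_p))

# (2) Compare the manual test with anova() on simulated (hypothetical) data
set.seed(42)
d <- data.frame(svl = runif(200, 45, 65))
d$repcnd <- rbinom(200, 1, plogis(-7 + 0.13 * d$svl))

fit <- glm(repcnd ~ svl, family = binomial, data = d)

manual_p <- pchisq(fit$null.deviance - fit$deviance,
                   fit$df.null - fit$df.residual, lower.tail = FALSE)

tab     <- anova(fit, test = "Chisq")   # sequential analysis of deviance
anova_p <- tab[["Pr(>Chi)"]][2]         # row 2 is the svl term

print(tab)
print(all.equal(manual_p, anova_p))
```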


      

