You could also try the following code:

    form$finished <- factor(form$finished)
    glmFit <- glm(finished ~ ., family = binomial, data = form[1:150000, ])
    preds <- predict(glmFit, newdata = form[150001:200000, ], type = "response")

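For a self-contained illustration of the same pattern, here is a small sketch on simulated data. The data frame `d`, its columns `x1`, `x2`, `y`, and the 150/50 split are made up for the example and are not taken from the original data; it only aims to show how the `data' and `newdata' arguments behave.

```r
# Simulated data; `d`, `x1`, `x2` and `y` are made-up names for illustration.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(rbinom(n, 1, plogis(-0.5 + d$x1)))

train <- d[1:150, ]
test  <- d[151:200, ]

# Plain column names are looked up in `data`, so only the 150 training
# rows are used in the fit.
fit <- glm(y ~ ., family = binomial, data = train)

# `data` is not an argument of predict.glm(); it is ignored and the
# fitted values for the training rows are returned.
p_wrong <- predict(fit, data = test, type = "response")
length(p_wrong)  # 150

# `newdata` gives one predicted probability per row of `test`.
p_right <- predict(fit, newdata = test, type = "response")
length(p_right)  # 50

# Writing d$... in the formula bypasses `data`: all 200 rows of d are used.
fit_all <- glm(d$y ~ d$x1 + d$x2, family = binomial, data = train)
nobs(fit_all)  # 200

# Mixing the two fails: the response has 200 values, the predictors
# expanded from `.` have 150.
res <- try(glm(d$y ~ ., family = binomial, data = train), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

With predictions of the right length, something like table(p_right > 0.5, test$y) can then be used to cross-tabulate predicted against observed outcomes.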
Note also the following:

* Since you supply the `data' argument of glm(), you do not need to write
  the `formula' argument as "data$y ~ data$x"; just use "y ~ x", etc.
  Variables written as form$x are taken from the full data frame rather
  than from the subset passed as `data', which is why the variable
  lengths differ in your first call and why your second model was fitted
  to all 215214 rows.

* For predict.glm() the argument is `newdata', not `data'. An argument
  named `data' is ignored, so you get back the fitted values for the rows
  used in the fit. Also, `type = "response"' gives you the predicted
  probabilities; see ?predict.glm for more info.

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 02, 2007 5:34 AM
Subject: [R] problems with glm

> I am having a couple of problems someone may be able to cast some
> light on.
>
> Question 1:
>
> I am making a logistic model, but when I do this:
>
> glm.model = glm(as.factor(form$finished) ~ ., family=binomial,
>                 data=form[1:150000,])
>
> I get this:
>
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames, :
>   variable lengths differ (found for 'barrier')
>
> which is very strange, because when I name the predictive factors
> like this:
>
> glm.model = glm(as.factor(form$finished) ~ form$first + form$second +
>                 form$third + form$barrier, family=binomial,
>                 data=form[1:150000,])
>
> it produces a model:
>
> Call:
> glm(formula = as.factor(form$finished) ~ form$first + form$second +
>     form$third + form$barrier, family = binomial,
>     data = form[1:150000, ])
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -3.0884  -0.4932  -0.3951  -0.3006   2.7135
>
> Coefficients:
>               Estimate Std. Error  z value Pr(>|z|)
> (Intercept)  -2.957831   0.021446 -137.920  < 2e-16 ***
> form$first    0.624463   0.078036    8.002 1.22e-15 ***
> form$second   0.754057   0.080787    9.334  < 2e-16 ***
> form$third    7.718261   0.078532   98.281  < 2e-16 ***
> form$barrier -0.058185   0.002175  -26.751  < 2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 144850  on 215213  degrees of freedom
> Residual deviance: 133292  on 215209  degrees of freedom
> AIC: 133302
>
> Number of Fisher Scoring iterations: 5
>
> Any idea why the first glm call doesn't work?
>
> Question 2:
>
> Now I want to predict, so I do this:
>
> pred <- predict(glm.model,data=form[150001:200000,],type="response")
>
> but when I try to use it I get this:
>
>> pred <- predict(glm.model,data=form[150001:200000,],type="response")
>> t = table(pred,form$finished[150001:200000])
> Error in table(pred, form$finished[150001:2e+05]) :
>   all arguments must have the same length
>
> and when I do this it confirms my pred is not 50000 long!
>
>> length(pred)
> [1] 215214
>
> It doesn't look as though my selection of rows has worked at all.
> Anyone know why?
>
> Stephen
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.