Hi Anina, predict.glm returns predicted probabilities, when used with type="response", so you have to either scale the probs to the number of trials for any x or you plot probs from start:
par(mfcol=c(1,2)) plot(x, successes) lines(x, (successes+failures)*predict(glm1, type= "response"), lwd=2) plot(x, successes/(successes+failures)) lines(x, predict(glm1, type= "response"), lwd=2) If you use predict with new data, please name the rows accordingly, when you actually want to use them: newdf <- data.frame(seq(min(x), max(x), length=20)) #you would expect 20 predictions, but *surprise* you get length(predict(glm2, type="response", newdata= newdf)) #52! newdf <- data.frame(x=seq(min(x), max(x), length=20)) length(predict(glm2, type="response", newdata= newdf)) #ok And you should use the appropriate x in "lines" as well: plot(x,successes/(successes+failures)) lines(newdf$x, predict(glm2, type="response", newdata= newdf),col="red") Cheers Am 21.09.2011 09:55, schrieb Heystek, A, Me <15418...@sun.ac.za>: > Problems with predict and lines in plotting binomial glm > Dear R-helpers > > I have found quite a lot of tips on how to work with glm through this mailing > list, but still have a problem that I can't solve. > I have got a data set of which the x-variable is count data and the > y-variable is proportional data, and I want to know what the relationship > between the variables are. > The data was overdispersed (the residual deviance is much larger than the > residual degrees of freedom) therefore I am using the quasibinomial family, > thus the y-variable is a matrix of successes and failures (20 trials for > every sample, thus each y-variable row counts up to 20). > > x <- c(1200, 1200, 1200, 1200, 1200, 1200, 1200, 1200, 1800, 1800, 1800, > 1800, 1800, 1800, 1800, 1800, 1800, 2400, 2400, 2400, 2400, 2400, 2400, 2400, > 3000, 3000, 3600, 3600, 3600, 3600, 4200, 4200, 4800, 4800, 5400, 6600, 6600, > 7200, 7800, 7800, 8400, 8400, 8400, 9000, 9600, 10200, 13200, 18000, 20400, > 24000, 25200, 36600) > successes <- c(6, 16, 11, 14, 11, 16, 13, 13, 14, 16, 12, 12, 11, 15, 12, 9, > 7, 7, 17, 15, 13, 9, 9, 12, 14, 8, 9, 16, 7, 9, 14, 11, 8, 8, 13, 6, 16, 11, > 9, 7, 9, 8, 4, 14, 7, 3, 3, 9, 12, 8, 4, 6) > failures <- c(14, 4, 9, 6, 9, 4, 7, 7, 6, 4, 8, 8, 9, 5, 8, 11, 13, 13, 3, 5, > 7, 11, 11, 8, 6, 12, 11, 4, 13, 11, 6, 9, 12, 12, 7, 14, 4, 9, 11, 13, 11, > 12, 16, 6, 13, 17, 17, 11, 8, 12, 16, 14) > y <- cbind(successes, failures) > data <- data.frame(y, x) > > glm1 <- glm(y ~ x, family= quasibinomial, data= data) > glm2 <- glm(y ~ log(x), family=quasibinomial, data= data) # residual > deviance is lower with log transformed x-value > plot(x, successes) > lines(x, predict(glm1, type= "response"), lwd=2) > > > > > Firstly, because of the skewed distribution of the x variable I am not sure > whether it should be log transformed or not. When I do log transform it, the > residual deviance and the p-value for the slope is lower. > Either way, the lines command does not plot any line and neither does it give > any error messages. On some of my other data it plots a line way below all > the data points. From what I can gather, the predict function as it is now > uses the fitted values because no newdata argument is specified. I want the > line to be predicted from the same x-values. I tried two ways of adding the > newdata argument: > > ## a data.frame using the original x-values > lines(x, predict(glm2, type= "response", newdata= as.data.frame(x))) > ## or a data.frame with values (the same length as y) from the range of x > values > newdf <- data.frame(seq(min(x), max(x), length=52)) > lines(x, predict(glm2, type="response", newdata= newdf)) > > Only the second option plotted a line once, but then I could never get it to > do the same again on a new plot even though I used the same variables and > same code. > > > Thank you very much for your time and patient > Anina Heystek > > BSc Honours student > Department of Botany and Zoology > University of Stellenbosch, Stellenbosch, South Africa > 15418...@sun.ac.za > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Department of Medical Biometry and Epidemiology University Medical Center Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 -- Pflichtangaben gemäß Gesetz über elektronische Handelsregister und Genossenschaftsregister sowie das Unternehmensregister (EHUG): Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; Gerichtsstand: Hamburg Vorstandsmitglieder: Prof. Dr. Jörg F. Debatin (Vorsitzender), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.