Re: [R] 'mgcv' package, problem with predicting binomial (logit) data

David Winsemius Mon, 30 Aug 2010 12:46:14 -0700


On Aug 30, 2010, at 11:44 AM, Jef Vlegels wrote:

There was a problem with the data in attachment, here is a betterversion.
Use something like:
data <- read.table("pas_r.txt",header=TRUE,sep=";")
to read it.

I did, but I hate dealing with attached datasets and also amuncomfortable dealing with datasets named data, so I messed you codeup a bit to avoid attach()-ment. I also find it cleaner to use subsetsof dataframes rather htna coding new variables. Eventually it camedown to removing the NA for the participant column. I didn't see theneed for the order() operation and thought it might create problems soI just commented it out. (You also had of misspellings of your modelnames.)

test <- gam(participant ~ s(age,fx=FALSE,bs='cr'),family=binomial(logit), data=data.age)

summary(test)
plot(test, shade=TRUE)
gam.check(test)
test.pred <- predict(test,newdata=data.age,se.fit=TRUE,type='response',
na.action=na.omit)
I1<-order(data.age$age)
with(data.age, plot(age[I1], test.pred$fit[I1],lty=1, type="l"))
with(data.age, lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2))
with(data.age, lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2))
with(data.age, plot(age[I1], test.pred$fit[I1] , type="l"))

#In a second step, I want to calculate a similar model, but only for

#respondents with a certain characteristic. For example, in this case,only

#for male respondents.
#I use a code that looks like this:

#participant_male <- participant[gender=="male"]
# age_male <- age[gender=="male"]

test.males<-gam(participant ~ s(age, fx=FALSE, bs="cr"),data=subset(data.age, gender=="male"&!is.na(participant)),

family=binomial(logit), na.action=na.omit)
summary(test.males)
plot(test.males, shade=TRUE)

#I get a nice smoother function in this plot, like I expected.

#Then, when I want to plot the predicted values, I use a code thatlooks like

#this:

Test2.pred <- predict(test.males,newdata=subset(data.age,gender=="male"&!is.na(participant)), se.fit=TRUE, type="response")

#I1<-with(subset(data.age, gender=="male"), order(age))

with(subset(data.age, gender=="male"&!is.na(participant)), plot(age,Test2.pred$fit, lty=1) )

I have not figured out why the NA's caused so much trouble but waseventually driven to exclude them in the subset step by mismatchedlenght errors from plot(),


Thanks,
Jef

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: maandag 30 augustus 2010 17:22
To: Jef Vlegels
Subject: FYI offlist Re: [R] 'mgcv' package, problem with predicting
binomial (logit) data


On Aug 30, 2010, at 11:17 AM, Jef Vlegels wrote:

Dear R-help list,

I'm using the mgcv package to plot predictions based on the gam
function.

I predict the chance of being a (frequent) participant at theater
plays vs.
not being a participant by age.
Because my outcome variable is dichotomous, I use the binomial
family with
logit link function.

Dataset in attachment, code to read it in R:


Nope. .sav files are not accepted by the mail-server. See the Posting
Guide. (.txt and .pdf are the safest formats)

data <- read.spss("pas_r.sav")
attach(data)

In a first step I use 'gam' to model my data and 'predict' to
calculate and
plot the predicted values, this all works fine.
My code looks like this:

test <- gam(participant ~ s(age,fx=FALSE,bs='cr'),
family=binomial(logit))
summary(test)
plot(test, shade=TRUE)
gam.check(test)
test.pred <- predict(test,newdata=data,se.fit=TRUE,type='response',
na.action=na.omit)
I1<-order(age)
plot(age[I1], test.pred$fit[I1],lty=1, type="l")
lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2)
lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2)
plot(age[I1], test.pred$fit[I1] , type="l")

I a second step, I want to calculate a similar model, but only for
respondents with a certain characteristic. For example, in this
case, only
for male respondents.
I use a code that looks like this:

participant_male <- participant[gender=="male"]
age_male <- age[gender=="male"]

test2<-gam(participant_male ~ s(age_male, fx=FALSE, bs="cr"),
family=binomial(logit), na.action=na.omit)
summary(test2)
plot(test2, shade=TRUE)

I get a nice smoother function in this plot, like I expected.
Then, when I want to plot the predicted values, I use a code that
looks like
this:

Test2.pred <- predict(test5,se.fit=TRUE, type="response")
I1<-order(age_male)
plot(age_male[I1], test2.pred$fit[I1],lty=1)

This last plot, of the predictions, is not what I expect. It's just
a random
scatterplot, not what I would expect from the smoother plot. Does
anybody
know what I did wrong?

Thanks in advance,
Jef Vlegels

Jef Vlegels
Ghent University - Department of Sociology
Korte Meer 3, B-9000 Gent, Belgium
Tel:  09 264 8343
www.psw.UGent.be/sociologie


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

<pas_r.txt>______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] 'mgcv' package, problem with predicting binomial (logit) data

Reply via email to