On Aug 30, 2010, at 11:44 AM, Jef Vlegels wrote:
There was a problem with the data in attachment, here is a better
version.
Use something like:
data <- read.table("pas_r.txt",header=TRUE,sep=";")
to read it.
I did, but I hate dealing with attached datasets and also am
uncomfortable dealing with datasets named data, so I messed you code
up a bit to avoid attach()-ment. I also find it cleaner to use subsets
of dataframes rather htna coding new variables. Eventually it came
down to removing the NA for the participant column. I didn't see the
need for the order() operation and thought it might create problems so
I just commented it out. (You also had of misspellings of your model
names.)
test <- gam(participant ~ s(age,fx=FALSE,bs='cr'),
family=binomial(logit), data=data.age)
summary(test)
plot(test, shade=TRUE)
gam.check(test)
test.pred <- predict(test,newdata=data.age,se.fit=TRUE,type='response',
na.action=na.omit)
I1<-order(data.age$age)
with(data.age, plot(age[I1], test.pred$fit[I1],lty=1, type="l"))
with(data.age, lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2))
with(data.age, lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2))
with(data.age, plot(age[I1], test.pred$fit[I1] , type="l"))
#In a second step, I want to calculate a similar model, but only for
#respondents with a certain characteristic. For example, in this case,
only
#for male respondents.
#I use a code that looks like this:
#participant_male <- participant[gender=="male"]
# age_male <- age[gender=="male"]
test.males<-gam(participant ~ s(age, fx=FALSE, bs="cr"),
data=subset(data.age, gender=="male"&!is.na(participant)),
family=binomial(logit), na.action=na.omit)
summary(test.males)
plot(test.males, shade=TRUE)
#I get a nice smoother function in this plot, like I expected.
#Then, when I want to plot the predicted values, I use a code that
looks like
#this:
Test2.pred <- predict(test.males,newdata=subset(data.age,
gender=="male"&!is.na(participant)), se.fit=TRUE, type="response")
#I1<-with(subset(data.age, gender=="male"), order(age))
with(subset(data.age, gender=="male"&!is.na(participant)), plot(age,
Test2.pred$fit, lty=1) )
I have not figured out why the NA's caused so much trouble but was
eventually driven to exclude them in the subset step by mismatched
lenght errors from plot(),
Thanks,
Jef
-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: maandag 30 augustus 2010 17:22
To: Jef Vlegels
Subject: FYI offlist Re: [R] 'mgcv' package, problem with predicting
binomial (logit) data
On Aug 30, 2010, at 11:17 AM, Jef Vlegels wrote:
Dear R-help list,
I'm using the mgcv package to plot predictions based on the gam
function.
I predict the chance of being a (frequent) participant at theater
plays vs.
not being a participant by age.
Because my outcome variable is dichotomous, I use the binomial
family with
logit link function.
Dataset in attachment, code to read it in R:
Nope. .sav files are not accepted by the mail-server. See the Posting
Guide. (.txt and .pdf are the safest formats)
data <- read.spss("pas_r.sav")
attach(data)
In a first step I use 'gam' to model my data and 'predict' to
calculate and
plot the predicted values, this all works fine.
My code looks like this:
test <- gam(participant ~ s(age,fx=FALSE,bs='cr'),
family=binomial(logit))
summary(test)
plot(test, shade=TRUE)
gam.check(test)
test.pred <- predict(test,newdata=data,se.fit=TRUE,type='response',
na.action=na.omit)
I1<-order(age)
plot(age[I1], test.pred$fit[I1],lty=1, type="l")
lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2)
lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2)
plot(age[I1], test.pred$fit[I1] , type="l")
I a second step, I want to calculate a similar model, but only for
respondents with a certain characteristic. For example, in this
case, only
for male respondents.
I use a code that looks like this:
participant_male <- participant[gender=="male"]
age_male <- age[gender=="male"]
test2<-gam(participant_male ~ s(age_male, fx=FALSE, bs="cr"),
family=binomial(logit), na.action=na.omit)
summary(test2)
plot(test2, shade=TRUE)
I get a nice smoother function in this plot, like I expected.
Then, when I want to plot the predicted values, I use a code that
looks like
this:
Test2.pred <- predict(test5,se.fit=TRUE, type="response")
I1<-order(age_male)
plot(age_male[I1], test2.pred$fit[I1],lty=1)
This last plot, of the predictions, is not what I expect. It's just
a random
scatterplot, not what I would expect from the smoother plot. Does
anybody
know what I did wrong?
Thanks in advance,
Jef Vlegels
Jef Vlegels
Ghent University - Department of Sociology
Korte Meer 3, B-9000 Gent, Belgium
Tel: 09 264 8343
www.psw.UGent.be/sociologie
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
<pas_r.txt>______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.