Re: [R] confidence interval for the logit - predict.glm

Christine Sonvilla Wed, 28 May 2008 18:35:14 -0700

Dear Brian and list members,

Thanks very much for your response Brian!
I applied the adjusted calculation that you advised me to use 
[1/(1+exp(-upperlogit))] and as a result I don't get any more NA values in my 
upper confidence interval values.


Yet, some outcomes are very akward, since for very small values such as 
1.98E-10, the lower values very reasonably turns out to be 0, yet the upper 
value is simply set to 1, which kind of makes the confidence interval redundant 
and doesn't give any proper idea how good of a predicted value this 1.98E-10 
value and other similar ones are.

In the following I try to retrace what excatly I am doing, yet I suppose the 
principle of "reproducible code" will be difficult to accomplish, since I can't 
add my input files here, I try my best and hope it kind of fits the "criteria":

My input txt.files are:
testfishes.txt  (column names are the fish species, row names are the planning 
units, where fish surveys where conducted, so the matrix is binary, 
presence/absence data, 0s and 1s)

predy.txt (column names are the predictor variables, numerical, one is 
categorical, row names are again the planning units where fish surveys were 
conducted)

predallx.txt (column names are the predictor variables, row names now are ALL 
the planning units, also those where no fish survey was conducted)

# I do a glm analysis for my fish models, loading the files first:

predallx<-read.table("predallx.txt",header=TRUE,row.names=1,sep="\t")
predallx$exposure<-factor(predallx$exposure)
predallx$exposure<-ordered(predallx$exposure,levels=c("Sheltered","Medium","Medexposed","Exposed"))
attach(predallx)

predy<-read.table("predy.txt",header=TRUE, row.names=1,sep="\t")
predy$exposure<-factor(predy$exposure)
attach(predy)

testfishes<-read.table("testfishes.txt",header=TRUE,row.names=1,sep="\t")
attach(testfishes)

# Creating a table for my glm output, models are compared on the basis of AIC 
values:

cat('fish',' ','model',' ','AIC',' ','df.residual',' ','deviance',"\n",
    file="AICfish.txt",append=FALSE)


# Creating a matrix for the predicted values (fishoccur) and a matrix for the 
upper and lower ci:

fishoccur<-matrix(0,nrow(predallx),ncol(testfishes))
colnames(fishoccur)<-colnames(testfishes)
rownames(fishoccur)<-rownames(predallx)

upperresponse<-matrix(0,nrow(predallx),ncol(testfishes))
colnames(upperresponse)<-colnames(testfishes)
rownames(upperresponse)<-rownames(predallx)

lowerresponse<-matrix(0,nrow(predallx),ncol(testfishes))
colnames(lowerresponse)<-colnames(testfishes)
rownames(lowerresponse)<-rownames(predallx)

# Now I run the anlysis in a loop + stepwise approach and enter results in the 
tables created above, "i" stands for every single fish that in turn should run 
through the loop:

for (i in 1:5)
{
  predy$eachfish <- testfishes[,i]
modelfish <- glm(eachfish~depth500m + exposure + nearest.est + island + 
mangroves + seagrass,
                 family=binomial, data=predy)
stepmodfi <- step(modelfish, trace= 0)

 cat(i, make.names(c(stepmodfi$call),unique=TRUE), AIC(stepmodfi),
    df.residual(stepmodfi),deviance(stepmodfi),"\n", 
file="AICfish.txt",append=TRUE)

# Eventually I want to get predicted values for all those planning units where 
no fish surveys were conducted:

# First I get my predicted values from the response, this yields a predicted 
value for every single fish and every single planning unit, and these results 
will be fed into the previously constructed matrix "fishoccur":

predictionresponse <- predict(stepmodfi, predallx, type="response", 
se.fit=FALSE)
fishoccur[,i] <- predictionresponse

# Now I want to get confidence intervals for these predicted values. I figured 
out that I would need to get these from the logit scale first and only 
thereafter transfer them to the response scale. So, first I get again predicted 
values, but on the logit scale. This time though I get just ONE result for 
every planning unit, when actually I also expected to get a result for every 
fish and every planning unit. So, it seems that there is just one value for 
each planning unit, but I thought it would be nessesary to get one value for 
each fish and planning unit combination(?) 

predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE)

# Then, nevertheless, I calculated the upper and lower ci values and putting 
them into the previously created matrices of upperresponse and lowerresponse, 
the "i" stands for the number of fishes used in this run, so that I get one 
value for every combination of fish and planning unit. Yet, I wonder whether 
this can be done, if first I only get one predicted value from the logit scale:

upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit))
lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))

# Or I use your adjusted formula:

upperresponse[,i] <- 1/(1+exp(-upperlogit))
lowerresponse[,i] <- 1/(1+exp(-lowerlogit))

# Close the loop:

}


# Finally extracting tables:

write.table(fishoccur,file="fishoccur.txt",row.names=TRUE,sep=",")
write.table(upperresponse,file="upperresponse.txt",row.names=TRUE,sep=",")
write.table(lowerresponse,file="lowerresponse.txt",row.names=TRUE,sep=",")

When I use the formula exp(upperlogit)/(1+exp(upperlogit)) I get quite many NAs 
in my upper ci values, the lower though seems fine. 
Using your formular eliminated the NAs but produces some confidence intervals 
which - particularly for very small predicted values - stretch from 0 to 1, 
which of course doesn't qualify as a confidence interval.

I hope that the description here is more stringent and I would be very grateful 
if you had any further idea, why I either get NAs or for some values 0-1 
confidence intervals and whether the prediction on the logit scale in the first 
place is conducted correctly.

Thank you very much!

Regards,

Christine




Your response:

Possibly your calculation overflows: exp(upperlogit)/(1+exp(upperlogit)) 
could be replaced by 1/(1+exp(-upperlogit)), or even better by 
plogis(upperlogit).  This could happen via the Hauck-Donner effect: the 
fitted probabilities are very near one and the standard errors are very 
large.

As for your other points, please follow the posting guide and
'provide commented, minimal, self-contained, reproducible code'.


-- 
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] confidence interval for the logit - predict.glm

Reply via email to