[R] Time Dependent Cox Model
I am having trouble formatting some survival data for use in a time-dependent Cox model. My time-dependent variable is habitat, and I have it recorded for every day (with some NAs). I think the code is working properly except for the calculation of death.time. This column should contain 1s and 0s, but as written it produces only 0s. Any help will be greatly appreciated.

Data: http://www.nabble.com/file/p25881478/Survival_master2.csv (Survival_master2.csv)

Here is my code:

    sum(!is.na(surv[, 16:726]))

    surv2 <- matrix(0, 12329, 19)
    colnames(surv2) <- c('start', 'stop', 'death.time',
                         names(surv)[1:15], 'habitat')
    row <- 0                                  # set record counter to 0
    for (i in 1:nrow(surv)) {                 # loop over individuals
      for (j in 16:726) {                     # loop over the daily habitat columns
        if (is.na(surv[i, j])) next           # skip missing data
        else {
          row <- row + 1                      # increment row counter
          start <- j - 11                     # start time (previous day)
          stop <- start + 1                   # stop time (day)
          death.time <- if (stop == surv[i, 4] && surv[i, 5] == 1) 1 else 0
          # construct record:
          surv2[row, ] <- c(start, stop, death.time,
                            unlist(surv[i, c(1:15, j)]))
        }
      }
    }
    surv2 <- as.data.frame(surv2)

--
View this message in context: http://www.nabble.com/Time-Dependent-Cox-Model-tp25881478p25881478.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time Dependent Cox Model
Does anyone have suggestions? Thanks!
Re: [R] Time Dependent Cox Model
It was suggested that I go into more detail on what I want to accomplish and share the rest of my code. I want to accomplish exactly what Fox did in this article (http://www.nabble.com/file/p25897307/appendix-cox-regression.pdf appendix-cox-regression.pdf), starting on page 7, except using "habitat" instead of employment. I want habitat to be a time-dependent covariate, and it varies by day. I read in my data from the csv file. One major difference between the data set Fox used and mine is that I have a DaysatRisk column instead of the "week" in which the person went back to jail. I think this is the root of my problem in calculating the proper death.time. The death.time column should be 1 on the day the animal died and 0 otherwise.

Thanks in advance,

    library(survival)  # provides Surv() and coxph()

    surv <- read.csv("Survival_master2.csv", header = TRUE)

    sum(!is.na(surv[, 16:726]))

    surv2 <- matrix(0, 12329, 19)
    colnames(surv2) <- c('start', 'stop', 'death.time',
                         names(surv)[1:15], 'habitat')
    row <- 0                                  # set record counter to 0
    for (i in 1:nrow(surv)) {                 # loop over individuals
      for (j in 16:726) {                     # loop over the daily habitat columns
        if (is.na(surv[i, j])) next           # skip missing data
        else {
          row <- row + 1                      # increment row counter
          start <- j - 11                     # start time (previous day)
          stop <- start + 1                   # stop time (current day)
          death.time <- if (stop == surv[i, 4] && surv[i, 5] == 1) 1 else 0
          # construct record:
          surv2[row, ] <- c(start, stop, death.time,
                            unlist(surv[i, c(1:15, j)]))
        }
      }
    }
    surv2 <- as.data.frame(surv2)
    remove(i, j, row, start, stop, death.time)
    surv2[1:15, ]

    test <- coxph(Surv(start, stop, death.time) ~ habitat, data = surv2)

JorisMeys wrote:
>
> Well,
>
> it might be wise to elaborate a bit more about the variables and what
> exactly you want e.g. death-time to be. I'd interpret it as time of
> death, but the fact that it is 0/1, means it is a logical (?) binary
> variable of some sort.
>
> Please ask your question in such a way that somebody who doesn't know
> the dataset and your research, can still understand what is inside the
> dataset and what exactly you're trying to obtain.
>
> I'd also suggest to add the command to read in the data. I don't have
> the time to spend looking around how exactly I can read in the dataset
> in such a way it fits what you have in your workspace.
>
> Cheers
> Joris
>
> On Wed, Oct 14, 2009 at 5:37 PM, quaildoc wrote:
>>
>> Does anyone have suggestions? Thanks!
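A minimal sketch of the start/stop/death.time layout discussed in this thread, built on a made-up data frame rather than Survival_master2.csv. The column names (days.at.risk, died, day1 to day5) and the helper variable first.day.col are assumptions for illustration only. The sketch makes two things explicit that may be worth checking against the posted loop, although neither is confirmed here as the cause of the all-zero column: the day number is derived from the index of the first day column (the posted loop runs over columns 16:726 but subtracts 11), and the event flag is compared against a days-at-risk value measured on the same scale as stop.

```r
# Toy wide-format data (hypothetical): one row per animal.
# days.at.risk = last day the animal was followed; died = 1 if it died that day.
wide <- data.frame(
  id           = 1:3,
  days.at.risk = c(3, 5, 2),
  died         = c(1, 0, 1),
  day1 = c("grass", "shrub", "grass"),
  day2 = c("grass", NA,      "shrub"),
  day3 = c("shrub", "shrub", NA),
  day4 = c(NA,      "grass", NA),
  day5 = c(NA,      "grass", NA),
  stringsAsFactors = FALSE
)

first.day.col <- 4                       # column index of day 1 in this toy data
day.cols <- first.day.col:ncol(wide)

rows <- list()
for (i in seq_len(nrow(wide))) {
  for (j in day.cols) {
    day <- j - first.day.col + 1         # day number tied to the first day column
    if (day > wide$days.at.risk[i]) break     # animal is no longer at risk
    if (is.na(wide[i, j])) next               # skip days with missing habitat
    rows[[length(rows) + 1]] <- data.frame(
      id      = wide$id[i],
      start   = day - 1,                 # interval is (day - 1, day]
      stop    = day,
      death   = as.integer(wide$died[i] == 1 && day == wide$days.at.risk[i]),
      habitat = wide[i, j],
      stringsAsFactors = FALSE
    )
  }
}
long <- do.call(rbind, rows)

# Sanity check: the number of events should survive the reshaping.
sum(long$death)
sum(wide$died)
```

If the two sums differ on real data, some events were lost in the reshaping, for example because habitat is NA on the day of death and that row was skipped, or because the day offset does not line up with the days-at-risk column.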
[R] Prediction Error Calculation
Hello List,

I am fitting a logistic regression model to some presence/absence data. I have numerous covariates to explain the variation, and I am using AIC to rank models. However, I would also like to report how well my best model(s) do at prediction. I have looked through the archives and the web and have come up with something that gives me what I think is the mean prediction error, but I am not sure of that. I am fairly unfamiliar with these types of statistics. Here is my code:

    library(boot)  # provides glm.diag()

    ## Type is my binary response (0 or 1)
    metrics.global <- glm(Type ~ MPI + IJI + ED + PRD + class2 + class3 + class5,
                          family = binomial, data = metrics)

    ## assign the fitted values the name muhat
    muhat <- metrics.global$fitted.values

    ## compute the diagnostic values
    global.diag <- glm.diag(metrics.global)

    ## calculate cv.err
    cv.err <- mean((metrics.global$y - muhat)^2 / (1 - global.diag$h)^2)
    cv.err

My main problem is that I am unsure how to interpret what cv.err means for my model. I know that h is a leverage statistic for each observation. I would appreciate some clarification on the interpretation.

Thank you.
Re: [R] Prediction Error Calculation
Any suggestions?
Re: [R] Prediction Error Calculation
Any help would be appreciated.
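A sketch of one way to read cv.err, using simulated data rather than the metrics data set. The quantity mean((y - muhat)^2 / (1 - h)^2) is the leverage-based shortcut for the leave-one-out cross-validated mean squared prediction error. It is exact for an ordinary linear model and only an approximation for a logistic fit. With a 0/1 response the squared error is on the probability scale, so 0 would be perfect prediction and 0.25 is what a constant predicted probability of 0.5 would give. The variable names and coefficients below are made up, and hatvalues() from stats is used in place of glm.diag()$h on the assumption that both return the same leverages.

```r
set.seed(1)

# Simulated presence/absence data (hypothetical covariates x1 and x2).
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.3 + 0.8 * x1 - 0.5 * x2))
d  <- data.frame(y, x1, x2)

fit   <- glm(y ~ x1 + x2, family = binomial, data = d)
muhat <- fitted(fit)       # fitted probabilities
h     <- hatvalues(fit)    # leverage of each observation

# Apparent (in-sample) squared error and the leverage-based shortcut.
apparent    <- mean((d$y - muhat)^2)
cv.shortcut <- mean((d$y - muhat)^2 / (1 - h)^2)

# Brute-force leave-one-out: refit without observation i, then predict it.
loo.err <- numeric(n)
for (i in seq_len(n)) {
  fit.i <- glm(y ~ x1 + x2, family = binomial, data = d[-i, ])
  p.i   <- predict(fit.i, newdata = d[i, , drop = FALSE], type = "response")
  loo.err[i] <- (d$y[i] - p.i)^2
}
cv.loo <- mean(loo.err)

round(c(apparent = apparent, shortcut = cv.shortcut, loo = cv.loo), 4)
```

The shortcut can never be smaller than the apparent error, because each squared residual is divided by a number no greater than 1. If the shortcut and the brute-force value are close, cv.err can reasonably be described as an approximate leave-one-out squared prediction error. cv.glm() in the boot package is a packaged route to the same kind of estimate.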