Dr. Therneau, You are correct about the fit:
(cph.approve <- coxph(Surv(fundterm,resp)~I(CommitAmt/1e5)+res+CommitedRate+dayflag+mth+strata(termfac), data=dfa, subset=(HedgeDate<"2012-11-15" & p1!="FixedOpen"), method="efron")) However the output from survfit has individuals in each column and the strata are stacked so, in order to use that I have to subset the exact rows for the strata I need even though the strata is provided in the newdata argument. Based on the help file it seems like the function should be able to use the strata info and return a single strata per subject. I am pasting my current code, which is not fast due to calls to reshape. If sth similar can be achieved by calling the compiled code it should run much faster allowing 100k subjects to be done in 2-3 min. To use survfit as is, I would need to achieve subsetting in a for loop (going through columns), which is even slower than reshape. In my old code I processed subjects one by one in a for loop and that was much slower than grouping all subjects from the same strata together as in the code below. surv.approve <- survfit(cph.approve) b1 <- c(1,cumsum(surv.approve$strata)+1) b2 <- cumsum(surv.approve$strata) b1 <- b1[-length(b1)] stratbins <- data.frame(termfac=as.integer(substring(names(b2),9,9)),start=b1,finish=b2) > stratbins termfac start finish 1 1 1 93 2 2 94 187 3 3 188 282 4 4 283 372 5 5 373 462 6 6 463 553 7 7 554 633 8 8 634 695 9 9 696 789 strats <- sort(unique(newapp$termfac)) for (jj in strats){ cat('strata ',jj,'\n') block <- newapp[newapp$termfac==jj,] surv <- surv.approve$surv[stratbins[stratbins$termfac==jj,"start"]:stratbins[stratbins$termfac==jj,"finish"]] risk <- predict(cph.approve,new=block,type="risk",ref="sample") newsurv <- outer(surv,risk,"^") days <- as.Date(outer(surv.approve$time[stratbins[stratbins$termfac==jj,"start"]:stratbins[stratbins$termfac==jj,"finish"]], block$HedgeDate,"+")) fund <- t(t(newsurv)*block$CommitAmt) fund <- rbind(block$CommitAmt,fund) fund <- -diff(fund) fund <- as.data.frame(t(fund)) fund$acct <- as.integer(rownames(fund)) ncols <- ncol(fund)-1 fundlong <- reshape(fund,dir="long",varying=list(colnames(fund)[1:ncols]),idvar="acct",timevar="daycnt") fundlong <- fundlong[order(fundlong$acct,fundlong$daycnt),] days <- matrix(days,nrow=nrow(days)) days <- t(days) days <- cbind(fund[,"acct"],days) days <- as.data.frame(days) colnames(days)[1] <- "acct" daylong <- reshape(days,dir="long",varying=list(colnames(days)[2:(ncols+1)]),idvar="acct",timevar="dt") daylong <- daylong[order(daylong$acct,daylong$dt),] daylong$V2 <- as.Date(daylong$V2,origin = "1970-01-01") fundlong$dt <- as.character(daylong$V2) sqlSave(ch1,fundlong,"tmpfund") sqlQuery(ch1, "insert into RRfund (acct,dt,fund,daycnt) select acct,dt,V1,daycnt from tmpfund") sqlQuery(ch1,"drop table tmpfund") } Output looks like: acct dt fund daycnt 3623963 2012-11-16 00:00:00 472.99329489343 1 3623963 2012-11-17 00:00:00 5842.48228943771 2 3623963 2012-11-18 00:00:00 7807.17102672733 3 3623963 2012-11-19 00:00:00 7244.01769700345 4 3623963 2012-11-20 00:00:00 7073.83109592798 5 3623963 2012-11-21 00:00:00 8745.91515512884 6 3623963 2012-11-22 00:00:00 9473.87152806949 7 3623963 2012-11-23 00:00:00 12627.8890422916 8 3623963 2012-11-24 00:00:00 19598.5684713097 9 3623963 2012-11-25 00:00:00 56094.5812136802 10 3623963 2012-11-26 00:00:00 25690.439183149 11 3623963 2012-11-27 00:00:00 25420.9915256714 12 3623963 2012-11-28 00:00:00 30322.3750865434 13 3623963 2012-11-29 00:00:00 18651.1136846758 14 3623963 2012-11-30 00:00:00 20291.4528067292 15 Stephen -----Original Message----- From: Terry Therneau [mailto:thern...@mayo.edu] Sent: Friday, February 01, 2013 10:11 AM To: r-help@r-project.org; Bond, Stephen Subject: Re: obtainl survival curves for single strata Stephen, I can almost but not quite get my arms around what you are asking. A bit more detail would help. Like cph.approve = what kind of object, generated by function __ But, let me make a guess: cph is the result of coxph, and the model has both covariates and a strata You want predictioned survival curves, more than 1, of the type "covariates = a, b,c, strata=1" "covariates = d,e, f, strata=2", ... for arbitrary covariates and strata. Now, the predicted survival curves for different strata are different lengths. The survfit.coxph routine gets around this by being verbose: it expects you to give it covariate sets, and returns all of the strata for each covariate. This allows it to give a compact result. You can always do: newpred <- survfit(cox-model-fit, newdata=something) newpred[5,17] # survival curve for the 5th strata, covariates from the 17th row of newdata But, you want to put the results into a matrix, for some pre-specifed set of time points. Take it one step further. mytimepoints <- seq(0, 5*365, by=30) # every 30 days z <- summary(newpred[5,17], time=mytimepoints, extend=TRUE)$surv The summary.survfit function's "time" argument was originally written for people who only wanted to print certain time points, but it works as well for those who only want to extract certain ones. It correctly handles the fact that the curve is a step function. Terry Therneau On 02/01/2013 05:00 AM, r-help-requ...@r-project.org wrote: > What is the syntax to obtain survival curves for single strata on many > subjects? > > I have a model based on Surv(time,response) object, so there is a single row > per subject and no start,stop and no switching of strata. > > The newdata has many subjects and each subject has a strata and the survival > based on the subject risk and the subject strata is needed. > > If I do > > newpred<- survfit(cph.approve,new=newapp,se=F) > > I get all strata for every subject. > > Attempting > >> > newpred<- survfit(cph.approve,new=newapp,id=CertId,se=F) > Error in survfit.coxph(cph.approve, new = newapp, id = CertId, se = F) : > The individual option is only valid for start-stop data >> > newpred<- survfit(cph.approve,new=newapp,indi=T,se=F) > Error in survfit.coxph(cph.approve, new = newapp, indi = T, se = F) : > The individual option is only valid for start-stop data > > Please, advise if obtaining a single strata for a basic (time,response) model > is possible? Due to differing lengths of the surv for different strata this > will not go in a "wide" data.frame without padding. > > Thanks everybody and have a great day. > > Stephen B > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.