Re: [R] obtainl survival curves for single strata

Bond, Stephen Fri, 01 Feb 2013 11:24:38 -0800

Dr. Therneau,

You are correct about the fit:


(cph.approve <- 
coxph(Surv(fundterm,resp)~I(CommitAmt/1e5)+res+CommitedRate+dayflag+mth+strata(termfac),
 data=dfa, 
subset=(HedgeDate<"2012-11-15" & p1!="FixedOpen"), method="efron"))

However the output from survfit has individuals in each column and the strata 
are stacked so, in order to use that I have to subset the exact rows for the 
strata I need even though the strata is provided in the newdata argument.
Based on the help file it seems like the function should be able to use the 
strata info and return a single strata per subject. I am pasting my current 
code, which is not fast due to calls to reshape. If sth similar can be achieved 
by calling the compiled code it should run much faster allowing 100k subjects 
to be done in 2-3 min.

To use survfit as is, I would need to achieve subsetting in a for loop (going 
through columns), which is even slower than reshape. In my old code I processed 
subjects one by one in a for loop and that was much slower than grouping all 
subjects from the same strata together as in the code below.

surv.approve <- survfit(cph.approve)
b1 <- c(1,cumsum(surv.approve$strata)+1)
b2 <- cumsum(surv.approve$strata)
b1 <- b1[-length(b1)]
stratbins <- 
data.frame(termfac=as.integer(substring(names(b2),9,9)),start=b1,finish=b2)

> stratbins
  termfac start finish
1       1     1     93
2       2    94    187
3       3   188    282
4       4   283    372
5       5   373    462
6       6   463    553
7       7   554    633
8       8   634    695
9       9   696    789

strats <- sort(unique(newapp$termfac))
for (jj in strats){
  cat('strata ',jj,'\n')
  block <- newapp[newapp$termfac==jj,]
  surv <- 
surv.approve$surv[stratbins[stratbins$termfac==jj,"start"]:stratbins[stratbins$termfac==jj,"finish"]]
   
risk <- predict(cph.approve,new=block,type="risk",ref="sample")
  newsurv <- outer(surv,risk,"^")

  days <- 
as.Date(outer(surv.approve$time[stratbins[stratbins$termfac==jj,"start"]:stratbins[stratbins$termfac==jj,"finish"]],
                        block$HedgeDate,"+"))

  fund <- t(t(newsurv)*block$CommitAmt)
  fund <- rbind(block$CommitAmt,fund)
  fund <- -diff(fund)
  fund <- as.data.frame(t(fund))
  fund$acct <- as.integer(rownames(fund))
  ncols <- ncol(fund)-1
  fundlong <- 
reshape(fund,dir="long",varying=list(colnames(fund)[1:ncols]),idvar="acct",timevar="daycnt")
  fundlong <- fundlong[order(fundlong$acct,fundlong$daycnt),]

  days <- matrix(days,nrow=nrow(days))
  days <- t(days)
  days <- cbind(fund[,"acct"],days)
  days <- as.data.frame(days)
  colnames(days)[1] <- "acct"
  daylong <- 
reshape(days,dir="long",varying=list(colnames(days)[2:(ncols+1)]),idvar="acct",timevar="dt")

  daylong <- daylong[order(daylong$acct,daylong$dt),]
  daylong$V2 <- as.Date(daylong$V2,origin = "1970-01-01")
  fundlong$dt <- as.character(daylong$V2)
  
  sqlSave(ch1,fundlong,"tmpfund")
  sqlQuery(ch1, "insert into RRfund (acct,dt,fund,daycnt) select 
acct,dt,V1,daycnt from tmpfund")
  sqlQuery(ch1,"drop table tmpfund")

}

Output looks like:

acct    dt      fund    daycnt
3623963 2012-11-16 00:00:00     472.99329489343 1
3623963 2012-11-17 00:00:00     5842.48228943771        2
3623963 2012-11-18 00:00:00     7807.17102672733        3
3623963 2012-11-19 00:00:00     7244.01769700345        4
3623963 2012-11-20 00:00:00     7073.83109592798        5
3623963 2012-11-21 00:00:00     8745.91515512884        6
3623963 2012-11-22 00:00:00     9473.87152806949        7
3623963 2012-11-23 00:00:00     12627.8890422916        8
3623963 2012-11-24 00:00:00     19598.5684713097        9
3623963 2012-11-25 00:00:00     56094.5812136802        10
3623963 2012-11-26 00:00:00     25690.439183149 11
3623963 2012-11-27 00:00:00     25420.9915256714        12
3623963 2012-11-28 00:00:00     30322.3750865434        13
3623963 2012-11-29 00:00:00     18651.1136846758        14
3623963 2012-11-30 00:00:00     20291.4528067292        15


Stephen 

-----Original Message-----
From: Terry Therneau [mailto:thern...@mayo.edu] 
Sent: Friday, February 01, 2013 10:11 AM
To: r-help@r-project.org; Bond, Stephen
Subject: Re: obtainl survival curves for single strata

Stephen,
    I can almost but not quite get my arms around what you are asking.  A bit 
more detail 
would help.  Like
      cph.approve = what kind of object, generated by function __

But, let me make a guess:
    cph is the result of coxph, and the model has both covariates and a strata
    You want predictioned survival curves, more than 1, of the type "covariates 
= a, b,c, 
strata=1"  "covariates = d,e, f, strata=2", ... for arbitrary covariates and 
strata.

Now, the predicted survival curves for different strata are different lengths.
The survfit.coxph routine gets around this by being verbose: it expects you to 
give it 
covariate sets, and returns
all of the strata for each covariate. This allows it to give a compact result.  
 You can 
always do:
    newpred <- survfit(cox-model-fit, newdata=something)
    newpred[5,17] #  survival curve for the 5th strata, covariates from the 
17th row of 
newdata

But, you want to put the results into a matrix, for some pre-specifed set of 
time points.  
Take it one step further.
     mytimepoints <- seq(0, 5*365, by=30)  # every 30 days
     z <- summary(newpred[5,17], time=mytimepoints, extend=TRUE)$surv

The summary.survfit function's "time" argument was originally written for 
people who only 
wanted to print certain time points, but it works as well for those who only 
want to 
extract certain ones.  It correctly handles the fact that the curve is a step 
function.

Terry Therneau


On 02/01/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> What is the syntax to obtain survival curves for single strata on many 
> subjects?
>
> I have a model based on Surv(time,response) object, so there is a single row 
> per subject and no start,stop and no switching of strata.
>
> The newdata has many subjects and each subject has a strata and the survival 
> based on the subject risk and the subject strata is needed.
>
> If I do
>
> newpred<- survfit(cph.approve,new=newapp,se=F)
>
> I get all strata for every subject.
>
> Attempting
>
>> >  newpred<- survfit(cph.approve,new=newapp,id=CertId,se=F)
> Error in survfit.coxph(cph.approve, new = newapp, id = CertId, se = F) :
>    The individual option is  only valid for start-stop data
>> >  newpred<- survfit(cph.approve,new=newapp,indi=T,se=F)
> Error in survfit.coxph(cph.approve, new = newapp, indi = T, se = F) :
>    The individual option is  only valid for start-stop data
>
> Please, advise if obtaining a single strata for a basic (time,response) model 
> is possible? Due to differing lengths of the surv for different strata this 
> will not go in a "wide" data.frame without padding.
>
> Thanks everybody and have a great day.
>
> Stephen B
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] obtainl survival curves for single strata

Reply via email to