Re: 1. X*beta != linear.predictor. The equality is stated in three different help docs, which is misleading, especially in light of the way glm is set up. I felt like was wrestling with SAS :-) The relative risk was the original idea behind cox regression, but it can be used for many non-relative purposes. If we want to calculate death probability in each period, then lp is no longer shift invariant. Re: 2. Survfit is too slow. It seems that the implementation follows the procedure in the original Cox paper, which calls iterative optimization for each death time. My subjects are mortgages and both the estimation and the prediction samples are several hundred thousand. The call appears to recalculate/optimize everything even though only the $surv changes. Since each subject belongs to a single strata, most of the calculations are redundant. I am not much of a programmer and could never figure out how to use the R profiler, so cannot be exact here, but the simple exponentiation takes no time and survfit takes several secs for each subject. So I did:
survlong <- survfit(modlong) # a single call suffices bl1 <- c(1,cumsum(survlong$strata)+1) bl2 <- cumsum(survlong$strata) # get the start and end of each strata for (jj in 1:nrow(newapp)){ strat=as.integer(newapp[jj,"termfac"]) surv <- survlong$surv[(b1[strat]):(b2[strat])] # extract the strata risk <- predict(modlong,new=newapp[jj,],type="risk")# it seems there is no # optimization here newsurv <- surv^risk # we done ... rest of code } As a package maintainer, you have to decide whether including any of the above and below is useful or users can figure out things on their own. Or maybe survfit can be made smart and subsequent calls on the same model will use the first call to survfit?? It's your call :-) Kind regards Stephen B -----Original Message----- From: Terry Therneau [mailto:thern...@mayo.edu] Sent: Thursday, October 28, 2010 6:39 PM To: Bond, Stephen; David Winsemius Cc: r-help@r-project.org Subject: Re: [R] coxph linear.predictors Gentlemen, I read R-news in batch mode so I'm often a day behind. Let me try to answer some of the questions. 1. X*beta != linear.predictor. I'm sorry if the documentation isn't all it could be. Between the book, tech report, and help I've written about 400 pages, but this particular topic isn't yet in it. The final snipe about being "opaque like SAS" was really unfair. The Cox model is a relative risk model, if lp is a linear predictor then so is lp +c for any constant; they are equally good and equally valid. The linear.predictor component in a coxph fit is (X-means) * beta. The computation exp(lp) occurs multiple times downstream and this keep the exp function from overflowing when there is something like a Date object as a predictor. Adding this constant changes not a single downstream calcuation. 2. Survfit is too slow. I'd like to hear more about this. My work mostly involves modest data sets so perhaps I haven't seen it. Accuracy and maintainability have been my first worries. 3. Baseline survival. Let xbase be a particular set of values for the x covariates (one for each). The survival curve for a given xbase is obtained from survfit fit <- coxph(.... sfit <- survfit(fit, newdata=xbase) chaz <- -log(sfit$surv) #cumulative hazard (The xbase vector will need to have variable names for the function to know which value goes to which of course). The cumulative hazard for any other subject will be newhaz <- chaz * exp(fit$coef%*% (x-xbase)) There is not a simple transformation of the standard error from one fit to another, however. You will need to call survfit with a data frame for newdata, which will return one curve per row with the proper values. In my view there is no such thing as "A" baseline survival curve. Any xbase you chose is a baseline. However, it is wise to choose something near the center of the data space in order to avoid numeric problems with the exp function above. I would never ever chose a vector of zeros, although some text books do -- it saves them about 8 characters of typing in the newhaz formula above. Terry Therneau ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.