Terry,

My point was that if the question is "What is the average time to death, given a set of variables?", the only way to calculate an actual (observed) time to death is to use the uncensored cases, because the time to death for the censored cases is unknown and can only be estimated. The mean observed time to death among uncensored cases may not be a very useful piece of information, but it can indeed be calculated. As you point out, however, predicted values for time to death can be estimated from the survival function, which incorporates both censored and uncensored data. Those estimates do depend on the model's assumptions (proportional hazards, in the case of the Cox model), which should be checked rather than taken for granted.
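To make the distinction concrete, here is a rough sketch in R. It reuses the Cox model from your message and the lung data that ships with the survival package; the two covariate profiles in new_subjects are values I made up purely for illustration, and the choice of the median as the summary is mine.

```r
library(survival)

# lung ships with the survival package; status is coded 1 = censored, 2 = dead
deaths <- lung[lung$status == 2, ]

# (a) Mean observed time to death, uncensored cases only
mean_uncensored <- mean(deaths$time)

# (b) Kaplan-Meier estimate using all cases, censored and uncensored
km <- survfit(Surv(time, status) ~ 1, data = lung)
km_table <- summary(km)$table          # named vector, includes "median"

# (c) Cox model from the quoted message, evaluated at hypothetical covariates
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
new_subjects <- data.frame(age = c(50, 70), ph.ecog = c(0, 2))  # illustrative only
curves <- survfit(fit, newdata = new_subjects)
cox_medians <- summary(curves)$table[, "median"]  # one value per hypothetical subject

print(mean_uncensored)
print(km_table[["median"]])
print(cox_medians)
```

Quantity (a) describes only the subjects who were observed to die, while (b) and (c) are estimates for the whole population or for a given covariate profile, so the numbers answer different questions and should not be expected to agree.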
Best,
Jim

On Fri, Nov 12, 2010 at 12:09 PM, Terry Therneau <thern...@mayo.edu> wrote:
> Since I read the list in digest form (and was out ill yesterday) I'm
> late to the discussion.
>
> There are 3 steps for predicting survival, using a Cox model:
>
> 1. Fit the data
>    fit <- coxph(Surv(time, status) ~ age + ph.ecog, data=lung)
>
> The biggest question to answer here is what covariates you wish to base
> the prediction on. There is the usual tradeoff between too few (leave
> out something important) or too many (including unimportant things).
>
> 2. Get survival curves
>    curves <- survfit(fit, newdata= _____)
> The newdata needs to include all the covariates in your model.
>
> 3. Summarize
> Note that you don't get a single number prediction for each subject,
> you get a set of survival curves. plot(curves[1]) for instance shows
> you the first one, plot(curves[2]) the second.
> print(curves) will give a 1 line per curve summary including the
> median, and optionally one of several versions of the mean. See the
> discussion in help(print.survfit). The mean is rarely used as a summary
> due to the fact that we don't see the whole distribution. (Use
> temp <- summary(curves); temp$table to use the printout values in
> further calculations.)
>
> -------------------
>
> The same process applies for parametric survival using survreg. In
> return for specifying a distributional form, the predicted survival
> curve for a particular subject is completely defined. This includes the
> mean and all quantiles. Reliability analysis (survival analysis in
> industry) uses parametric almost exclusively, since the tail of the
> distribution is of greatest interest. Your use of
> predict(,type='response') is almost correct, there is just the math
> detail that the Weibull fits on a log scale, so the returned value is a
> geometric mean time to death rather than an arithmetic mean.
>
> The suggestion to use ordinary regression on the observed times is
> wrong.
> Censored data is more complex than that.
>
> Terry Therneau
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
James C. Whanger
Research Consultant
2 Wolf Ridge Gap
Ledyard, CT 06339
Phone: 860.389.0414