A couple of thoughts.

1. More than half the work for survfit.coxph is computing standard errors. If you don't need them, adding se.fit=FALSE will help the speed.

2. Survival curves with time-dependent covariates are a complex topic. To get the "probability of default in each month during next 2 years" you need to create a scenario that specifies exactly what those time-dependent covariates will do over the next two years. My book has a long discussion of this; could I suggest you borrow a copy and read it?

3. For time-fixed covariates there is a simple formula to get S(t; x) from S(t; x0), i.e., if you have the predicted curve for some covariate choice x0 you can easily derive it for any other chosen x. That formula doesn't work in the time-dependent case (you can't factor exp(x) out from under an integral of [exp(x) g(t) dt] when x is a function of t). Unless you want to learn a lot more math and do custom programming, I think you are stuck with survfit.

Terry Therneau

---------- begin included message ----------------

I found the survfit function was very slow for a large dataset, and I am looking for an alternative way to quickly get the predicted survival probabilities. My historical data set is a pool of loans with monthly observed default status for 24 months. I would like to fit a proportional hazards model in counting process format, with time-varying covariates such as unemployment rates and time-constant variables recorded at loan application, and then use the model to predict the probability of default in each month during the next 2 years for a pool of new loans.

I have read some posts from other R users. It sounds like using (average survival probability)^exp((X - mean(X)) * Beta) can quickly get the predicted survival probabilities. My predictors for the model include both continuous and categorical variables, and my dataset is in counting process format with both time-varying and time-constant predictors. So how should I take the mean? I guess it's the mean of the training data? And is the denominator for the mean the number of observations (i.e., the number of rows of training data in the counting process format)?
What if the predictor is a categorical variable?
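
To make points 1 and 3 of the reply concrete, here is a minimal sketch in R that covers the time-fixed case only. The lung data set from the survival package, the model formula, and the covariate values x0 and x1 are illustrative assumptions, not anything from the original question. It assumes the time-fixed relation is S(t; x) = S(t; x0)^exp((x - x0) beta) and that survfit is left at its default settings, where the survival curve is computed as exp(-cumulative hazard). It does not apply to the counting-process data with time-dependent covariates described in the question.

```r
library(survival)

# Illustrative model with time-fixed covariates only (lung ships with survival).
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Point 1: skip the standard errors when only the curve itself is needed.
x0   <- data.frame(age = 60, sex = 1)   # hypothetical reference covariates
base <- survfit(fit, newdata = x0, se.fit = FALSE)

# Point 3: derive the curve at another covariate vector from the curve at x0.
x1    <- data.frame(age = 70, sex = 2)  # hypothetical target covariates
beta  <- coef(fit)                       # ordered as age, sex
shift <- sum((unlist(x1) - unlist(x0)) * beta)   # (x1 - x0)' beta
derived <- base$surv ^ exp(shift)

# Compare with what survfit returns when asked for x1 directly.
direct <- survfit(fit, newdata = x1, se.fit = FALSE)
all.equal(derived, direct$surv)
```

Once one curve is available, the power transformation is a single vectorized operation per new subject, which is why it is attractive for a large pool of loans with time-fixed predictors. With time-dependent covariates the shift is no longer a constant, so the shortcut breaks down as described in point 3.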