On Fri, Jul 22, 2011 at 2:04 PM, Terry Therneau <thern...@mayo.edu> wrote:

> For time scales that are truly discrete Cox proposed the "exact partial
> likelihood".
Or "the method of partial likelihood" applied to the discrete logistic model.

> I call that the "exact" method and SAS calls it the
> "discrete" method. What we compute is precisely the same, however they
> use a clever algorithm which is faster.

Note that the model estimated here is discrete. The "baseline" conditional
probabilities at each failure time are eliminated through the partial
likelihood argument. This can also be described as a conditional logistic
regression, where we condition on the total number of failures in each risk
set (thus eliminating the risk-set-specific parameters). Suppose that a risk
set of size n contains d failures. The method must then consider all possible
ways of choosing d failures out of the n at risk, i.e. choose(n, d) cases,
which makes the computational burden huge when there are many ties.

The method "ml" in coxreg (package 'eha') takes a different approach. Instead
of conditional logistic regression it performs unconditional logistic
regression, adding one parameter per risk set. In principle this can be done
with 'glm' after expanding the data set with "toBinary" in 'eha', but with
large data sets and many risk sets glm chokes. Instead, the "ml" approach in
coxreg eliminates the extra parameters by profiling them out, which gives a
fast estimation procedure compared with the "exact" methods above (see the
sketch further down). A final note: with "ml" the logistic regression uses
the cloglog link, to be compatible with the situation where data really are
continuous but grouped and a proportional hazards model holds.
(Interestingly, conditional inference is usually used to simplify things;
here it creates computational problems that are not present without
conditioning.)

> To make things even more
> confusing, Prentice introduced an "exact marginal likelihood" which is
> not implemented in R, but which SAS calls the "exact" method.

This is not so confusing once we realize that we are now in the
continuous-time model. Then, with a risk set of size n containing d failures,
we must consider all possible orderings of the d failures, i.e. d! cases.
That is, here we assume that ties occur because of imprecise measurement and
that there is one true ordering; the method calculates an average
contribution to the partial likelihood. (Btw, you refer to "Prentice", but
isn't this from the Biometrika paper by Kalbfleisch & Prentice (1973)? And of
course their classical book?)

> Data is usually not truly discrete, however. More often ties are the
> result of imprecise measurement or grouping. The Efron approximation
> assumes that the data are actually continuous but we see ties because of
> this; it also introduces an approximation at one point in the
> calculation which greatly speeds up the computation; numerically the
> approximation is very good.

Note that both Breslow's and Efron's approximations are approximations of the
"exact marginal likelihood".

> In spite of the irrational love that our profession has for anything
> branded with the word "exact", I currently see no reason to ever use
> that particular computation in a Cox model.

Agreed; but only because it is so time consuming. The unconditional logistic
regression with profiling is a good alternative.

> I'm not quite ready to
> remove the option from coxph, but certainly am not going to devote any
> effort toward improving that part of the code.
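To make the comparison concrete, here is a small sketch on toy data with
heavily tied times. It is only a sketch: I assume the argument names of
current 'survival' and 'eha' (coxph's 'ties', coxreg's method = "ml", and
toBinary() with its default enter/exit/event columns and a 'riskset' column
in the output); check the help pages of your installed versions. Also keep in
mind how quickly the exact methods blow up: with 20 at risk and 10 failures,
a single risk set already contributes choose(20, 10) = 184756 terms.

library(survival)   # coxph
library(eha)        # coxreg, toBinary

## Toy data: 50 subjects, only 4 distinct (grouped) event times, so many ties.
set.seed(1)
n <- 50
dat <- data.frame(enter = rep(0, n),
                  exit  = sample(1:4, n, replace = TRUE),
                  event = rbinom(n, 1, 0.7),
                  x     = rnorm(n))

## Continuous-time approximations and Cox's "exact" (discrete) partial likelihood:
fit.breslow <- coxph(Surv(exit, event) ~ x, data = dat, ties = "breslow")
fit.efron   <- coxph(Surv(exit, event) ~ x, data = dat, ties = "efron")
fit.exact   <- coxph(Surv(exit, event) ~ x, data = dat, ties = "exact")

## Unconditional logistic regression (cloglog link), one parameter per
## risk set, profiled out internally:
fit.ml <- coxreg(Surv(exit, event) ~ x, data = dat, method = "ml")

## In principle the same model via glm on the risk-set-expanded data;
## feasible here only because the toy data have few risk sets:
bin <- toBinary(dat)                      # expects enter/exit/event columns
fit.glm <- glm(event ~ x + factor(riskset),
               data = bin, family = binomial(link = "cloglog"))

rbind(breslow = coef(fit.breslow),
      efron   = coef(fit.efron),
      exact   = coef(fit.exact),
      ml      = coef(fit.ml),
      glm     = coef(fit.glm)["x"])

The "ml" and glm fits should agree (same model; coxreg just profiles out the
riskset parameters), while the coxph fits differ a little depending on the
tie handling.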
> The Breslow approximation is less accurate, but is the easiest to
> program and therefore was the only method in early Cox model programs;
> it persists as the default in many software packages because of history.
> Truth be told, unless the number of tied deaths is quite large the
> difference in results between it and the Efron approx will be trivial.
>
> The worst approximation, and the one that can sometimes give seriously
> strange results, is to artificially remove ties from the data set by
> adding a random value to each subject's time.

Maybe, but randomly breaking the ties need not be a bad idea; you could
regard it as giving an (unbiased?) estimator of the exact (continuous-time)
partial likelihood. Expanding on that: instead of going through all possible
orderings, why not take a random sample of size greater than one? (A quick
sketch of that idea is in the PS below.)

Göran

> Terry T
>
> --- begin quote --
> I didn't know precisely the specificities of each approximation method. I
> thus came back to section 3.3 of Therneau and Grambsch, Extending the Cox
> Model. I think I now see things more clearly. If I have understood
> correctly, both the "discrete" option and the "exact" functions assume
> "true" discrete event times in a model approximating the Cox model. The
> Cox partial likelihood cannot be exactly maximized, or even written, when
> there are ties, am I right?
>
> In my sample, many of the ties (those within a single observation of the
> process) are due to the fact that continuous event times are grouped into
> intervals.
>
> So I think the logistic approximation may not be the best for my problem,
> even though the estimates on my real data set (shown in my previous post)
> give interesting results in the context of my data! I was thinking about
> distributing the events uniformly within each interval. What do you think
> about this option? Can I expect a better approximation than directly
> applying the Breslow or Efron method to the grouped event data? Finally,
> it becomes a model problem more than a computational or algorithmic one,
> I guess.

--
Göran Broström
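PS. For what it is worth, here is one simple-minded version of the "break the
ties at random, more than once" idea above (again only a sketch of mine, not
anything taken from the packages): jitter the tied times by less than the
grid spacing, refit, and average over replicates.

library(survival)

## Same kind of toy data as in the earlier sketch.
set.seed(1)
n <- 50
dat <- data.frame(exit  = sample(1:4, n, replace = TRUE),
                  event = rbinom(n, 1, 0.7),
                  x     = rnorm(n))

## One random tie-break: the jitter is smaller than the grid spacing (1), so
## tied failures get a random order while the rest of the ordering is kept.
## With no ties left, all tie-handling methods coincide.
one.break <- function(d) {
    d$exit <- d$exit + runif(nrow(d), 0, 0.5)
    coef(coxph(Surv(exit, event) ~ x, data = d))
}

set.seed(2)
B <- 25
est <- replicate(B, one.break(dat))
mean(est)          # average over B random tie-breaks
sd(est) / sqrt(B)  # Monte Carlo error of that average

Averaging the estimates is of course not quite the same thing as averaging
the partial likelihood over sampled orderings, but it gives a feel for how
stable random tie-breaking is.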