On Sat, 28 Mar 2009, Bob Green wrote:
Hello,
I am hoping for assistance in regards to examining the contribution of
stratified variables in a cox regression. A previous post by Terry Therneau
noted that "That is the point of a strata; you are declaring a variable to NOT
be proportional hazards, and thus there is no single "hazard ratio" that
describes it". Given this purpose of stratification, in the process of building
and testing a model, is there a way to test if the stratified variables do add
anything to a model?
I'm not aware of any formal test for whether stratification helps. It's
difficult because you are adding an infinite-dimensional parameter to the
model, and this parameter doesn't even appear in the partial likelihood.
Nothing simple is going to work.
In principle one could compare the two stratum baseline cumulative hazards to
see if they are proportional to each other, e.g., by checking whether the
difference in log-cumulative baseline hazard is constant over time. The
bootstrap is valid for the baseline cumulative hazards, so one could get
confidence intervals on a suitable summary statistic that way.
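A minimal sketch of that comparison follows. It assumes the `lung` data shipped with the survival package, with sex as the stratifying variable and age as an ordinary covariate. The helper names `log_gap` and `gap_slope`, the time grid, and the choice of summary statistic (the slope of the log-cumulative-hazard gap against log time) are all illustrative assumptions, not an established test.

```r
library(survival)

# Illustrative data only: `lung` ships with survival; sex is the stratifying
# variable and age an ordinary covariate (both choices are assumptions).
log_gap <- function(data, grid) {
  fit <- coxph(Surv(time, status) ~ age + strata(sex), data = data)
  bh <- basehaz(fit)                     # columns: hazard, time, strata
  # One right-continuous step function per stratum for the cumulative hazard
  H <- lapply(split(bh, bh$strata),
              function(d) stepfun(d$time, c(0, d$hazard)))
  log(H[[2]](grid)) - log(H[[1]](grid))  # log H_2(t) - log H_1(t)
}

# Hypothetical summary statistic: slope of the gap against log(time).
# Under proportional baseline hazards the gap is constant, so the slope is 0.
gap_slope <- function(data, grid) {
  gap <- log_gap(data, grid)
  ok <- is.finite(gap)                   # drop times before a stratum's first event
  unname(coef(lm(gap[ok] ~ log(grid[ok])))[2])
}

grid <- seq(100, 600, by = 25)           # days; an arbitrary illustrative grid
observed <- gap_slope(lung, grid)

# Bootstrap: resample subjects with replacement within each stratum
set.seed(1)
rows_by_stratum <- split(seq_len(nrow(lung)), lung$sex)
boot <- replicate(200, {
  idx <- unlist(lapply(rows_by_stratum, sample, replace = TRUE))
  gap_slope(lung[idx, ], grid)
})
ci <- quantile(boot, c(0.025, 0.975))    # percentile interval

print(observed)
print(ci)
```

An interval that sits well away from zero would suggest the two baseline hazards are not proportional. An interval covering zero is merely consistent with proportionality and does not establish it.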
Two variables were stratified because it was considered that the proportional
hazards assumption was not met (via inspection of log-log plots, where the
curves crossed). I have also examined cox.zph: there were no cox.zph values
that were statistically significant. I did produce plots but found these
difficult to interpret.
There isn't much information loss in stratifying, as long as it's not overdone,
which is probably why there hasn't been much work on tests. The main loss is
that the model becomes more complicated and harder to summarize.
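For reference, the cox.zph check mentioned in the question can be run on the unstratified model before deciding whether to stratify. This is only a sketch: the `lung` data and the covariates used here are assumptions standing in for the real model.

```r
library(survival)

# Illustrative fit on the `lung` data shipped with survival (an assumption;
# substitute the real model). Here sex is an ordinary covariate, not a stratum.
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

zp <- cox.zph(fit)   # one test per term plus a GLOBAL test
print(zp)

# Smoothed scaled Schoenfeld residuals for sex: an estimate of beta(t)
plot(zp[2])
abline(h = coef(fit)["sex"], lty = 2)   # the constant beta assumed under PH
```

In the plot, a smooth curve that stays roughly flat around the dashed line is what proportional hazards predicts. A clear trend or a change of sign over time is the pattern that would motivate stratifying on that variable.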
The statistician I have been consulting said that in SPSS, when variables are
stratified, a model is produced for each stratum (e.g., a separate analysis
for male and female if a gender variable were stratified). I have not seen
this approach used in the R examples I have come across.
Fitting a completely separate model for each stratum is equivalent to
stratifying *and* adding an interaction with stratum to each predictor variable.
This does result in a loss of information, and is usually overkill. You can
add stratum interactions just to the variables where they are needed.
This may be related to the collision in terminology where epidemiologists say
'stratify' to mean 'do a completely separate analysis' and statisticians say
'stratify' to mean 'pool the stratum-specific analyses to get an overall
estimate'.
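The equivalence between completely separate per-stratum models and stratifying plus interacting every predictor with stratum can be checked numerically. The sketch below assumes the `lung` data from the survival package, with sex as the stratum and age and ph.ecog as predictors. The indicator `female` is a helper introduced for illustration.

```r
library(survival)

# Illustrative data (an assumption): complete cases from `lung`
d <- na.omit(lung[, c("time", "status", "age", "ph.ecog", "sex")])
d$female <- as.numeric(d$sex == 2)   # hypothetical stratum indicator

# Completely separate analyses, one per stratum
fit_m <- coxph(Surv(time, status) ~ age + ph.ecog, data = d, subset = sex == 1)
fit_f <- coxph(Surv(time, status) ~ age + ph.ecog, data = d, subset = sex == 2)

# One stratified model with a stratum interaction on every predictor
fit_all <- coxph(Surv(time, status) ~ age + ph.ecog +
                   age:female + ph.ecog:female + strata(sex), data = d)

b <- coef(fit_all)
coef_m <- b[c("age", "ph.ecog")]                                   # stratum 1
coef_f <- b[c("age", "ph.ecog")] + b[c("age:female", "ph.ecog:female")]  # stratum 2

print(rbind(separate = coef(fit_m), combined = coef_m))
print(rbind(separate = coef(fit_f), combined = coef_f))

# Less drastic alternative: interact only the predictor that needs it
fit_some <- coxph(Surv(time, status) ~ age + ph.ecog +
                    ph.ecog:female + strata(sex), data = d)
print(coef(fit_some))
```

The last model keeps a single pooled age coefficient across strata and lets only the ph.ecog effect differ, so it spends one extra parameter instead of two.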
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlum...@u.washington.edu University of Washington, Seattle