Hi, I figure out the reason for the difference. There are ties in failure times in the data set. Consequently, it matters which method is used to handle ties in "coxph". The default is "efron". If I use method="breslow", there is no difference between the 2 different ways of computing the Nelson-Aalen estimte.
require(foreign) gb <- read.dta("GB.dta") # Green & Byar data; N = 483 # Method 1 (note the use of Breslow estimator) fit1 <- coxph( Surv(time, status=="Cancer" | status=="CVD" | status=="Other") ~ 1, method="breslow", data=gb) h1 <- basehaz(fit1) # Method 2 fit2 <- survfit(Surv(time, status=="Cancer" | status=="CVD" | status=="Other") ~ 1, data=gb) jump <- fit2$n.event > 0 h2 <- cumsum(fit2$n.event[jump]/fit2$n.risk[jump]) > min(abs(h1$hazard - h2)) [1] 1.387779e-17 Best, Ravi. ---------------------------------------------------------------------------- ------- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html ---------------------------------------------------------------------------- -------- -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ravi Varadhan Sent: Monday, May 04, 2009 12:33 AM To: r-help@r-project.org Subject: [R] Nelson-Aalen estimator of cumulative hazard Hi, I am computing the Nelson-Aalen (NA) estimate of baseline cumulative hazard in two different ways using the "survival" package. I am expecting that they should be identical. However, they are not. Their difference is a monotonically increasing with time. This difference is probably not large to make any impact in the application, but is annoyingly non-trivial for me to just ignore it. This is a competing risks problem, with the Green & Byar (1980) data set (the STATA data set is attached). Can anyone explain to me the reason for the discrepancy? require(foreign) gb <- read.dta("GB.dta") # Green & Byar data; N = 483 # Method 1 fit1 <- coxph( Surv(time, status=="Cancer" | status=="CVD" | status=="Other") ~ 1, data=gb) h1 <- basehaz(fit1) # Method 2 fit2 <- survfit(Surv(time, status=="Cancer" | status=="CVD" | status=="Other") ~ 1, data=gb) jump <- fit2$n.event > 0 h2 <- cumsum(fit2$n.event[jump]/fit2$n.risk[jump]) plot(h1$time, h1$hazard - h2) Thank you, Ravi. ____________________________________________________________________ Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.