Dear Terry,

Thank you very much for taking the time to address this problem!

I did check the data in F&H.  I couldn't detect any differences between the
R data set and the one in the Appendix.  The preface in F&H acknowledges
that the data set was obtained from Roland Dickinson.  Is the data set in R
created by Tom Fleming based on the original Mayo data?

Where do the papers that reference this data set get their data from?  Do
they get it from the URL that you gave me?  It is impossible to tell from
the papers because they just cite the F&H appendix as the source of the
data, but obviously they must have gotten it as an electronic version from
somewhere.  If so, is the electronic version the same as the R data set?

This is relevant for me because I am trying to compare the results of my
estimation algorithm to those in another paper (which, of course, simply
cites F&H for the data).

Best regards,
Ravi.

-----------------------------------------------------------------------------------

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: [EMAIL PROTECTED]

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 

------------------------------------------------------------------------------------


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Terry Therneau
Sent: Monday, November 24, 2008 8:40 AM
To: [EMAIL PROTECTED]
Cc: r-help@r-project.org
Subject: Re: [R] Discrepancy in the PBC data set

  The data set in R is wrong. I've found mistakes on 2 lines in a quick
look. 
  
  I don't know if the data is incorrect in the Appendix of Fleming and
Harrington as well (someone seems to have borrowed my copy), which is where
the data set appears to have been taken from, given all the "-9" codes in
it. (Note, Tom Fleming originally got the data from me, so I'm fairly
confident in calling my Mayo version the authoritative one).  I'll make sure
this gets fixed.
  
  You can grab a correct data set from our department web page.  Code is
below.
  
        Terry Therneau
        
  
pbcurl <- "http://mayoresearch.mayo.edu/mayo/research/biostat/upload/therneau_upload/pbc.dat"

pbc <- read.table(pbcurl, header=FALSE,
                  col.names=c('id', 'time', 'status', 'trt',  'age', 'sex',
                              'ascites',  'hepato',  'spiders',  'edema',
                              'bili',  'chol',  'albumin',  'copper', 
                              'alk.phos',  'ast',  'trig',  'platelet',
                              'protime',  'stage'),
                  na.strings='.')
pbc$age <- pbc$age/365.25   # convert age from days to years

library(survival)   # provides coxph() and Surv()
newfit <- coxph(Surv(time, status==2) ~ age + edema + log(bili) +
        log(protime) + log(albumin), data=pbc)

newfit
                coef exp(coef) se(coef)     z       p
age           0.0396    1.0404  0.00767  5.16 2.4e-07
edema         0.8963    2.4505  0.27141  3.30 9.6e-04
log(bili)     0.8636    2.3716  0.08294 10.41 0.0e+00
log(protime)  2.3868   10.8791  0.76851  3.11 1.9e-03
log(albumin) -2.5069    0.0815  0.65292 -3.84 1.2e-04

Likelihood ratio test=231  on 5 df, p=0  n=416 (2 observations deleted due to missingness)
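
To locate the rows that differ between two versions of the data (for
example the copy shipped with R and the Mayo file), something along the
following lines may help.  This is only a rough sketch: the helper name
differing_ids and the toy values are made up for illustration and are not
taken from either data set.  It assumes both data frames share the same
column names and an 'id' column, that the columns are numeric or character
(not factors), and that exact equality is the right test.

```r
# Hypothetical helper: return the ids of rows whose values differ between
# two versions of a data set keyed by a common id column.
differing_ids <- function(a, b, key = "id") {
  stopifnot(setequal(names(a), names(b)), nrow(a) == nrow(b))
  # Put both versions in the same row and column order
  a <- a[order(a[[key]]), , drop = FALSE]
  b <- b[order(b[[key]]), names(a), drop = FALSE]
  stopifnot(all(a[[key]] == b[[key]]))
  differs <- rep(FALSE, nrow(a))
  for (col in names(a)) {
    x <- a[[col]]
    y <- b[[col]]
    both_na <- is.na(x) & is.na(y)               # missing in both: a match
    equal   <- !is.na(x) & !is.na(y) & x == y    # present in both and equal
    differs <- differs | !(both_na | equal)
  }
  a[[key]][differs]
}

# Toy data, invented for this example only
r_version    <- data.frame(id = 1:4, time = c(400, 4500, 1012, 1925),
                           bili = c(14.5, 1.1, 1.4, NA))
mayo_version <- data.frame(id = 1:4, time = c(400, 4500, 1012, 1925),
                           bili = c(14.5, 1.1, 1.8, NA))

differing_ids(r_version, mayo_version)
```

If one version codes missing values as -9 and the other as '.', the codes
would need to be converted to NA in both before comparing; otherwise every
missing value shows up as a difference.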

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.