Since fit3.1 and fit2 are based on different data sets, why would I 
expect the same number of events?
Also, when you have a large number of variables, are observations being 
deleted due to missing values?
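
Both points can be checked directly from the fitted objects. The sketch 
below is only an illustration: it uses the lung data shipped with the 
survival package as a stand-in (dat1 and dat2 are not available here), 
and the names fit_small and fit_big are made up for the example.

```r
library(survival)

# Stand-in data: 'lung' from the survival package, not the poster's data.
# fit_small uses covariates with no missing values; fit_big adds
# covariates that contain NAs, so rows are dropped by na.action.
fit_small <- coxph(Surv(time, status) ~ age + sex, data = lung)
fit_big   <- coxph(Surv(time, status) ~ age + sex + ph.karno + pat.karno +
                     meal.cal + wt.loss, data = lung)

# Observations and events actually used by each fit
c(n = fit_small$n, events = fit_small$nevent)
c(n = fit_big$n,   events = fit_big$nevent)

# Number of rows deleted because of missing covariate values
length(fit_big$na.action)

# Number of distinct time points reported by basehaz() for each fit
nrow(basehaz(fit_small))
nrow(basehaz(fit_big))
```

If the larger model reports a smaller n, rows with missing covariates 
were dropped, and any time points that occurred only in those rows will 
be absent from the basehaz() output.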

And to echo David W's comments -- it is hard for me to imagine a data 
set where this many variables can be looked at simultaneously and still 
yield a meaningful result.

Terry Therneau


On 08/09/2012 07:52 PM, Nasib Ahmed wrote:
> My sessionInfo is as follows:
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] mi_0.09-16       arm_1.5-05       foreign_0.8-50   abind_1.4-0
>  [5] R2WinBUGS_2.1-18 coda_0.15-2      lme4_0.999999-0  Matrix_1.0-6
>  [9] lattice_0.20-6   car_2.0-12       nnet_7.3-4       MASS_7.3-20
> [13] MuMIn_1.7.11     survival_2.36-14
>
> loaded via a namespace (and not attached):
> [1] grid_2.15.1   nlme_3.1-104  stats4_2.15.1
>
> It will be difficult to reproduce an example here as the data set I am 
> using is very large. I can give you an example:
>
> fit3.1<- coxph(formula = y ~ sex + ns(ageyrs, df = 2) + AdmissionSource +
> +     X1 + X2 + X3 + X5 + X6 + X7 + X11 + X12 + X13 + X14 + X15 +
> +     X16 + X17 + X18 + X19 + X20 + X22 + X24 + X25 + X26 + X27 +
> +     X28 + X29 + X32 + X33 + X35 + X38 + X39 + X40 + X41 + X42 +
> +     X43 + X44 + X47 + X49 + X53 + X54 + X55 + X58 + X59 + X62 +
> +     X68 + X69 + X78 + X80 + X81 + X84 + X85 + X86 + X93 + X95 +
> +     X98 + X100 + X101 + X102 + X105 + X107 + X108 + X109 + X110 +
> +     X112 + X113 + X114 + X115 + X116 + X117 + X121 + X122 + X125 +
> +     X127 + X128 + X129 + X131 + X132 + X133 + X134 + X138 + X140 +
> +     X143 + X145 + X146 + X148 + X150 + X151 + X153 + X157 + X158 +
> +     X159 + X164 + X197 + X200 + X202 + X203 + X204 + X205 + X211 +
> +     X214 + X217 + X224 + X228 + X233 + X237 + X244 + X249 + X254 +
> +     X258 + X259 + X260 + CharlsonIndex + ethnic + day + season +
> +     ln, data = dat2)
>
> haz<-basehaz(fit3.1) # gives 507 unique time points in haz$time
>
> fit2<-coxph(y~ns(ageyrs,df=2)+day+ln+sex+AdmissionSource+season+CharlsonIndex,data=dat1)
>
> haz<-basehaz(fit2) # gives 611 unique time points in haz$time
>
>
> I get the following warnings() with fit3.1:
> Warning message:
> In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>   Loglik converged before variable   ; beta may be infinite.
>
> Also, the coefficients of the variables that the warning occurs for are 
> very high. The Wald test suggests dropping these terms, whereas the 
> LRT suggests keeping them. What should I do in terms of model selection?
>
> On Thu, Aug 9, 2012 at 2:00 PM, Terry Therneau <thern...@mayo.edu> wrote:
>
>     I've never seen this, and have no idea how to reproduce it.
>     For resolution you are going to have to give me a working example
>     of the failure.
>
>     Also, per the posting guide, what is your sessionInfo()?
>
>     Terry Therneau
>
>     On 08/09/2012 04:11 AM, r-help-requ...@r-project.org wrote:
>
>         I have a couple of questions with regard to fitting a coxph
>         model to a data set in R:
>
>         I have a very large dataset and wanted to get the baseline
>         hazard using the basehaz() function in the 'survival' package.
>         If I use all the covariates, then the output from basehaz(fit),
>         where fit is a model fit using coxph(), gives 507 unique values
>         for the time and the corresponding cumulative hazard function.
>         However, if I use a subset of the variables, basehaz() gives
>         611 values for the time and cumulative hazard.
>
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
