Re: [R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???

David Winsemius Sat, 28 Sep 2013 07:43:30 -0700


On Sep 28, 2013, at 2:39 AM, E Joffe wrote:

Hi all,

I am using COX LASSO (glmnet / coxnet) regression to analyze adataset of

394 obs. / 268 vars.
I use the following procedure:
1.      Construct a coxnet on the entire dataset (by cv.glmnet)
2.      Pick the significant features by selecting the non-zero coefficient
under the best lambda selected by the model
3.      Build a coxph model with bi-directional stepwise feature selection
limited to the coxnet selected features.

I was a bit puzzled by the third step. Once you had a reduced modelfrom glmnet, what was the statistical basis for further elimination ofvariables?

(Quite apart from the statistical issues, I was rather surprised thatthis procedure even produced results since the 'step' function is notdescribed in the 'stats' package as applying to 'coxph' model objects.)

        
To validate the model I use both Brier score (library=peperr) andHarrel's [Harrell]
C-Index (library=Hmisc) with a bootstrap of 50 iterations.
MY QUESTION : I am getting fairly good C-Index (0.76) and Brier(0.07)values for the models however per the coxnet the %Dev explained bythe model
is at best 0.27 and when I plot the survfit of the coxph the plotted
confidence interval is very large.

How many events did you have? (The width of CI's is most importantlydependent on event counts and not particularly improved by a high casecount. The power considerations are very similar to those of abinomial test.)

What am I missing here ?

Perhaps sufficient events? (You also seem to be missing a descriptionof the study goals.)



--
David.


%DEV=27%



Brier score - 0.07  ($ipec.coxglmnet -> [1] 7.24)
C-Index - 0.76 ($cIndex -> [1] 0.763)



DATA: [Private Health Information - can't publish] 394 obs./268 vars.

CODE (need to define a dataset with 'time' and 'status' variables):

library("survival")
library ("glmnet")
library ("c060")
library ("peperr")
library ("Hmisc")

   #creat Y (survival matrix) for glmnet
   surv_obj <- Surv(dataset$time,dataset$status)

## tranform categorical variables into binary variables withdummy for

dataset
   predict_matrix <- model.matrix(~ ., data=dataset,
                                  contrasts.arg = lapply
(dataset[,sapply(dataset, is.factor)], contrasts))

## remove the statu/time variables from the predictor matrix (x)for

glmnet
   predict_matrix <- subset (predict_matrix, select=c(-time,-status))

   ## create a glmnet cox object using lasso regularization and cross
validation
   glmnet.cv <- cv.glmnet (predict_matrix, surv_obj, family="cox")

   ## get the glmnet model on the full dataset
   glmnet.obj <- glmnet.cv$glmnet.fit

   # find lambda index for the models with least partial likelihood
deviance (by cv.glmnet)
   optimal.lambda <- glmnet.cv$lambda.min    # For a more parsimoneous
model use lambda.1se
   lambda.index <- which(glmnet.obj$lambda==optimal.lambda)


   # take beta for optimal lambda
   optimal.beta  <- glmnet.obj$beta[,lambda.index]

   # find non zero beta coef
   nonzero.coef <- abs(optimal.beta)>0
   selectedBeta <- optimal.beta[nonzero.coef]

   # take only covariates for which beta is not zero
   selectedVar   <- predict_matrix[,nonzero.coef]

   # create a dataframe for trainSet with time, status and selected
variables in binary representation for evaluation in pec
   reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))

   # glmnet.cox only with meaningful features selected by stepwise
bidirectional AIC feature selection
   glmnet.cox.meaningful <- step(coxph(Surv(time,status) ~
.,data=reformat_dataSet),direction="both")




##--------------------------------------------------------------------------
-----------------------------
   ##                                    MODEL PERFORMANCE

##--------------------------------------------------------------------------
-----------------------------
   ##


   ## Calculate the Brier score - pec does its own bootstrap so this
function runs on i=51 (i.e., whole trainset)

       ## Brier score calculation to cox-glmnet
       peperr.coxglmnet <- peperr(response=surv_obj, x=selectedVarCox,
                               fit.fun=fit.coxph, load.all=TRUE,

indices=resample.indices(n=nrow(surv_obj),

method="boot", sample.n=50))

       # Get error predictions from peperr
       prederr.coxglmnet <- perr(peperr.coxglmnet)

       # Integrated prediction error Brier score calculation
       ipec.coxglmnet<-ipec(prederr.coxglmnet,
eval.times=peperr.coxglmnet$attribute, response=surv_obj)


 ## C-Index calculation 50 iter bootstrapping

 for (i in 1:50){
       print (paste("Iteration:",i))

train <- sample(1:nrow(dataset), nrow(dataset), replace =TRUE) ##

random sampling with replacement

# create a dataframe for trainSet with time, status andselected

variables in binary representation for evaluation in pec
       reformat_trainSet <- reformat_dataSet [train,]


       # glmnet.cox only with meaningful features selected by stepwise
bidirectional AIC feature selection
       glmnet.cox.meaningful.test <- step(coxph(Surv(time,status) ~
.,data=reformat_dataSet),direction="both")

       selectedVarCox   <-
predict_matrix[,attr(glmnet.cox.meaningful.test$terms,"term.labels")]

reformat_testSet <-as.data.frame(cbind(surv_obj,selectedVarCox))

       reformat_testSet <- reformat_dataSet [-train,]


#     compute c-index (Harrell's) for cox-glmnet models
       if (is.null(glmnet.cox.meaningful)){
         cIndexCoxglmnet <- c(cIndexCoxglmnet,NULL)
       }else{
         cIndexCoxglmnet <- c(cIndexCoxglmnet,
1-rcorr.cens(predict(glmnet.cox.meaningful,

reformat_testSet),Surv(reformat_testSet$time,reformat_testSet$status))[1])

       }
 }

 #Get average C-Index
 cIndex<- mean (unlist(cIndexCoxglmnet),rm.na=TRUE)

 #create a list of all the objects generated

assign(name,c(eval(parse(text=name)),glmnet.cv=list(glmnet.cv),glmnet.obj=li

st(glmnet.obj),

selectedVar=list(colnames(selectedVar)),glmnet.cox=list(glmnet.cox),

glmnet.cox.meaningful=list(glmnet.cox.meaningful),ipec.coxglmnet=list(ipec.c

oxglmnet),
               cIndex=cIndex))

 # save image of the workspace after each iteration
   save.image("final_subgroup_analysissubgroup_analysis.RData")


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???

Reply via email to