On Sep 28, 2013, at 2:39 AM, E Joffe wrote:

Hi all,

I am using COX LASSO (glmnet / coxnet) regression to analyze a dataset of
394 obs. / 268 vars.
I use the following procedure:
1.      Construct a coxnet on the entire dataset (by cv.glmnet)
2.      Pick the significant features by selecting the non-zero coefficient
under the best lambda selected by the model
3.      Build a coxph model with bi-directional stepwise feature selection
limited to the coxnet selected features.

I was a bit puzzled by the third step. Once you had a reduced model from glmnet, what was the statistical basis for further elimination of variables?

(Quite apart from the statistical issues, I was rather surprised that this procedure even produced results since the 'step' function is not described in the 'stats' package as applying to 'coxph' model objects.)

        
To validate the model I use both Brier score (library=peperr) and Harrel's [Harrell]
C-Index (library=Hmisc) with a bootstrap of 50 iterations.


MY QUESTION : I am getting fairly good C-Index (0.76) and Brier (0.07) values for the models however per the coxnet the %Dev explained by the model
is at best 0.27 and when I plot the survfit of the coxph the plotted
confidence interval is very large.

How many events did you have? (The width of CI's is most importantly dependent on event counts and not particularly improved by a high case count. The power considerations are very similar to those of a binomial test.)


What am I missing here ?

Perhaps sufficient events? (You also seem to be missing a description of the study goals.)


--
David.


%DEV=27%



Brier score - 0.07  ($ipec.coxglmnet -> [1] 7.24)
C-Index - 0.76 ($cIndex -> [1] 0.763)



DATA: [Private Health Information - can't publish] 394 obs./268 vars.

CODE (need to define a dataset with 'time' and 'status' variables):

library("survival")
library ("glmnet")
library ("c060")
library ("peperr")
library ("Hmisc")

   #creat Y (survival matrix) for glmnet
   surv_obj <- Surv(dataset$time,dataset$status)


## tranform categorical variables into binary variables with dummy for
dataset
   predict_matrix <- model.matrix(~ ., data=dataset,
                                  contrasts.arg = lapply
(dataset[,sapply(dataset, is.factor)], contrasts))

## remove the statu/time variables from the predictor matrix (x) for
glmnet
   predict_matrix <- subset (predict_matrix, select=c(-time,-status))

   ## create a glmnet cox object using lasso regularization and cross
validation
   glmnet.cv <- cv.glmnet (predict_matrix, surv_obj, family="cox")

   ## get the glmnet model on the full dataset
   glmnet.obj <- glmnet.cv$glmnet.fit

   # find lambda index for the models with least partial likelihood
deviance (by cv.glmnet)
   optimal.lambda <- glmnet.cv$lambda.min    # For a more parsimoneous
model use lambda.1se
   lambda.index <- which(glmnet.obj$lambda==optimal.lambda)


   # take beta for optimal lambda
   optimal.beta  <- glmnet.obj$beta[,lambda.index]

   # find non zero beta coef
   nonzero.coef <- abs(optimal.beta)>0
   selectedBeta <- optimal.beta[nonzero.coef]

   # take only covariates for which beta is not zero
   selectedVar   <- predict_matrix[,nonzero.coef]

   # create a dataframe for trainSet with time, status and selected
variables in binary representation for evaluation in pec
   reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))

   # glmnet.cox only with meaningful features selected by stepwise
bidirectional AIC feature selection
   glmnet.cox.meaningful <- step(coxph(Surv(time,status) ~
.,data=reformat_dataSet),direction="both")




##--------------------------------------------------------------------------
-----------------------------
   ##                                    MODEL PERFORMANCE

##--------------------------------------------------------------------------
-----------------------------
   ##


   ## Calculate the Brier score - pec does its own bootstrap so this
function runs on i=51 (i.e., whole trainset)

       ## Brier score calculation to cox-glmnet
       peperr.coxglmnet <- peperr(response=surv_obj, x=selectedVarCox,
                               fit.fun=fit.coxph, load.all=TRUE,
indices=resample.indices(n=nrow(surv_obj),
method="boot", sample.n=50))

       # Get error predictions from peperr
       prederr.coxglmnet <- perr(peperr.coxglmnet)

       # Integrated prediction error Brier score calculation
       ipec.coxglmnet<-ipec(prederr.coxglmnet,
eval.times=peperr.coxglmnet$attribute, response=surv_obj)


 ## C-Index calculation 50 iter bootstrapping

 for (i in 1:50){
       print (paste("Iteration:",i))
train <- sample(1:nrow(dataset), nrow(dataset), replace = TRUE) ##
random sampling with replacement
# create a dataframe for trainSet with time, status and selected
variables in binary representation for evaluation in pec
       reformat_trainSet <- reformat_dataSet [train,]


       # glmnet.cox only with meaningful features selected by stepwise
bidirectional AIC feature selection
       glmnet.cox.meaningful.test <- step(coxph(Surv(time,status) ~
.,data=reformat_dataSet),direction="both")

       selectedVarCox   <-
predict_matrix[,attr(glmnet.cox.meaningful.test$terms,"term.labels")]
reformat_testSet <- as.data.frame(cbind(surv_obj,selectedVarCox))
       reformat_testSet <- reformat_dataSet [-train,]


#     compute c-index (Harrell's) for cox-glmnet models
       if (is.null(glmnet.cox.meaningful)){
         cIndexCoxglmnet <- c(cIndexCoxglmnet,NULL)
       }else{
         cIndexCoxglmnet <- c(cIndexCoxglmnet,
1-rcorr.cens(predict(glmnet.cox.meaningful,
reformat_testSet),Surv(reformat_testSet$time,reformat_testSet $status))[1])
       }
 }

 #Get average C-Index
 cIndex<- mean (unlist(cIndexCoxglmnet),rm.na=TRUE)

 #create a list of all the objects generated

assign (name,c(eval(parse(text=name)),glmnet.cv=list(glmnet.cv),glmnet.obj=li
st(glmnet.obj),

selectedVar=list(colnames(selectedVar)),glmnet.cox=list(glmnet.cox),

glmnet .cox.meaningful=list(glmnet.cox.meaningful),ipec.coxglmnet=list(ipec.c
oxglmnet),
               cIndex=cIndex))

 # save image of the workspace after each iteration
   save.image("final_subgroup_analysissubgroup_analysis.RData")


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to