Hello,
I have a problem with creating an identity matrix for glmnet by using the
contrasts function.
I have a factor with 4 levels.
When I create dummy variables I think there should be n-1 variables (in this
case 3) - so that the contrasts would be against the baseline level.
This is also what is written in the help file for 'contrasts'.
The problem is that the function creates a matrix with n variables (i.e. the
same as the number of levels) and not n-1 (where I would have a baseline
level for comparison).
My questions are:
1. How can I create a matrix with n-1 dummy variables? Was I supposed to
specify explicitly that I want contr.treatment contrasts?
2. If that is not possible, how should I interpret the hazard ratios in
the Cox regression I am fitting (I use glmnet for variable selection and
then fit a Cox regression)? That is, if I get an HR of 3 for the
300 MG level, what does it mean? The hazard is 3 times higher than what?
Here is some code to reproduce the issue:
# Create a 4-level example factor
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                     100, replace = TRUE))
# Use contrasts to get the identity matrix of dummy variables to be used in glmnet
trt2 <- contrasts(trt, contrasts = FALSE)
Results (as you can see all levels are represented in the identity matrix):
> levels(trt)
[1] "1200 MG" "300 MG"  "600 MG"  "PLACEBO"
> print(trt2)
        1200 MG 300 MG 600 MG PLACEBO
1200 MG       1      0      0       0
300 MG        0      1      0       0
600 MG        0      0      1       0
PLACEBO       0      0      0       1
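For comparison, here is a minimal sketch of the n-1 coding I expected, assuming the default contrast options (contr.treatment for unordered factors) are in effect. The set.seed() call, the relevel() call that makes PLACEBO the baseline, and the names trt3 and x are my own additions for illustration; I have not confirmed that this is the recommended way to prepare input for glmnet.

```r
# Same 4-level example factor as above (seed added only for reproducibility)
set.seed(1)
trt <- factor(sample(c("PLACEBO", "300 MG", "600 MG", "1200 MG"),
                     100, replace = TRUE))

# Assumption: PLACEBO is the intended baseline, so make it the first level
trt <- relevel(trt, ref = "PLACEBO")

# Default call (contrasts = TRUE): 4 rows, 3 columns, baseline row all zeros
trt3 <- contrasts(trt)
print(trt3)

# One row per observation, dropping the intercept column: 100 x 3 dummy matrix
x <- model.matrix(~ trt)[, -1]
print(dim(x))
```

If this coding is right, I would presumably read each remaining column as a comparison against PLACEBO.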
Thank you,
Erel