Re: [R] glmnet inclusion / exclusion of categorical variables

David Winsemius Fri, 09 Aug 2013 15:04:50 -0700

On Aug 9, 2013, at 12:52 PM, kevin.shaney wrote:

> Thanks!  I tried the type.multinomial="grouped" argument, but it 
> didn't work for me.  Maybe I did something wrong.  I think it didn't work 
> because sparse.model.matrix recodes variables (like V12 & V13 below), 
> which leaves GLMNET unable to tell that they actually came from the 
> same source categorical variable.  Has that option ever worked for you in a 
> similar situation?


I wondered after posting whether the sparse.model.matrix input could be 
getting in the way of whatever grouping behavior might be occurring (which is 
conducted behind the scenes in a non-exported function). I've never attempted 
to use it, and only asked the question because you didn't specifically say that 
you had used it in the fashion described in the help page.
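
To make the recoding issue concrete, here is a minimal sketch using base 
model.matrix as a stand-in for sparse.model.matrix (that the Matrix version 
exposes the same bookkeeping is an assumption, not something checked here). 
The data frame `d` and the column `z` are made up for illustration. The 
"assign" attribute records which source term each recoded column came from, 
but glmnet only receives the numeric matrix and has no argument for grouping 
columns. As the glmnet help page describes it, type.multinomial="grouped" 
groups the coefficients of one column across the response classes, not the 
dummy columns of one factor.

```r
# Illustrative data: V1 is a 3-level factor, z a continuous predictor.
d <- data.frame(
  V1 = factor(c("1", "2", "3", "1", "2", "3")),
  z  = c(0.5, -1.2, 0.3, 2.0, -0.7, 1.1)
)

# Default treatment contrasts recode V1 into the dummy columns V12 and V13.
mm <- model.matrix(~ V1 + z, data = d)
print(colnames(mm))

# "assign" maps each column to the index of its source term
# (0 is the intercept, 1 is V1, 2 is z).
print(attr(mm, "assign"))

# Recover the source-variable name for every recoded column.
term_labels <- attr(terms(~ V1 + z), "term.labels")
source_var  <- c("(Intercept)", term_labels)[attr(mm, "assign") + 1]
groups      <- split(colnames(mm), source_var)
print(groups)
```

The `groups` list is the mapping that would be needed to treat V12 and V13 
as one unit; it lives only on the R side and is not passed to the fit.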

-- 
David.

> 
> Thanks!
> Kevin
> 
> From: David Winsemius [via R] 
> [mailto:ml-node+s789695n4673463...@n4.nabble.com]
> Sent: Friday, August 09, 2013 3:14 PM
> To: Kevin Shaney
> Subject: Re: glmnet inclusion / exclusion of categorical variables
> 
> 
> On Aug 9, 2013, at 6:44 AM, Kevin Shaney wrote:
> 
>> 
>> Hello -
>> 
>> I have been using GLMNET of the following form to predict multinomial 
>> logistic / class dependent variables:
>> 
>> mglmnet=glmnet(xxb,yb ,alpha=ty,dfmax=dfm,
>> family="multinomial",standardize=FALSE)
>> 
>> I am using both continuous and categorical variables as predictors, and am 
>> using sparse.model.matrix to code my x's into a matrix.  This is changing an 
>> example categorical variable whose original name / values is {V1 = "1" or 
>> "2" or "3"} into two recoded variables {V12= "1" or "0" and V13 = "1" or 
>> "0"}.
> 
> You could set their penalty factors to 0 to at least observe the case where 
> both are included. And setting the penalty factor for both to be small 
> would allow you to "honestly" use 0 as the estimated coefficient in such 
> cases where one was estimated and the other not.
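
A rough sketch of the penalty-factor suggestion, under stated assumptions: 
the toy data (`d`, `z`, `yb`) are invented, base model.matrix stands in for 
sparse.model.matrix, and the glmnet call only runs if the package is 
installed. It builds one penalty factor per column so that both V1 dummies 
share the same value. A factor of 0 keeps both columns unpenalized, so they 
are always in the model; a small positive value only weakens the penalty and 
does not guarantee that the two enter or leave together.

```r
# Toy data; every name here is illustrative, not from the original post.
set.seed(1)
d <- data.frame(
  V1 = factor(sample(c("1", "2", "3"), 150, replace = TRUE)),
  z  = rnorm(150)
)
yb <- factor(sample(c("a", "b", "c"), 150, replace = TRUE))

mm  <- model.matrix(~ V1 + z, data = d)
grp <- attr(mm, "assign")[-1]  # source-term index per column, intercept dropped
xxb <- mm[, -1]                # drop the intercept column; glmnet adds its own

# One penalty factor per column: 0 for the V1 dummies (never penalized),
# 1 (the default) for every other column.
pf <- ifelse(grp == 1, 0, 1)
names(pf) <- colnames(xxb)
print(pf)

# Fit only when glmnet is available.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(xxb, yb, family = "multinomial",
                        penalty.factor = pf, standardize = FALSE)
}
```

Replacing the 0 with a small value such as 0.1 gives the "small penalty" 
variant. Note that glmnet rescales the penalty factors internally, so only 
their relative sizes matter.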
> 
>> 
>> As I am cycling through different penalties, I would like to either have 
>> both recoded variables included or both excluded, but not just one included,
>> and can't figure out how to make that work.  I tried changing the
>> "type.multinomial" option, as it looks like this option should do what I 
>> want, but can't get it to work (maybe the difference in recoded variable 
>> names is driving this).
> 
> Doesn't the 'family' argument, which I think is what you are calling 
> 'type', just refer to the y argument rather than the predictors? You may 
> want:
> 
>   mglmnet=glmnet(xxb,yb ,alpha=ty,dfmax=dfm, type.multinomial="grouped",
>                 family="multinomial",standardize=FALSE)
> 
>> 
>> To summarize, for categorical variables, I would like to hierarchically 
>> constrain inclusion / exclusion of recoded variables in the model: either 
>> all of the recoded variables from the same original categorical variable 
>> are in, or all are out.
> 
> I do understand that I am possibly not directly answering your question, but 
> in some respect I wonder if it deserves an answer. I think it is meaningful 
> if some factor levels are "penalized-out" of models.
> 
> --
> David Winsemius
> Alameda, CA, USA
> 


David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
