Dear Ellison,

thanks a lot for your reply. Your explanation makes things much clearer.

Sincerely,
f.
On 24 January 2013 05:58, S Ellison <s.elli...@lgcgroup.com> wrote:

> On 23 Jan 2013, at 21:36, "Francesco Sarracino" <f.sarrac...@gmail.com>
> wrote:
>
> > .... what I meant refers to the fact that I've read in "An R and
> > S-Plus Companion to Applied Regression" about methods to alter the
> > encoding of factors when using contrasts in regressions. These are
> > options (for contrasts) that can be easily set with
> > options("contrasts"). This command changes the way R creates the
> > dummies out of a factor, and various methods are available.
> > I was expecting that R might have something similar that applied to
> > my case, thus changing the way R attaches numeric values to my dummy
> > variable. I am just surprised that such an option doesn't exist. I
> > had the wrong expectations.
>
> Such options do exist, but at modelling time, not at factor
> creation/conversion time.
>
> When created, by calls to 'factor' or in functions like 'read.table',
> factors are stored internally as integers with a list of labels (what
> you see as factor levels) that go with each integer. Those internal
> integers start at 1 and go up. You can set the ordering of those labels
> (by specifying the "levels" argument in factor()) so that, for example,
> yes and no can be associated with (numeric) factor levels 1 and 2
> respectively, instead of the default ordering, which would put 'no'
> alphabetically before 'yes'. (I find this choice particularly useful for
> orderings like "high", "medium", "low", for which the alphabetic
> ordering is not exactly intuitive; similarly, alphabetic ordering puts
> '1', '2', '10' in the order '1', '10', '2' and so on, so that often
> needs specifying manually. It's also useful to specify levels if you
> want things like boxplots to come out in a particular order, as boxplots
> by default use the order of the factor levels.)
> The internal integer values are returned by 'as.numeric'.
> If your factor level labels - which are always character - are also
> interpretable as numbers, you need 'as.character' to return the
> character strings and then 'as.numeric' to convert those.
>
> Now, up to this point you just have more or less arbitrary integers
> associated with the original factor levels (the degree of arbitrariness
> depends on whether you specified the level order or let R use its
> default). These integers are not the contrasts used in model fitting.
> Contrasts are set at model matrix building time; they are not a fixed
> attribute of the factor. The internal numbering of levels affects
> contrasts only to the extent that the numerical values used in setting
> contrasts are usually in the same order as the factor levels. You can
> inspect the functions used to associate contrasts with factor levels by
> using options("contrasts"). You can inspect the numerical values that
> would currently be used for a given factor with a call to contrasts().
> You can change the contrast assignments globally using options() or
> explicitly in some model calls (lm, for example, has a contrasts
> argument), and if you like you can write your own contrast functions to
> set any values you like. The most common are probably treatment
> contrasts, which set the first factor level as intercept and the rest as
> (unit) differences from that, and sum-to-zero contrasts, which do what
> they say, setting contrasts that sum to zero by choosing a set like
> (-1, 0, 1).
>
> So you actually have a great deal of control over both the order in
> which labels are associated with factor levels and the (separate) values
> of contrasts associated with those factor levels at modelling time.
>
> The cost of that control is some complexity, and the time needed to
> learn what's going on to use it all properly.
>
> Hope that helps ...
>
> S Ellison
>
> *******************************************************************
> This email and any attachments are confidential.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
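
For anyone finding this thread later, the two ideas in the quoted explanation (internal integer codes of a factor versus contrasts chosen at modelling time) can be sketched in a few lines of R. This is only an illustration under assumptions: the vectors `answers` and `g`, the factor `h` and the data frame `d` are made up for the example and do not come from the original posts. Only base R and stats functions are used (factor, levels, as.integer, as.numeric, as.character, contrasts, options, lm).

```r
# --- Internal integer codes (hypothetical data) ---
answers <- c("yes", "no", "no", "yes")

# Default: levels are sorted alphabetically, so "no" = 1 and "yes" = 2.
f_default <- factor(answers)
levels(f_default)       # "no" "yes"
as.integer(f_default)   # 2 1 1 2

# Explicit level order: now "yes" = 1 and "no" = 2.
f_custom <- factor(answers, levels = c("yes", "no"))
as.integer(f_custom)    # 1 2 2 1

# Labels that look like numbers: go through as.character first.
g <- factor(c("10", "2", "1", "10"))
levels(g)                      # "1" "10" "2"
as.numeric(g)                  # internal codes: 2 3 1 2
as.numeric(as.character(g))    # the values themselves: 10 2 1 10

# --- Contrasts are chosen at modelling time ---
h <- factor(c("low", "medium", "high"),
            levels = c("low", "medium", "high"))

old <- options("contrasts")    # remember the current global setting
ct_default <- contrasts(h)     # values R would use for h right now

# Change the global setting (unordered, ordered factors), inspect, restore.
options(contrasts = c("contr.sum", "contr.poly"))
ct_sum <- contrasts(h)         # each column now sums to zero
options(old)

# Or set contrasts for one model only, via lm's contrasts argument.
d <- data.frame(
  y = c(1.0, 2.1, 2.9, 1.2, 1.9, 3.1),   # made-up response values
  h = factor(rep(c("low", "medium", "high"), 2),
             levels = c("low", "medium", "high"))
)
fit <- lm(y ~ h, data = d, contrasts = list(h = "contr.sum"))
coef(fit)
```

Note that changing the level order of a factor changes which level comes first in the contrast matrix, but the contrast values themselves come from the contrast function in effect when the model matrix is built.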