Dear Ellison,

Thanks a lot for your reply. Your explanation makes things much clearer.
Sincerely,
f.


On 24 January 2013 05:58, S Ellison <s.elli...@lgcgroup.com> wrote:

>
>
> On 23 Jan 2013, at 21:36, "Francesco Sarracino" <f.sarrac...@gmail.com>
> wrote:
>
> > .... what I meant refers to the fact that I've read in "An R and
> > S-Plus Companion to Applied Regression" about methods to alter the
> > encoding of factors when using contrasts in regressions. These are
> > options (for contrasts) that can be easily set with
> > options(contrasts = ...). This changes the way R creates the dummies
> > out of a factor, and various methods are available.
> > I was expecting that R might have something similar that applied to my
> > case, thus changing the way R attaches numeric values to my dummy
> > variable. I am just surprised that such an option doesn't exist. My
> > expectations were wrong.
>
> Such options do exist, but at modelling time, not factor
> creation/conversion time.
>
> When created, by calls to 'factor' or in functions like 'read.table',
> factors are stored internally as integers with a list of labels (what you
> see as factor levels) that go with each integer. Those internal integers
> start at 1 and go up. You can set the ordering of those labels (by
> specifying the "levels" argument in factor()) so that, for example, yes and
> no can be associated with (numeric) factor levels 1 and 2 respectively
> instead of the default ordering which would put 'no' alphabetically before
> 'yes'. (I find this choice particularly useful for orderings like "high",
> "medium", "low" for which the alphabetic ordering is not exactly intuitive;
> similarly alphabetic ordering puts '1', '2', '10' in the order '1', '10',
> '2' and so on, so that often needs specifying manually. It's also useful to
> specify levels if you want things like boxplots to come out in a particular
> order, as boxplots by default use the order of the factor levels).
> The internal integer values are returned by 'as.numeric'. If your factor
> level labels - which are always character - are also interpretable as
> numbers, you need 'as.character' to return the character strings and then
> 'as.numeric' to convert those.
>
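
The factor behaviour described above can be checked with a short sketch in base R. The vectors `answers` and `sizes` are made-up data for illustration and do not come from this thread.

```r
# Made-up example data (an assumption for illustration only)
answers <- c("yes", "no", "yes", "yes", "no")

# Default: levels are sorted alphabetically, so "no" = 1 and "yes" = 2
f_default <- factor(answers)
levels(f_default)
as.numeric(f_default)        # internal integer codes

# Explicit level order: "yes" = 1 and "no" = 2
f_ordered <- factor(answers, levels = c("yes", "no"))
levels(f_ordered)
as.numeric(f_ordered)

# Labels that look like numbers: as.numeric() alone returns the codes,
# so go through as.character() to recover the values
sizes  <- factor(c("10", "2", "1", "10"))
levels(sizes)                               # sorted as text: "1" "10" "2"
codes  <- as.numeric(sizes)                 # internal codes, not the values
values <- as.numeric(as.character(sizes))   # the numbers in the labels
```

Comparing `codes` with `values` shows why the detour through as.character() matters: the first holds positions in the level list, the second the numbers the labels spell out.
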
> Now, up to this point you just have more or less arbitrary integers
> associated with the original factor levels (the degree of arbitrariness
> depends on whether you specified the level order or let R use its default).
> These integers are not the contrasts used in model fitting. Contrasts are
> set at model matrix building time; they are not a fixed attribute of the
> factor. The internal numbering of levels affects contrasts only to the
> extent that the numerical values used in setting contrasts are usually in
> the same order as the factor levels. You can inspect the functions used to
> associate contrasts with factor levels by using options("contrasts"). You
> can inspect the numerical values that would currently be used for a given
> factor with a call to contrasts(). You can change the contrast assignments
> globally using options() or explicitly in some model calls (lm, for
> example, has a contrasts argument) and if you like you can write your own
> contrast functions to set any values you like.  The most common are
> probably treatment contrasts, which set the first factor level as intercept
> and the rest as (unit) differences from that, and sum-to-zero contrasts,
> which do what they say, setting contrasts that sum to zero by choosing a
> set like (-1, 0, 1).
>
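
The contrast handling described above can be tried out with a rough sketch like the following. The factor `g` and the response `y` are made-up data for illustration, and the sketch assumes R's default setting of treatment contrasts for unordered factors.

```r
# Made-up three-level factor (an assumption for illustration only)
g <- factor(c("low", "medium", "high", "low", "high", "medium"),
            levels = c("low", "medium", "high"))

# Contrast functions currently used for unordered and ordered factors
options("contrasts")

# Contrast matrix that would be used for g under the current setting
contrasts(g)

# The two common schemes for a three-level factor
treat <- contr.treatment(3)   # first level is the baseline
sumz  <- contr.sum(3)         # each column sums to zero
colSums(sumz)

# Per-model override via lm()'s contrasts argument; y is made-up data
y <- c(1.0, 2.1, 2.9, 1.2, 3.1, 1.9)
fit_treat <- lm(y ~ g)
fit_sum   <- lm(y ~ g, contrasts = list(g = "contr.sum"))
coef(fit_treat)   # intercept is the mean of the first level
coef(fit_sum)     # intercept is the mean of the level means

# Global change, restoring the previous setting afterwards
old <- options(contrasts = c("contr.sum", "contr.poly"))
contrasts(g)
options(old)
```

Both fits describe the same model; only the meaning of the coefficients changes with the contrast scheme, which is why the choice is made at modelling time rather than stored in the factor.
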
> So you actually have a great deal of control over both the order in which
> labels are associated with factor levels and the (separate) values of
> contrasts associated with those factor levels at modelling time.
>
> The cost of that control is some complexity, and the time needed to learn
> what's going on to use it all properly.
>
> Hope that helps ...
>
>
> S Ellison
>

