On 2/03/2010, at 9:02 PM, Noah Silverman wrote:

> Hi,
> 
> I'm just learning about Poisson links for the glm function.
> 
> One of the data sets I'm playing with has several of the variables as 
> factors (i.e. month, group, etc.)
> 
> When I call the glm function with a formula that has a factor variable, 
> R automatically converts the variable to a series of variables with 
> unique names and binary values.
> 
> For example, with this pseudo data:
> 
> y      v1     month
> 2      1      january
> 3      1.4    februrary
> 1.5    6.3    february
> 1.2    4.5    january
> 5.5    4.0    march
> 
> I use this call:
> 
> m <- glm(y ~ v1 + month, family="poisson")
> 
> R gives me back a model with variables of
> Intercept
> v1
> monthJanuary
> monthFebruary
> monthMarch

        No it didn't!!!  You are kidding the troops/being economical with the
        truth.

        If you had used the data that you show, it would've ``given you a model
        with variables'':

        Intercept
        v1
        monthfebrurary
        monthjanuary
        monthmarch

        No caps in the month names, and note the misspelling of ``february''.

        You actually have ***four*** levels for the month factor:

                january februrary february march
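
        A minimal sketch of how to check this for yourself.  It assumes the
        pseudo data is typed in exactly as posted; the data frame name ``dat''
        is just an illustrative choice.  The column names of the design
        matrix are the coefficient names that glm() reports.

```r
# The pseudo data from the question, entered exactly as posted
# (including the "februrary" typo in the second row).
dat <- data.frame(
  y     = c(2, 3, 1.5, 1.2, 5.5),
  v1    = c(1, 1.4, 6.3, 4.5, 4.0),
  month = factor(c("january", "februrary", "february", "january", "march"))
)

# factor() sorts its levels, so the two spellings end up as separate levels.
lev <- levels(dat$month)
print(lev)

# The design matrix columns carry the names of the fitted coefficients.
cols <- colnames(model.matrix(~ v1 + month, data = dat))
print(cols)
```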

        If you had spelt ``february'' correctly in both rows you would have got variables

        Intercept
        v1
        monthjanuary
        monthmarch

        The first level, february, would have been omitted under the default
        contrasts (contr.treatment).  You need k-1 dummy variables to specify
        a factor with k levels.
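
        The k-1 rule can be seen directly with the spelling corrected.  This
        is only a sketch: the vector names are made up for illustration, and
        relevel() is shown as one possible way of choosing a different
        baseline level, not something the original call did.

```r
# Same pseudo data, with "february" spelt the same way in both rows.
v1    <- c(1, 1.4, 6.3, 4.5, 4.0)
month <- factor(c("january", "february", "february", "january", "march"))

k <- nlevels(month)   # number of distinct months

# Under the default treatment contrasts the first level is the baseline,
# so only the remaining levels get a dummy column.
X <- model.matrix(~ v1 + month)
month_cols <- grep("^month", colnames(X), value = TRUE)
print(month_cols)

# Choosing another baseline changes which level is left out,
# not how many dummy columns there are.
month2 <- relevel(month, ref = "january")
cols2  <- grep("^month2", colnames(model.matrix(~ v1 + month2)), value = TRUE)
print(cols2)
```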

> I'm concerned that this might be doing some strange things to my model.

        No, you are doing strange things.

        Notice also that the Poisson distribution is a distribution of
        ***counts***.  Non-negative integers.  Whole numbers.  Values like
        1.5 and 1.2 make no immediate sense in terms of the Poisson
        distribution.  The Poisson likelihood can be evaluated with
        non-integer responses, but the glm() function will quite rightly
        worry about non-integer values and give you a warning.  (Which you
        didn't mention.)

        If you really have non-integer valued responses you shouldn't be using
        the Poisson family; the quasi family *might* be appropriate, if you
        know what you're doing.
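
        A rough sketch of the difference, using the posted y and v1 only.
        Taking ``quasi family'' to mean quasipoisson is an assumption made
        for this example, and the object names are illustrative.  The
        warnings are collected rather than printed so that they can be
        inspected afterwards.

```r
# Non-integer responses from the posted pseudo data.
dat <- data.frame(y  = c(2, 3, 1.5, 1.2, 5.5),
                  v1 = c(1, 1.4, 6.3, 4.5, 4.0))

# Fit the Poisson model and collect any warnings it raises.
msgs <- character(0)
fit_pois <- withCallingHandlers(
  glm(y ~ v1, family = poisson, data = dat),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msgs)   # the complaints about non-integer responses

# quasipoisson uses the same mean and variance structure but estimates a
# dispersion parameter instead of assuming a Poisson likelihood.
fit_quasi <- glm(y ~ v1, family = quasipoisson, data = dat)

# The point estimates agree; the standard errors are scaled by the
# estimated dispersion.
print(coef(fit_pois))
print(coef(fit_quasi))
print(summary(fit_quasi)$dispersion)
```

        Whether a quasi-likelihood model is actually sensible depends on
        where the non-integer values come from; the code only shows the
        mechanics.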

> Can anyone offer some enlightenment?

        I hope you feel enlightened.

                cheers,

                        Rolf Turner