On Tue, 2010-03-02 at 00:58 -0800, Noah Silverman wrote:
> Ted,
>
> Brilliant explanation (as usual)
>
> I'm back in school, just starting on a post-graduate degree in stats, so
> the help is really appreciated.
>
> Now, I have a slightly trickier question about the same model.
>
> I've seen more than one way to get "values" out of the glm model.
>
> i.e. If we're looking at the 10th item in the dataset
> (note: "m" is the model):
>
> fitted(m)[10]
> predict(m, dataset[10,])
>
> Give me different results. From my data, I get the following real results:
>
> predict(m, data[100,])
>      100
> 7.727999
>
> fitted(m)[100]
>      179
> 3956.637
I find that unlikely - why is one labelled 100 and the other 179? Perhaps
something is wrong here. That said, those two calls *will* give you
different results, because predict() can return several types of
prediction. See ?predict.glm and note that the default is type = "link",
i.e. it produces predictions on the scale of the linear predictor/link
function, which then need the inverse of the link function applied to
them. What do predict(m, data, type = "response")[100] and fitted(m)[100]
yield? Do you have missing values etc. in your data? (A small runnable
sketch illustrating this is appended below the quoted messages.)

G

> From my understanding, the exp of the prediction should be equal to the
> fitted value. Here it is not. I don't understand why. Any insight?
>
> -N
>
> On 3/2/10 12:47 AM, (Ted Harding) wrote:
> > On 02-Mar-10 08:02:27, Noah Silverman wrote:
> >
> >> Hi,
> >> I'm just learning about Poisson links for the glm function.
> >>
> >> One of the data sets I'm playing with has several of the
> >> variables as factors (i.e. month, group, etc.)
> >>
> >> When I call the glm function with a formula that has a factor
> >> variable, R automatically converts the variable to a series of
> >> variables with unique names and binary values.
> >>
> >> For example, with this pseudo data:
> >>
> >>   y    v1    month
> >>   2    1     january
> >>   3    1.4   february
> >>   1.5  6.3   february
> >>   1.2  4.5   january
> >>   5.5  4.0   march
> >>
> >> I use this call:
> >>
> >> m <- glm(y ~ v1 + month, family = "poisson")
> >>
> >> R gives me back a model with variables of
> >> Intercept
> >> v1
> >> monthJanuary
> >> monthFebruary
> >> monthMarch
> >>
> >> I'm concerned that this might be doing some strange things
> >> to my model.
> >> Can anyone offer some enlightenment?
> >> Thanks!
> >
> > The creation of auxiliary variables is the way to incorporate
> > a factor variable into a model. These are usually called
> > "dummy variables", and are essentially indicator variables.
> >
> > Your data above would correspond to variables I (for Intercept),
> > J (for January), F (for February) and M (for March), in addition
> > to the other variables y and v1, as below:
> >
> >   y    v1    I  J  F  M    # month
> >   2    1     1  1  0  0    # january
> >   3    1.4   1  0  1  0    # february
> >   1.5  6.3   1  0  1  0    # february
> >   1.2  4.5   1  1  0  0    # january
> >   5.5  4.0   1  0  0  1    # march
> >
> > The linear predictor L in the model for y would then be
> >
> >   L = a*I + b*v1 + c1*J + c2*F + c3*M
> >
> > evaluated arithmetically; e.g. for row 2 of the data it is
> >
> >   a + b*1.4 + c2
> >
> > However, as given, J + F + M = I, so there is redundancy in
> > the variables, since there are only three independent values
> > there (not so if you exclude the Intercept using a model
> > formula y ~ v1 + month - 1). So R will provide estimates
> > which are computed in terms of some pattern of differences
> > between these four variables, called contrasts. Different
> > patterns of difference present different representations
> > of the three independent aspects.
> >
> > There are many different kinds of contrasts available.
> > One of these will be chosen as default by R (depending in
> > particular on whether the factor variable is being used
> > as an ordered factor or an unordered factor). See ?contrasts
> > for an outline of what is there, ?contrast for more detail,
> > and look at the help for particular contrasts such as
> > ?contr.helmert, ?contr.poly, ?contr.sum, ?contr.treatment.
> >
> > After all that: No, R is not doing strange things to your model!
> >
> > Ted.
> >
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 02-Mar-10   Time: 08:47:11
> > ------------------------------ XFMail ------------------------------
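For reference, here is a minimal self-contained sketch using made-up data
(not the original poster's dataset, which we haven't seen; the names dat,
y, v1 and month below are invented for illustration). It shows both points
discussed above: how glm() expands a factor into dummy variables in the
model matrix, and why predict()'s default link-scale output matches
fitted() only after applying the inverse link - exp() for the canonical
log link of a Poisson GLM.

## Toy data, made up purely for illustration
set.seed(1)
dat <- data.frame(
  y     = rpois(60, lambda = 5),
  v1    = runif(60, min = 0, max = 10),
  month = factor(rep(c("january", "february", "march"), each = 20))
)

## Poisson GLM with the canonical log link
m <- glm(y ~ v1 + month, data = dat, family = poisson)

## How the factor is expanded: with the default treatment contrasts the
## first level (here "february", as levels are sorted alphabetically) is
## absorbed into the intercept and the remaining levels become 0/1 dummies
head(model.matrix(m))

## predict() defaults to type = "link" (the linear predictor, eta);
## fitted() is on the response scale (mu = exp(eta) for the log link)
eta <- predict(m)[10]                          # link scale
mu  <- predict(m, type = "response")[10]       # response scale
all.equal(unname(exp(eta)), unname(mu))        # TRUE
all.equal(unname(mu), unname(fitted(m)[10]))   # TRUE

## If the data contain NAs, rows dropped during fitting can make
## fitted(m)[i] and predict(m, newdata = data[i, ]) refer to different
## observations - compare the names printed alongside the returned values

The last point is one possible reason the two values quoted above carry
different labels (100 vs 179): positional indexing into fitted(m) need not
line up with the row numbers of the original data once rows have been
dropped.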
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.