On Tuesday 13 November 2007, Prof Brian Ripley wrote:
> On Tue, 13 Nov 2007, Dylan Beaudette wrote:
> > Hi,
> >
> > I have set up a simple logistic regression model with the glm() function,
> > with the following formula:
> >
> > y ~ a + b
> >
> > where 'a' is a continuous variable stratified by the levels of 'b'.
> >
> > Looking over the manual for model specification, it seems that
> > coefficients for unordered factors are given 'against' the first level of
> > that factor.
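To make the question above concrete, here is a small mock-up (entirely
made-up data and variable names) showing the default coding I was referring
to; it is only a sketch of what I think is going on, not the actual analysis:

## fake data: binary outcome 'y', continuous 'a', 3-level unordered factor 'b'
set.seed(1)
n <- 300
b <- factor(sample(c("alluvium", "basalt", "granite"), n, replace = TRUE))
a <- rnorm(n, mean = 20, sd = 5)
shift <- c(alluvium = 0, basalt = 0.8, granite = -0.5)
y <- rbinom(n, 1, plogis(-2 + 0.15 * a + shift[as.character(b)]))

fit <- glm(y ~ a + b, family = binomial)
coef(summary(fit))
## with the default contr.treatment coding, the rows for 'b' are contrasts
## (log odds-ratios) against the first level of 'b' ('alluvium' here, since
## factor levels sort alphabetically), not separate per-level intercepts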
Thanks for the quick reply.

> Only for the default coding.

Indeed, I should have added that to my initial message.

> > This makes for difficult interpretation when using factor 'b' as a
> > stratifying model term.
>
> Really? You realize that you have not 'stratified' on 'b', which would
> need the model to be a*b? What you have is a model with parallel linear
> predictors for different levels of 'b', and if the coefficients are not
> telling you what you want you should change the coding.

I should have specified that interpretation was difficult not because of the
default behaviour, but because of my own limitations and the nature of the
data. Perhaps an example would help:

y ~ a + b

where 'a' is a continuous predictor (e.g. temperature) observed across the
levels of 'b' (geology). The form of the model (or at least what I was hoping
for) would account for the variation in 'y' as predicted by 'a', within each
level of 'b'. Am I specifying this model incorrectly?

> Much of what I am trying to get across is that you have a lot of choice as
> to how you specify a model to R. There has to be a default, which is
> chosen because it is often a good choice. It does rely on factors being
> coded well: the 'base level' (to quote ?contr.treatment) needs to be
> interpretable. And also bear in mind that the default choices of
> statistical software in this area almost all differ (and R's differs from
> S, GLIM, some ways to do this in SAS ...), so people's ideas of a 'good
> choice' do differ.

Understood. I was not implying a level of 'goodness'; rather, I was hoping to
gain some insight into a (possibly) mis-coded model specification.

> > Setting up the model, minus the intercept term, gives me what appear to
> > be more meaningful coefficients. However, I am not sure whether I am
> > interpreting the results from a linear model without an intercept term
> > correctly. Model predictions from both specifications (with and without
> > an intercept term) are nearly identical (differing by about 1E-16 in
> > probability space).
> >
> > Are there any gotchas to look out for when removing the intercept term
> > from such a model?
>
> It is just a different parametrization of the linear predictor.
> Anything interpretable in terms of the predictions of the model will be
> unchanged. That is the crux: the default coefficients of 'b' will be
> log odds-ratios that are directly interpretable, and those in the
> per-group coding will be log-odds for a zero value of 'a'. Does a zero
> value of 'a' make sense?

In the case of this experiment, a zero-level for 'a' does not make sense.
(A small self-contained mock-up of the two parametrizations is appended at
the end of this message.)

Further thoughts welcomed.

Cheers,
Dylan

> > Any guidance would be greatly appreciated.
> >
> > Cheers,

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
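PS: here is the mock-up of the two parametrizations mentioned above. The
data and the variable names are entirely made up; this is only a sketch of
what I think I have been fitting, not the actual analysis.

## fake data: binary outcome 'y', continuous 'a' (temperature-like),
## 3-level unordered factor 'b' (geology-like)
set.seed(1)
n <- 300
b <- factor(sample(c("alluvium", "basalt", "granite"), n, replace = TRUE))
a <- rnorm(n, mean = 20, sd = 5)
shift <- c(alluvium = 0, basalt = 0.8, granite = -0.5)
y <- rbinom(n, 1, plogis(-2 + 0.15 * a + shift[as.character(b)]))

m1 <- glm(y ~ a + b,     family = binomial)  # default: intercept + contrasts
m2 <- glm(y ~ a + b - 1, family = binomial)  # intercept removed

coef(m1)  # 'b' terms: log odds-ratios against the base level of 'b'
coef(m2)  # 'b' terms: per-level log-odds at a = 0

## same model, different parametrization: the fitted probabilities
## agree to rounding error (on the order of 1E-16)
all.equal(fitted(m1), fitted(m2))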