I think having different sized groups is going to be a problem in your second model. ?corSymm says that a general correlation matrix is estimated, i.e. a separate correlation for each pair of observations within a group. For this to be meaningful across groups, the jth price in one area has to be somehow equivalent to the jth price in another area, which it probably isn't. I can't figure out how this can be done if the groups are different sizes.
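To put rough numbers on this, here is a minimal sketch (not taken from the original model). The group sizes, the two-level toy `neighborhood` factor and the correlation value 0.3 are all made up for illustration. It counts the pairwise correlations a general structure needs for one group of n observations, and compares that with a compound-symmetry structure (corCompSymm in nlme), which has a single correlation parameter whatever the group size.

```r
# Number of free correlations in a general (corSymm-style) structure for a
# group of n observations: one per pair of observations, i.e. n * (n - 1) / 2.
n_corsymm_params <- function(n) n * (n - 1) / 2

# Hypothetical neighbourhood sizes, chosen only for illustration.
group_sizes <- c(5, 20, 100)
param_counts <- data.frame(
  n_obs       = group_sizes,
  corSymm     = n_corsymm_params(group_sizes),
  corCompSymm = 1  # compound symmetry: one shared correlation
)
print(param_counts)

# Compound symmetry with unequal group sizes: the same single parameter
# fills every off-diagonal entry, so a group of 2 and a group of 3 are fine.
library(nlme)
d <- data.frame(neighborhood = factor(rep(c("a", "b"), times = c(2, 3))))
cs <- Initialize(corCompSymm(value = 0.3, form = ~ 1 | neighborhood), data = d)
mats <- corMatrix(cs)  # list with one correlation matrix per neighbourhood
print(mats)
```

Note that the grouping is passed as form = ~ 1 | neighborhood, a one-sided formula given to the form argument.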
Even if your group sizes were all the same, I guess you have lots of data per neighbourhood, so there will be an awful lot of correlation parameters to estimate, and I doubt that it will be successful. Might it make more sense to start with something less parameter rich like corCompSymm (which would also be ok with different group sizes, I think)?

Finally I would just set data$neighborhood <- factor(data$neighborhood) for this. You need this, e.g. to be sure that s(neighborhood,bs="re") is really doing what you want (i.e. giving a random coefficient for each neighbourhood, rather than a single random coefficient multiplying "neighborhood" interpreted as numeric). However, if neighborhood is already in the model as a factor (as in your factor(neighborhood) term), then s(neighborhood,bs="re") is adding nothing: you've effectively already included neighborhood as a random effect with infinite variance, so including it again won't do anything interesting.

best,
Simon

On 11/07/13 15:46, Kathrine Veie wrote:
> Dear Help list,
>
> I am relatively new to the mgcv package, which I am using to model prices of
> housing transactions as a function of the characteristics of a home and a
> neighborhood. I have several smooth terms to capture price evolution over
> time, but also to non-parametrically fit the functional form of some
> characteristics such as living area, lot size etc. In my model I have
> neighborhood fixed effects (i.e. prices in different neighborhoods can have
> different means), but I would also like to allow for within neighborhood
> correlation in my errors. My question is: What is the best way to do this?
>
> Sample size is approx. 14,000 obs.
>
> My model (without clustered residuals) looks something like (although I have
> many more regressors, several of which are factor variables):
> mod.1 <- gam(Price~ s(date.of.sale) + s(livingspace) + s(lotsize) +
> factor(neighborhood), data=data, family=Gamma(link=log))
>
> I was thinking that I could either include random effects at the neighborhood
> level (s(neighborhood, bs="re")) or I could use a GAMM with correlated errors
> within group:
>
> mod.2 <- gamm(Price~ s(date.of.sale) + s(livingspace) + s(lotsize) +
> factor(neighborhood), correlation=corSymm(form~1|neighborhood), data=data,
> family=Gamma(link=log))
>
> I tried out mod.1 with the random effects and it did provide larger s.e.'s as
> I would expect given positive correlation in the residuals. But it also
> seemed that the random effects component was not significant if I understand
> it correctly: the edf are very close to zero and the significance is NaN.
> Perhaps if this is the way to go, I should first demean the data at the
> neighborhood level?
>
> As for the gamm approach: I can't get it to work properly: It does not
> recognize my groups (i.e. it defines only one group). I tried to correct for
> this by transforming the neighborhood numbers into characters
> neighborhood.c <- (as.character)
> and then used this as the group indicator instead:
> corSymm(form~1|neighborhood.c)
>
> But this resulted in an error message: variable lengths differ (found for
> 'neighborhood.c')...The same happens when I write "factor(neighborhood)" in
> the corSymm specification. My panel is not balanced, i.e. the number of
> observations within neighborhoods varies. Is this a problem? I haven't seen
> any indication that the panel must be balanced to use lme, but maybe I've
> missed it?
>
> Any feedback would be much appreciated incl. suggestions on where I might
> read more about how to use mgcv for this type of problem.
>
> Thanks in advance!
> Kathrine
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.