Dear Help list,

I am relatively new to the mgcv package, which I am using to model housing transaction prices as a function of the characteristics of the home and its neighborhood. I have several smooth terms, both to capture price evolution over time and to fit non-parametrically the functional form of some characteristics such as living area and lot size. My model includes neighborhood fixed effects (i.e. prices in different neighborhoods can have different means), but I would also like to allow for within-neighborhood correlation in the errors. My question is: what is the best way to do this?
The sample size is approx. 14,000 observations. My model (without clustered residuals) looks something like this, although I have many more regressors, several of which are factor variables:

    mod.1 <- gam(Price ~ s(date.of.sale) + s(livingspace) + s(lotsize) + factor(neighborhood),
                 data=data, family=Gamma(link=log))

I was thinking that I could either include random effects at the neighborhood level (s(neighborhood, bs="re")) or use a GAMM with correlated errors within group:

    mod.2 <- gamm(Price ~ s(date.of.sale) + s(livingspace) + s(lotsize) + factor(neighborhood),
                  correlation=corSymm(form~1|neighborhood),
                  data=data, family=Gamma(link=log))

I tried mod.1 with the random effects added, and it did give larger standard errors, as I would expect given positive correlation in the residuals. But it also seemed that the random effects component was not significant, if I understand the output correctly: the edf are very close to zero and the significance is NaN. If this is the way to go, should I perhaps first demean the data at the neighborhood level?

As for the gamm approach, I can't get it to work properly: it does not recognize my groups (i.e. it defines only one group). I tried to correct for this by converting the neighborhood numbers to characters,

    neighborhood.c <- as.character(neighborhood)

and then using this as the group indicator instead:

    corSymm(form~1|neighborhood.c)

But this resulted in an error message:

    variable lengths differ (found for 'neighborhood.c')

The same happens when I write "factor(neighborhood)" in the corSymm specification.

My panel is not balanced, i.e. the number of observations within neighborhoods varies. Is this a problem? I haven't seen any indication that the panel must be balanced to use lme, but maybe I've missed it?

Any feedback would be much appreciated, including suggestions on where I might read more about how to use mgcv for this type of problem.

Thanks in advance!
Kathrine
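
For reference, here is a minimal sketch of the two approaches described above, run on simulated data rather than the real transactions. Everything in it is an assumption made for illustration: the data frame `dat`, the simulated effects, and the model names `mod.re` and `mod.cs` are hypothetical. It also makes three choices that differ from the code in the post, none of which is claimed to be the right answer for the real data. The random-intercept model leaves out factor(neighborhood), on the assumption that fixed effects for the same grouping would otherwise absorb the random effect. The correlation structure is written with `form = ~ 1 | neighborhood` (note the `=`), which is how the `form` argument is documented in nlme. Finally, corCompSymm (one common within-group correlation) is swapped in for corSymm, because corSymm estimates a separate correlation for every pair of positions within a group and that becomes very large when neighborhoods contain many sales.

```r
library(mgcv)   # gam() and gamm()
library(nlme)   # corCompSymm(); normally attached with mgcv, loaded here to be explicit

set.seed(1)

# Simulated, unbalanced panel: 20 hypothetical neighborhoods of varying size
n_nb  <- 20
sizes <- sample(5:30, n_nb, replace = TRUE)
neighborhood <- factor(rep(seq_len(n_nb), times = sizes))  # must be a factor for bs = "re"
n <- length(neighborhood)

dat <- data.frame(
  neighborhood = neighborhood,
  date.of.sale = runif(n, 0, 10),
  livingspace  = runif(n, 50, 250),
  lotsize      = runif(n, 200, 1500)
)

# Made-up nonlinear effects plus a neighborhood-level shift on the log scale
nb_shift <- rnorm(n_nb, sd = 0.3)
log_mu <- 12 +
  0.3 * sin(dat$date.of.sale) +
  0.5 * log(dat$livingspace) +
  0.2 * sqrt(dat$lotsize / 100) +
  nb_shift[as.integer(dat$neighborhood)]
mu <- exp(log_mu)

# Gamma response with mean mu (shape 10, rate = shape / mean)
dat$Price <- rgamma(n, shape = 10, rate = 10 / mu)

# Approach 1: neighborhood random intercept inside gam()
mod.re <- gam(Price ~ s(date.of.sale) + s(livingspace) + s(lotsize) +
                s(neighborhood, bs = "re"),
              data = dat, family = Gamma(link = "log"), method = "REML")

# Approach 2: gamm() with exchangeable within-neighborhood correlation
mod.cs <- gamm(Price ~ s(date.of.sale) + s(livingspace) + s(lotsize),
               correlation = corCompSymm(form = ~ 1 | neighborhood),
               data = dat, family = Gamma(link = "log"))

summary(mod.re)       # the s(neighborhood) row reports the edf of the random effect
summary(mod.cs$gam)   # smooth terms; mod.cs$lme holds the correlation estimate
```

The grouping variable sits inside `dat` in both calls, so it has the same length as the other variables. Unbalanced group sizes are not a problem in this sketch, since the simulated neighborhoods deliberately vary in size.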
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.