Dear Help list,

I am relatively new to the mgcv package, which I am using to model housing
transaction prices as a function of home and neighborhood characteristics. I
use smooth terms to capture the evolution of prices over time and to fit the
functional form of some characteristics (living area, lot size, etc.)
non-parametrically. The model has neighborhood fixed effects (i.e. prices in
different neighborhoods can have different means), but I would also like to
allow for within-neighborhood correlation in the errors. My question is: what
is the best way to do this?

The sample size is approximately 14,000 observations.

My model (without clustered residuals) looks something like this, although I
have many more regressors, several of which are factors:

mod.1 <- gam(Price ~ s(date.of.sale) + s(livingspace) + s(lotsize) +
             factor(neighborhood), data = data, family = Gamma(link = log))

I was thinking that I could either include random effects at the neighborhood 
level (s(neighborhood, bs="re")) or I could use a GAMM with correlated errors 
within group: 

mod.2 <- gamm(Price ~ s(date.of.sale) + s(livingspace) + s(lotsize) +
              factor(neighborhood), correlation = corSymm(form~1|neighborhood),
              data = data, family = Gamma(link = log))

I tried mod.1 with the random effect added, and it did give larger standard
errors, as I would expect given positive correlation in the residuals. But the
random-effects component does not appear to be significant, if I understand
the output correctly: its edf is very close to zero and its p-value is NaN. If
this is the way to go, should I perhaps first demean the data at the
neighborhood level?
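
For illustration, here is a minimal sketch of the random-effects variant on simulated data. The data frame `sim`, its columns, the helper `re_edf` and all parameter values are invented for the example and are not the real data. It fits the random intercept once on its own and once alongside factor(neighborhood). One possible reading of the near-zero edf above, offered as an assumption rather than a diagnosis, is that the fixed effects already absorb all between-neighborhood variation, leaving nothing for the random effect to explain.

```r
library(mgcv)

set.seed(1)
n.nb <- 30                                        # hypothetical number of neighborhoods
n    <- 1500                                      # hypothetical sample size
nb   <- sample(seq_len(n.nb), n, replace = TRUE)  # unbalanced group sizes
sim  <- data.frame(neighborhood = factor(nb),
                   livingspace  = runif(n, 50, 250))

# Gamma-distributed prices with a log link and a neighborhood-level shift
nb.effect <- rnorm(n.nb, sd = 0.3)
mu        <- exp(12 + 0.004 * sim$livingspace + nb.effect[nb])
sim$Price <- rgamma(n, shape = 20, rate = 20 / mu)

# (a) random intercept only
mod.re <- gam(Price ~ s(livingspace) + s(neighborhood, bs = "re"),
              data = sim, family = Gamma(link = log), method = "REML")

# (b) neighborhood fixed effects and random intercept together
mod.both <- gam(Price ~ s(livingspace) + factor(neighborhood) +
                  s(neighborhood, bs = "re"),
                data = sim, family = Gamma(link = log), method = "REML")

# Effective degrees of freedom used by the random-effect term
re_edf <- function(m) {
  labels <- sapply(m$smooth, function(s) s$label)
  sm     <- m$smooth[[which(labels == "s(neighborhood)")]]
  sum(m$edf[sm$first.para:sm$last.para])
}

re_edf(mod.re)    # edf of the random effect when it stands alone
re_edf(mod.both)  # edf of the random effect next to the fixed effects
```

If the second edf comes out far smaller than the first, that would point to the overlap between the two neighborhood terms rather than to an absence of within-neighborhood correlation.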

As for the gamm approach, I can't get it to work properly: it does not
recognize my groups (i.e. it defines only one group). I tried to correct for
this by converting the neighborhood numbers to character strings,
neighborhood.c <- as.character(data$neighborhood)
and then using this as the group indicator instead:
corSymm(form~1|neighborhood.c)

But this resulted in an error message: variable lengths differ (found for
'neighborhood.c'). The same happens when I write "factor(neighborhood)" in the
corSymm specification. My panel is not balanced, i.e. the number of
observations within neighborhoods varies. Is this a problem? I haven't seen
any indication that the panel must be balanced to use lme, but maybe I've
missed it?
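
A hedged sketch of how the grouping might be passed to gamm, again on simulated data (`sim` and every value in it are invented for the example). It rests on two assumptions. First, the signature of corSymm is corSymm(value, form, fixed), so writing corSymm(form~1|neighborhood) without an "=" would hand a two-sided formula to `value` and leave `form` at its default of ~1, which might be why only one group is found; naming the argument as `form = ~ 1 | neighborhood` avoids this. Second, keeping the grouping variable as a column of the data frame passed to `data` may avoid the "variable lengths differ" error. The sketch also swaps in corCompSymm, a single common within-group correlation, because corSymm estimates a fully unstructured correlation matrix and that is likely to be impractical with large, unbalanced groups.

```r
library(mgcv)   # gamm()
library(nlme)   # corCompSymm()

set.seed(2)
n.nb <- 20                                        # hypothetical number of neighborhoods
n    <- 600                                       # hypothetical sample size
nb   <- sample(seq_len(n.nb), n, replace = TRUE)  # unbalanced group sizes
sim  <- data.frame(neighborhood = factor(nb),     # grouping variable lives in the data frame
                   livingspace  = runif(n, 50, 250))
nb.effect <- rnorm(n.nb, sd = 0.3)
mu        <- exp(12 + 0.004 * sim$livingspace + nb.effect[nb])
sim$Price <- rgamma(n, shape = 20, rate = 20 / mu)

# Name the argument explicitly: form = <one-sided formula>
cs <- corCompSymm(form = ~ 1 | neighborhood)
formula(cs)     # shows the grouping formula the structure will use

# For a non-Gaussian family, gamm() fits by PQL
mod.cs <- gamm(Price ~ s(livingspace), correlation = cs,
               data = sim, family = Gamma(link = log))

summary(mod.cs$gam)                                             # smooth terms
coef(mod.cs$lme$modelStruct$corStruct, unconstrained = FALSE)   # estimated within-group correlation
```

Nothing in this setup requires equal group sizes; the simulated groups are unbalanced by construction.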

Any feedback would be much appreciated incl. suggestions on where I might read 
more about how to use mgcv for this type of problem.

Thanks in advance!
Kathrine

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.