Re: [R] mgcv: GAM with clustered standard errors

Simon Wood Thu, 11 Jul 2013 11:53:18 -0700

I think it's going to be a problem to have different-sized groups in
your second model. ?corSymm says that a general correlation matrix is
being estimated, i.e. a separate correlation for each pair of
observations within a group. For this to be meaningful across groups,
you need the jth price in one area to be somehow equivalent to the jth
price in another area, which it probably isn't. I can't figure out how
this can be done if the groups are different sizes.


Even if your group sizes were all the same, I guess you have lots of
data per neighbourhood, so there will be an awful lot of correlation
parameters to estimate, and I doubt that it will be successful. Might it
make more sense to start with something less parameter-rich, like
corCompSymm (which would also be OK with different group sizes, I think)?
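A minimal sketch of the corCompSymm alternative, on simulated data (the data frame dat, its columns and all the numbers are made up for illustration). It also counts parameters: corSymm needs one correlation per pair of positions in the largest group, m(m-1)/2, while corCompSymm needs a single one whatever the group sizes. For simplicity the fixed neighbourhood factor is left out here, so the exchangeable correlation is what picks up the shared neighbourhood effect. The grouping is passed as the named argument form = ~ 1 | neighborhood: in nlme the first positional argument of corSymm() and corCompSymm() is value, so corSymm(form~1|neighborhood) passes that formula as value and leaves form at its default ~1, i.e. a single group.

```r
## Sketch on simulated data: 'dat', 'price', 'livingspace' and
## 'neighborhood' are hypothetical stand-ins for the real variables.
library(nlme)   # corCompSymm(), corSymm()
library(mgcv)   # gamm()

set.seed(1)
n.hood <- 20
sizes  <- sample(10:30, n.hood, replace = TRUE)   # unbalanced groups
dat <- data.frame(neighborhood = factor(rep(seq_len(n.hood), sizes)))
dat$livingspace <- runif(nrow(dat), 50, 250)
## effect shared by all sales in the same neighbourhood
hood.eff <- rnorm(n.hood, sd = 0.3)[as.integer(dat$neighborhood)]
mu <- exp(12 + 0.004 * dat$livingspace + hood.eff)
dat$price <- rgamma(nrow(dat), shape = 10, rate = 10 / mu)  # Gamma with mean mu

## Correlation parameters each structure would have to estimate
m <- max(sizes)
n.par.symm <- m * (m - 1) / 2   # corSymm: one per pair of positions
n.par.cs   <- 1                 # corCompSymm: one, whatever the sizes

## 'form' must be named: the first argument of corCompSymm() is 'value'
fit <- gamm(price ~ s(livingspace),
            correlation = corCompSymm(form = ~ 1 | neighborhood),
            family = Gamma(link = log), data = dat)

summary(fit$gam)
## estimated within-neighbourhood correlation (PQL working scale)
coef(fit$lme$modelStruct$corStruct, unconstrained = FALSE)
```

With a few hundred sales per neighbourhood, n.par.symm runs into the tens of thousands, which is the parameter explosion described above.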

Finally, I would just set data$neighborhood <- factor(data$neighborhood)
before fitting. You need this, e.g. to be sure that s(neighborhood,bs="re")
is really doing what you want (i.e. giving a random coefficient for each
neighbourhood, rather than a single random coefficient multiplying
"neighborhood" interpreted as numeric). However, if neighborhood is also
in the model as a fixed-effect factor, as in your mod.1, then
s(neighborhood,bs="re") adds nothing: you've effectively already
included neighborhood as a random effect with infinite variance, so
including it again won't do anything interesting.
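To see what the factor conversion changes, a small sketch on simulated data (hood.num, hood.fac, x and y are hypothetical names): the same neighbourhood codes are entered once as numbers and once as a factor, and the number of coefficients gam() gives the bs="re" term is counted from the fitted smooth's first.para and last.para.

```r
library(mgcv)

set.seed(2)
n.hood <- 15
dat <- data.frame(hood.num = rep(1:n.hood, each = 20))   # neighbourhood coded as numbers
dat$x <- runif(nrow(dat))
dat$y <- sin(2 * pi * dat$x) +
  rnorm(n.hood, sd = 0.5)[dat$hood.num] +                # neighbourhood effect
  rnorm(nrow(dat), sd = 0.3)                             # residual noise
dat$hood.fac <- factor(dat$hood.num)                      # same codes as a factor

## Numeric coding: bs = "re" gives ONE random slope on the numeric codes
b.num <- gam(y ~ s(x) + s(hood.num, bs = "re"), data = dat, method = "REML")
## Factor coding: one random intercept per neighbourhood
b.fac <- gam(y ~ s(x) + s(hood.fac, bs = "re"), data = dat, method = "REML")

## Count the coefficients belonging to the random-effect term (2nd smooth)
n.re <- function(b) {
  sm <- b$smooth[[2]]
  sm$last.para - sm$first.para + 1
}
n.re(b.num)   # 1
n.re(b.fac)   # 15
```

With the numeric coding the "re" term is a single random slope on the arbitrary codes 1, ..., 15, which is rarely what is intended.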

best,
Simon

On 11/07/13 15:46, Kathrine Veie wrote:
> Dear Help list,
>
> I am relatively new to the mgcv package, which I am using to model prices of 
> housing transactions as a function of the characteristics of a home and a 
> neighborhood. I have several smooth terms to capture price evolution over 
> time, but also to non-parametrically fit the functional form of some 
> characteristics such as living area, lot size etc. In my model I have 
> neighborhood fixed effects (i.e. prices in different neighborhoods can have 
> different means), but I would also like to allow for within neighborhood 
> correlation in my errors. My question is: What is the best way to do this?
>
> Sample size is approx. 14,000 obs.
>
> My model (without clustered residuals) looks something like (although I have 
> many more regressors, several of which are factor variables):
> mod.1 <- gam(Price~ s(date.of.sale) + s(livingspace) + s(lotsize) + 
> factor(neighborhood), data=data, family=Gamma(link=log))
>
> I was thinking that I could either include random effects at the neighborhood 
> level (s(neighborhood, bs="re")) or I could use a GAMM with correlated errors 
> within group:
>
> mod.2 <- gamm(Price~ s(date.of.sale) + s(livingspace) + s(lotsize) + 
> factor(neighborhood), correlation=corSymm(form~1|neighborhood), data=data, 
> family=Gamma(link=log))
>
> I tried out mod.1 with the random effects and it did provide larger s.e.'s as 
> I would expect given positive correlation in the residuals. But it also 
> seemed that the random effects component was not significant if I understand 
> it correctly: the edf are very close to zero and the significance is NaN. 
> Perhaps if this is the way to go, I should first demean the data at the 
> neighborhood level?
>
> As for the gamm approach: I can't get it to work properly: It does not 
> recognize my groups (i.e. it defines only one group). I tried to correct for 
> this by transforming the neighborhood numbers into characters
> neighborhood.c <- as.character(neighborhood)
> and then used this as the group indicator instead: 
> corSymm(form~1|neighborhood.c)
>
> But this resulted in an error message: variable lengths differ (found for 
> 'neighborhood.c')...The same happens when I write "factor(neighborhood)" in 
> the corSymm specification. My panel is not balanced, i.e. the number of 
> observations within neighborhoods varies.  Is this a problem? I haven't seen 
> any indication that the panel must be balanced to use lme, but maybe I've 
> missed it?
>
> Any feedback would be much appreciated incl. suggestions on where I might 
> read more about how to use mgcv for this type of problem.
>
> Thanks in advance!
> Kathrine
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

