Would be grateful for advice on gam/bam model selection incorporating random 
effects and autoregressive terms.

I have a multivariate time series recorded on ~500 subjects at ~100 time 
points.  One of the variables (A) is the dependent variable and four others 
(B to E) are predictors.  My basic formula is:

[model 1]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E))

I've then included a random intercept and a random slope for time, as the 
pattern of A over time is highly variable across subjects.

[model 2]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E)+s(id, bs='re')+s(id,time, 
bs='re'))

I expect there is also potential for autocorrelation within the time series. So:

[model 3]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E)+s(id, bs='re')+s(id,time, 
bs='re'), AR.start = startindex, rho = 0.52)
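
For reference, a minimal sketch of how an AR.start indicator such as 
startindex can be built.  It assumes the rows are sorted by subject and then 
by time, and it uses made-up toy data: the data frame `dat`, its size, and the 
reduced formula are placeholders, not the real data or model.

```r
library(mgcv)

set.seed(1)

# Hypothetical toy data: 20 subjects, 30 time points each
dat <- expand.grid(time = 1:30, id = factor(1:20))
dat$B <- rnorm(nrow(dat))
dat$A <- sin(dat$time / 5) + 0.5 * dat$B + rnorm(nrow(dat))

# The AR(1) structure follows row order, so sort by subject, then time
dat <- dat[order(dat$id, dat$time), ]

# TRUE on the first row of each subject's series, FALSE elsewhere
dat$startindex <- !duplicated(dat$id)

# Reduced version of model 3 with a fixed rho
m3 <- bam(A ~ s(time) + s(B) + s(id, bs = "re") + s(id, time, bs = "re"),
          data = dat, AR.start = dat$startindex, rho = 0.52)
summary(m3)
```

Each TRUE in the indicator restarts the AR(1) process, so correlation is 
modelled within a subject but not carried over from one subject to the next.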

The rho value of 0.52 was settled on by trial and error, minimising fREML/ML. 
(Side question: am I correct in understanding that bam can only use a fixed 
rho, rather than estimating it as gamm does?)
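
A minimal sketch of this kind of search over rho, again on made-up toy data.  
The grid of rho values, the simulated AR(1) errors, and the reduced formula 
are assumptions for illustration only.  The fREML score is read from the 
`gcv.ubre` component of the fitted object.

```r
library(mgcv)

set.seed(2)

# Hypothetical toy data: 20 subjects, 30 time points, AR(1) errors per subject
n_id <- 20
n_t  <- 30
dat <- expand.grid(time = 1:n_t, id = factor(1:n_id))
err <- replicate(n_id, as.numeric(arima.sim(list(ar = 0.5), n = n_t)))
dat$A <- sin(dat$time / 5) + as.vector(err)  # columns of err = subjects
dat$start <- !duplicated(dat$id)             # TRUE at each subject's first row

# Refit over a grid of fixed rho values and record the fREML score
rhos <- seq(0, 0.9, by = 0.1)
freml <- sapply(rhos, function(r) {
  m <- bam(A ~ s(time) + s(id, bs = "re"),
           data = dat, AR.start = dat$start, rho = r)
  m$gcv.ubre
})

# rho giving the smallest fREML score on this grid
best_rho <- rhos[which.min(freml)]
data.frame(rho = rhos, fREML = freml)
best_rho
```

A finer grid around the minimum would then narrow down the value in the same 
way.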

The lowest fREML/ML value is obtained with model 3 (71674 vs 72099 for 
model 2), but the highest adjusted R2/deviance explained is with model 2 
(42.1% vs 37.7% for model 3).  Model 1 is inferior to both the others on all 
measures.

Is it better to select the model including the AR term given the lower ML or is 
it legitimate to go with the 'simpler' model 2 that has higher R2/deviance 
explained?

I am unable to provide a fully reproducible example as I don't know how to 
generate sample data with these specific characteristics.

Many thanks
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
