Hi all.  This is more of a stats question, I suppose.

Let's say I have two separate simple regressions of weight on year from two
different datasets.  I want to combine the regressions so that I can come up
with a single equation for the total weight regressed on year.  In reality,
there is missing data, so I can't just sum the data across datasets and come
up with a regression on the summed data.  Below is a program to reproduce
what I am trying to figure out.
###
aslp=50
bslp=-50
sda=20
sdb=100
yrs=0:10
a= rnorm(11,100,sda)+aslp*yrs
b= rnorm (11,1000,sdb)+bslp*yrs
ma=lm(a~yrs)
mb=lm(b~yrs)
pra=predict(ma,data.frame(yrs=yrs),interval='confidence')
prb=predict(mb,data.frame(yrs=yrs),interval='confidence')

##combine the two regressions for a single equation with confidence
intervals
pr=pra+prb###it couldn't be this simple, could it?

#by hand
co=coef(ma)+coef(mb)
new.sigma=sqrt(summary(ma)$sigma^2+summary(mb)$sigma^2)
fit=co[1]+co[2]*yrs
lwr= fit - qt(.975,9)*sqrt( new.sigma^2 * ( (1/11) + (
(yrs-mean(yrs))^2)/sum((yrs-mean(yrs))^2) ) )#the df are probably wrong (the
9 in the qt upr= fit + qt(.975,9)*sqrt( new.sigma^2 * ( (1/11) + (
(yrs-mean(yrs))^2)/sum((yrs-mean(yrs))^2) ) )# statement)
I can't print the graph here, so here's code for it...
#graph
plot(a,ylim=c(0,1500))
points(b)
lines(ma$fit)
lines(mb$fit)
lines(pra[,'lwr'],lty=2);lines(pra[,'upr'],lty=2)
lines(prb[,'lwr'],lty=2);lines(prb[,'upr'],lty=2)
lines(pr[,'fit'],col='grey')
lines(pr[,'upr'],lty=2,col='grey')
lines(pr[,'lwr'],lty=2,col='grey')
lines(fit,col='grey',lty=3)
lines(lwr,col='grey',lty=3)
lines(upr,col='grey',lty=3)
legend(1,1500,lty=c(2,3),legend=c('adding using predict','adding by
hand'),col='grey')



As you can see, the plots of the two different methods for producing the
confidence intervals don't match.  What I suspect is wrong in my by-hand
formula is the degrees of freedom (and n (the 11)), but if anything I think
those would increase, which would make the confidence intervals even smaller
than plainly summing the confidence intervals provided by the predict
function at each x.

I'm just wondering how to do this correctly, and am not sure if either
approach is correct.  If anybody would like to weigh in, I would appreciate
it.  Please feel free to point out the obvious.

Thanks,
Patrick

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to