Thanks very much for your reply, Chuck. I have a quick follow-up question. You mention putting the data into a 2x2x3 table for a log-linear model; however, my lists have many more than 3 strata, in fact thousands. I am trying to work out whether the proportions in list 1 tend to equal the proportions in list 2, as a kind of matched-pairs test of proportions. Is the log-linear approach feasible with a 2x2x1000 table, for example? Or would it be better to pursue the glm route, using the surrogate Poisson model, as you suggested?
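To make the question concrete, here is a minimal sketch of how I understand the two routes, using only the three example strata from my original message. The long data-frame layout and the names `d`, `fit0`, `fit1`, `tab` and `ll` are my own assumptions, not something taken from your reply, and I may well have misread the surrogate Poisson idea:

```r
# Example counts from the original message (three strata only)
a1 <- c(3, 6, 9); b1 <- c(4, 7, 1)   # list 1
a2 <- c(7, 5, 3); b2 <- c(9, 1, 1)   # list 2
K  <- length(a1)

# Long format: one row per cell of the 2 x 2 x K table
d <- data.frame(
  count   = c(a1, b1, a2, b2),
  outcome = factor(rep(c("a", "b", "a", "b"), each = K)),
  list    = factor(rep(c("list1", "list2"), each = 2 * K)),
  stratum = factor(rep(seq_len(K), times = 4))
)

# glm route (surrogate Poisson): list and outcome are independent
# within each stratum, i.e. equal proportions in every stratum
fit0 <- glm(count ~ stratum * list + stratum * outcome,
            family = poisson, data = d)

# Overall test of that hypothesis: residual deviance on K degrees of freedom
pchisq(deviance(fit0), df.residual(fit0), lower.tail = FALSE)

# Allow one common list-by-outcome association and test it (1 df)
fit1 <- update(fit0, . ~ . + list:outcome)
anova(fit0, fit1, test = "Chisq")

# Log-linear route: the same conditional-independence model via loglin()
tab <- xtabs(count ~ list + outcome + stratum, data = d)
ll  <- loglin(tab, margin = list(c(1, 3), c(2, 3)), print = FALSE)
c(lrt = ll$lrt, df = ll$df)   # should agree with deviance(fit0)
```

If I have this right, both routes fit the same model, so my worry is less about which function to call and more about the thousands of stratum terms that either one would have to estimate.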
Best regards,
Corry

>-----Original Message-----
>From: Charles C. Berry [mailto:[EMAIL PROTECTED]
>Sent: 04 October 2007 21:47
>To: Corry Gellatly
>Cc: r-help@r-project.org
>Subject: Re: [R] comparing matched proportions using glm
>
>On Thu, 4 Oct 2007, Corry Gellatly wrote:
>
>>
>> Dear R users,
>>
>> Is it possible to use a generalized linear model to do a binomial
>> comparison of one list of proportions with a matched list of
>> proportions to test for a difference?
>>
>> So, for example:
>>
>>       list 1         list 2
>>
>>     a1  |  b1      a2  |  b2
>>
>>      3  |  4        7  |  9
>>      6  |  7        5  |  1
>>      9  |  1        3  |  1
>>
>>
>> I want to compare list 1 with list 2 and the samples are matched.
>
>
>Meaning that
>
>       3 4 7 9
>
>are the _counts_ in one stratum of three in all?
>
>And you have an hypothesis that claims the proportions are
>equal in each stratum??
>
>The obvious candidate for that setup is a log-linear model for
>the counts in a 2 by 2 by 3 table.
>
>See
>
>       ?loglin
>
>and
>
>       ?loglm (in MASS)
>
>and the references therein.
>
>You can do this type of work in glm() if you understand
>surrogate Poisson models as outlined in
>
>McCullagh P. and Nelder, J. A. (1989) Generalized Linear
>Models. London: Chapman and Hall.
>
>HTH,
>
>Chuck
>
>> Obviously, I could add the columns and do a binomial test, i.e.
>> prop.test(c(18,15),c(30,26)), however, I have a large dataset so this
>> would reduce the power of my analysis. I could compare the ratios, i.e.
>> a1/(a1+b1) compared to a2/(a2+b2) for the samples in each list,
>> however, this does not account for the difference in sample sizes
>> between samples in each list.
>>
>> I have tried a glm where I bind a2 and b2 as the y variable, i.e.
>> y<-cbind(a2,b2) and also bind a1 and b1 as the x variable, i.e.
>> x<-cbind(a1,b1) and run glm(y~x,binomial)
>>
>> I get this type of output:
>>
>> Call:
>> glm(formula = y ~ x, family = binomial)
>>
>> Deviance Residuals:
>>      Min        1Q    Median        3Q       Max
>> -3.20426  -0.72686  -0.01822   0.68320   4.05035
>>
>> Coefficients:
>>              Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  0.178369   0.186421   0.957    0.339
>> xa1          0.008109   0.017430   0.465    0.642
>> xb1         -0.026666   0.018153  -1.469    0.142
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>>     Null deviance: 565.14  on 467  degrees of freedom
>> Residual deviance: 559.69  on 465  degrees of freedom
>> AIC: 1883.3
>>
>> Number of Fisher Scoring iterations: 3
>>
>>
>> Is this output meaningful? It seems that y is not compared directly
>> with x, but rather compared with a1 and b1, which is not intended?
>>
>> I wonder if this is a suitable approach to the problem? I'll be very
>> grateful for any advice or suggestions.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>Charles C. Berry                            (858) 534-2098
>                                            Dept of Family/Preventive Medicine
>E mailto:[EMAIL PROTECTED]                  UC San Diego
>http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901