Re: [R] speeding up regressions using ddply

Greg Snow Wed, 22 Sep 2010 13:08:03 -0700

Why do you want to do this?

If you are interested in only a small part of the logistic regression, there 
may be a way to compute or approximate it more quickly than by running a full 
glm fit on every pair.  It seems unlikely that you would get much meaning out 
of that many full regressions, but if there is one particular piece you are 
after, extracting just that piece could lend itself to further 
graphing/analysis.
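
To illustrate keeping only a few numbers per fit, here is a minimal sketch, 
not a definitive implementation.  It assumes the wide data frame x from the 
quoted message (rebuilt here so the code runs on its own) and skips the melt 
step.  The helper name fit_pair, the use of combn() with lapply(), and the 
set.seed() call are illustrative assumptions, not part of the original post.

```r
# Illustrative data with the same shape as in the quoted message.
# The seed is an assumption, added only for reproducibility.
set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste("V", 1:8, sep = "")
x <- data.frame(status = factor(rep(rep(c("D", "L"), each = 6), 3)),
                as.data.frame(m))

# All unique two-variable combinations, one pair per column.
pairs <- combn(colnames(m), 2)

# Hypothetical helper: fit one logistic regression on the two named
# variables and keep only the two slope estimates and the AIC.
fit_pair <- function(vars, data) {
  fit <- glm(reformulate(vars, response = "status"),
             data = data, family = binomial(link = "logit"))
  est <- coef(fit)
  data.frame(var1 = vars[1], var2 = vars[2],
             Est1 = est[[2]], Est2 = est[[3]], AIC = AIC(fit))
}

# One row per pair, stacked into a single data frame.
output <- do.call(rbind, lapply(seq_len(ncol(pairs)),
                                function(i) fit_pair(pairs[, i], x)))

# Pairs with the lowest AIC first.
head(output[order(output$AIC), ])
```

The result is a plain data frame with one row per pair, so it can be sorted 
or plotted directly.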


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Alison Macalady
> Sent: Wednesday, September 22, 2010 5:05 AM
> To: r-help@r-project.org
> Subject: [R] speeding up regressions using ddply
> 
> 
> 
> Hi,
> 
> I have a data set that I'd like to run logistic regressions on, using
> ddply to speed up the computation of many models with different
> combinations of variables.  I would like to run regressions on every
> unique two-variable combination in a portion of my data set,  but I
> can't quite figure out how to do it using ddply.  The data set looks like
> this, with "status" as the binary dependent variable and V1:V8 as
> potential independent variables in the logistic regression:
> 
> m <- matrix(rnorm(288), nrow = 36)
> colnames(m) <- paste('V', 1:8, sep = '')
> x <- data.frame( status = factor(rep(rep(c('D','L'), each = 6), 3)),
>                 as.data.frame(m))
> 
> I used melt to put my data frame into a more workable format:
> require(reshape)
> xm <- melt(x, id = 'status')
> 
> Here is the basic shape of the function I'd like to apply to every
> combination of variables in the dataset:
> 
> h <- function(df)
> {
>   # What I can't figure out is how to specify 2 different variables from
>   # xm to include in the model (value1 and value2 are placeholders)
>   log.glm <- glm(status ~ value1 + value2, data = df,
>                  family = binomial(link = logit), na.action = na.omit)
> 
>   glm.summary <- summary(log.glm)
>   aic <- extractAIC(log.glm)
>   coefs <- coef(glm.summary)
>   # Rows 2 and 3 are the two predictors; column 1 holds the estimates
>   # (or whatever other output here)
>   list(Est1 = coefs[2, 1], Est2 = coefs[3, 1], AIC = aic[2])
> }
> 
> And then I'd like to use ddply to speed up the computations.
> 
> require(plyr)
> output <- ddply(xm, .(variable), as.data.frame.function(h))
> output
> 
> 
> I can easily do this using ddply when I only want to use 1 variable in
> the model, but can't figure out how to do it with two variables.
> 
> Many thanks for any hints!
> 
> Ali
> 
> 
> 
> --------------------
> Alison Macalady
> Ph.D. Candidate
> University of Arizona
> School of Geography and Development
> & Laboratory of Tree Ring Research
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

