plyr recently gained parallel processing capabilities (v1.2 and later,
per the NEWS file), along with a data frame iterator construct. Both
have greatly improved the performance of ddply for multicore/cluster
computing, so we now have the niceness of plyr's grammar with pretty
good performance. From the plyr NEWS file:
Version 1.2 (2010-09-09)
------------------------------------------------------------------------------
NEW FEATURES
* l*ply, d*ply, a*ply and m*ply all gain a .parallel argument that when
  TRUE, applies functions in parallel using a parallel backend registered
  with the foreach package:

    x <- seq_len(20)
    wait <- function(i) Sys.sleep(0.1)
    system.time(llply(x, wait))
    #  user  system elapsed
    # 0.007   0.005   2.005

    library(doMC)
    registerDoMC(2)
    system.time(llply(x, wait, .parallel = TRUE))
    #  user  system elapsed
    # 0.020   0.011   1.038

On 9/22/10 10:41 AM, Ista Zahn wrote:
Hi Alison,
On Wed, Sep 22, 2010 at 11:05 AM, Alison Macalady<a...@kmhome.org> wrote:
Hi,
I have a data set that I'd like to run logistic regressions on, using ddply
to speed up the computation of many models with different combinations of
variables.
In my experience ddply is not particularly fast. I use it a lot
because it is flexible and has easy-to-understand syntax, not for its
speed.
I would like to run regressions on every unique two-variable
combination in a portion of my data set, but I can't quite figure out how
to do it using ddply.
I'm not sure ddply is the tool for this job.
The data set looks like this, with "status" as the
binary dependent variable and V1:V8 as potential independent variables in
the logistic regression:
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D','L'), each = 6), 3)),
                as.data.frame(m))
You can use combn to determine the combinations you want:
Varcombos<- combn(names(x)[-1], 2)
From there you can do a loop, something like
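results <- list()
for (i in seq_len(ncol(Varcombos)))
{
  # Build the formula status ~ <var1> + <var2> for the i-th pair
  log.glm <- glm(as.formula(paste("status ~ ", Varcombos[1, i], " + ",
                                  Varcombos[2, i], sep = "")),
                 family = binomial(link = logit),
                 na.action = na.omit, data = x)
  glm.summary <- summary(log.glm)
  aic <- extractAIC(log.glm)
  coef <- coef(glm.summary)
  # Column 1 of the coefficient table holds the estimates; rows 2 and 3
  # are the two predictors (row 1 is the intercept)
  results[[i]] <- list(Est1 = coef[2, 1], Est2 = coef[3, 1], AIC = aic[2])
  # or whatever other output here
  names(results)[i] <- paste(Varcombos[1, i], Varcombos[2, i], sep = "_")
}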
I'm sure you could replace the loop with something more elegant, but
I'm not really sure how to go about it.
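
One possible loop-free version, offered as a sketch rather than a definitive answer: hand each column pair from combn to a small helper via lapply and stack the one-row results. The helper name fit_pair, the var_pairs object, and the set.seed call are assumptions added for illustration; they do not come from the original posts. Est1 and Est2 here are the slope estimates for the two predictors, picked out of the coefficient table by column name.

```r
# Rebuild the example data from the thread (seed added only so the
# sketch is reproducible; it is not part of the original post)
set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste("V", 1:8, sep = "")
x <- data.frame(status = factor(rep(rep(c("D", "L"), each = 6), 3)),
                as.data.frame(m))

# Hypothetical helper: fit status ~ vars[1] + vars[2] and return one row
fit_pair <- function(vars, data) {
  fit <- glm(reformulate(vars, response = "status"),
             family = binomial(link = "logit"),
             na.action = na.omit, data = data)
  est <- coef(summary(fit))
  data.frame(Var1 = vars[1], Var2 = vars[2],
             Est1 = est[2, "Estimate"],  # slope for the first predictor
             Est2 = est[3, "Estimate"],  # slope for the second predictor
             AIC  = extractAIC(fit)[2])
}

# simplify = FALSE returns a list with one character vector per pair
var_pairs <- combn(names(x)[-1], 2, simplify = FALSE)
results <- do.call(rbind, lapply(var_pairs, fit_pair, data = x))
head(results)
```

reformulate() builds the formula from the variable names directly, which avoids pasting strings together, and returning a one-row data frame per model means the combined result can be sorted by AIC right away.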
I used melt to put my data frame into a more workable format
require(reshape)
xm<- melt(x, id = 'status')
Here is the basic shape of the function I'd like to apply to every
combination of variables in the dataset:
h <- function(df)
{
  # What I can't figure out is how to specify 2 different variables from
  # xm to include in the model (value1 and value2 are placeholders)
  log.glm <- glm(status ~ value1 + value2, family = binomial(link = logit),
                 na.action = na.omit, data = df)
  glm.summary <- summary(log.glm)
  aic <- extractAIC(log.glm)
  coef <- coef(glm.summary)
  # or whatever other output here
  list(Est1 = coef[2, 1], Est2 = coef[3, 1], AIC = aic[2])
}
And then I'd like to use ddply to speed up the computations.
require(plyr)
output <- ddply(xm, .(variable), as.data.frame.function(h))
output
I can easily do this using ddply when I only want to use 1 variable in the
model, but can't figure out how to do it with two variables.
I don't think this approach can work. You are saying "split up xm by
variable" and then expecting to be able to reference different levels
of variable within each split, an impossible request.
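
If staying within plyr matters, one alternative (a sketch under assumptions, not something tested on the real data) is to split over a data frame of variable pairs instead of over the melted data, using mdply on the wide data frame x. The names combos and fit_pair and the column names v1 and v2 are made up for illustration, and the set.seed call is added only for reproducibility.

```r
library(plyr)

# Example data as posted in the thread (seed is an added assumption)
set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste("V", 1:8, sep = "")
x <- data.frame(status = factor(rep(rep(c("D", "L"), each = 6), 3)),
                as.data.frame(m))

# One row per two-variable combination, kept as character columns
combos <- as.data.frame(t(combn(names(x)[-1], 2)), stringsAsFactors = FALSE)
names(combos) <- c("v1", "v2")

# Hypothetical helper: mdply passes each row's columns as arguments v1, v2
fit_pair <- function(v1, v2) {
  fit <- glm(reformulate(c(v1, v2), response = "status"),
             family = binomial(link = "logit"),
             na.action = na.omit, data = x)
  est <- coef(summary(fit))
  data.frame(Est1 = est[2, "Estimate"],
             Est2 = est[3, "Estimate"],
             AIC  = extractAIC(fit)[2])
}

# The result keeps v1 and v2 as label columns next to the model output
output <- mdply(combos, fit_pair)
head(output)
```

Since the NEWS entry lists m*ply among the functions that gain .parallel, adding .parallel = TRUE after registering a foreach backend should in principle spread the 28 fits across cores, though for models this small the overhead may outweigh the gain.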
Hope this helps,
Ista
Many thanks for any hints!
Ali
--------------------
Alison Macalady
Ph.D. Candidate
University of Arizona
School of Geography and Development
& Laboratory of Tree Ring Research
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Abhijit Dasgupta, PhD
Director and Principal Statistician
ARAASTAT
Ph: 301.385.3067
E: adasgu...@araastat.com
W: http://www.araastat.com