Hi Justin,

In data.table 1.6.1 there was this news item:

  o  j's environment is now consistently reused so that local variables
     may be set which persist from group to group; e.g., incrementing a
     group counter :
         DT[,list(z,groupInd<-groupInd+1),by=x]

One of the reasons data.table is fast is that no function is run per
group. Only the j expression is evaluated, and it is evaluated in the
same persistent environment for each group, so you can do things like
increment a group counter within it.

If your data were in 'long' format (data.table prefers long format, like
a database) it might be something like this (the ggplot line is untested):

    ctr = 1
    DT[,{
        png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,width=11,units='in',pointsize=9,res=300)
        # ggplot() needs a data frame first, so build one from this group's columns
        print(ggplot(data.frame(site=site,val=val),aes(x=site,y=val))+geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
        dev.off()
        ctr<-ctr+1
    }, by=site]

Btw, there was a new feature in 1.6.3, where you can subassign into a
data.table 500 times faster than <-. See the NEWS from 1.6.3 for an
example:

    http://datatable.r-forge.r-project.org/

Matthew


"Justin Haynes" <jto...@gmail.com> wrote in message
news:CAFaj53kjqy=1bJy+iLjeeLYKgvx=rte2h_ha24pt20wqvch...@mail.gmail.com...
> Thanks Ista,
>
> In my real code that is exactly what I'm doing, but I want to prepend the
> names with a sequential number for easier reference once the pngs are
> made.
>
> My initial thought was to add the sequential number to the data before
> sending it to plyr and drawing it out there, but that seems like an
> excessive extra step when I have 1e6 - 1e7 rows.
>
> Justin
>
>
> On Wed, Aug 10, 2011 at 2:42 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
>
>> Hi Justin,
>>
>> On Wed, Aug 10, 2011 at 5:04 PM, Justin Haynes <jto...@gmail.com> wrote:
>> > If I have data:
>> >
>> > dat<-data.frame(a=rnorm(20),b=rnorm(20),c=rnorm(20),d=rnorm(20),site=rep(letters[5:8],each=5))
>> >
>> > And want to plot like this:
>> >
>> > ctr<-1
>> > for(i in c('a','b','c','d')){
>> > png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,width=11,units='in',pointsize=9,res=300)
>> > print(ggplot(dat[,names(dat) %in% c('site',i)],aes(x=factor(site),y=dat[,i]))+geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
>> > dev.off()
>> > ctr<-ctr+1
>> > }
>> >
>> > Is there a way to do the same naming using plyr (or data.table or foreach
>> > which I am not familiar with at all!)?
>>
>> This is not "the same naming", but the same general idea can be
>> achieved with plyr using
>>
>> d_ply(melt(dat,id.vars='site'),.(variable),function(df) {
>> png(file=paste("plyr_plot", unique(df$variable), ".png"),height=8.5,width=11,units='in',pointsize=9,res=300)
>> print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
>> dev.off()
>> })
>>
>> I'm not up to speed on .parallel, foreach etc., so I'll leave the rest
>> to someone else.
>>
>> Best,
>> Ista
>> >
>> > m.dat<-melt(dat,id.vars='site')
>> > ddply(m.dat,.(variable),function(df)
>> > print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot()+ ..?)
>> >
>> > And better yet, is there a way to do it using .parallel=T?
>> >
>> > Faceting is not really an option (unless I can facet onto multiple pages of
>> > a pdf or something) because these need to go into reports as individually
>> > labelled and titled plots.
>> >
>> >
>> > As a bit of a corollary, is it really worth the headache to resolve this if
>> > I am only using melt/plyr to split on the four letter variables?
>> > With a larger set of data (1e6 rows), the melt/plyr version takes a
>> > significant amount of time but .parallel=T drops the time significantly.
>> > Is the right answer a foreach loop and can I do that with the increasing
>> > counter? (I haven't gotten beyond Hadley's .parallel feature in my
>> > parallel R dealings.)
>> >
>> > > dat<-data.frame(a=rnorm(1e6),b=rnorm(1e6),c=rnorm(1e6),d=rnorm(1e6),site=rep(letters[5:8],each=2.5e5))
>> > > ctr<-1
>> > > system.time(for(i in c('a','b','c','d')){
>> > + png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,width=11,units='in',pointsize=9,res=300)
>> > + print(ggplot(dat[,names(dat) %in% c('site',i)],aes(x=factor(site),y=dat[,i]))+geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
>> > + dev.off()
>> > + ctr<-ctr+1
>> > + })
>> >    user  system elapsed
>> >  54.630   0.120  54.843
>> >
>> > > system.time(
>> > + ddply(melt(dat,id.vars='site'),.(variable),function(df) {
>> > + png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',pointsize=9,res=300)
>> > + print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
>> > + dev.off()
>> > + },.parallel=F)
>> > + )
>> >    user  system elapsed
>> >   58.40    0.13   58.63
>> >
>> > > system.time(
>> > + ddply(melt(dat,id.vars='site'),.(variable),function(df) {
>> > + png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',pointsize=9,res=300)
>> > + print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
>> > + dev.off()
>> > + },.parallel=T)
>> > + )
>> >    user  system elapsed
>> >   70.33    3.46   27.61
>> >
>> > How might I speed this up and include the sequential plot names?
>> >
>> > Thanks a bunch!
>> >
>> > Justin
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>> --
>> Ista Zahn
>> Graduate student
>> University of Rochester
>> Department of Clinical and Social Psychology
>> http://yourpsyche.org
>>
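
To make the persistent-environment point from the reply at the top of this thread more concrete, here is a minimal base-R sketch. It is only an analogy and not data.table's actual implementation: a single expression is evaluated once per group in one shared environment, so a counter assigned inside it carries over to the next group. The names `j_env`, `j_expr`, `grp` and `plot_files`, and the toy data, are made up for illustration, and the png()/ggplot calls are replaced by simply collecting the file name each group would have used.

```r
# Toy long-format data (hypothetical): one row per observation.
long <- data.frame(
  site     = rep(letters[5:8], times = 4),
  variable = rep(c("a", "b", "c", "d"), each = 4),
  value    = seq_len(16),
  stringsAsFactors = FALSE
)

# One environment, created once and reused for every group.
j_env <- new.env()
j_env$ctr <- 1
j_env$plot_files <- character(0)

# The "j expression": no function is called per group; the same
# expression is simply evaluated again in j_env.
j_expr <- quote({
  plot_files <- c(plot_files,
                  paste("/tmp/plot_number_", ctr, "_", grp, ".png", sep = ""))
  ctr <- ctr + 1  # assigned in j_env, so the new value persists
})

for (g in unique(long$variable)) {
  j_env$grp <- g        # stand-in for the current group's key
  eval(j_expr, j_env)   # evaluate the expression in the shared environment
}

plot_files <- j_env$plot_files
final_ctr  <- j_env$ctr
print(plot_files)
```

Because `ctr <- ctr + 1` is evaluated in the same environment every time, the four file names come out numbered 1 to 4 without any counter column being added to the data first.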