Dear Dennis, David, Jeff, and Denes,

Thanks for your help and comments. The simple one seems good enough for my work.

Best,
Steve

On Wed, Dec 17, 2014 at 5:46 AM, Dénes Tóth <toth.de...@ttk.mta.hu> wrote:
>
> Dear Jeff,
>
> On 12/17/2014 01:46 AM, Jeff Newmiller wrote:
>
>> You are chasing ghosts of performance past, Denes.
>
> In terms of memory efficiency, yes. In terms of CPU time, there can be a
> significant difference, see below.
>
>> The data.frame
>> function causes no problems, and if it is used then the OP would not
>> need to presume they know the internal structure of the data frame.
>> See below. (I am using R 3.1.2.)
>>
>> a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
>> a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
>> a3 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>
>> # get names of the objects
>> out_names <- ls(pattern="a[[:digit:]]$")
>>
>> # amount of memory allocated
>> gc(reset=TRUE)
>>
>> # Explicitly call data frame
>> out2 <- data.frame( a1=a1[["x"]], a2=a2[["x"]], a3=a3[["x"]] )
>>
>> # No copying.
>> gc()
>>
>> # Your suggested retrieval method
>> out3a <- lapply( lapply( out_names, get ), "[[", "x" )
>> names( out3a ) <- out_names
>> # The "obvious" way to finish the job works fine.
>> out3 <- do.call( data.frame, out3a )
>
> BTW, the even more "obvious" as.data.frame() produces the same result
> with an even more intuitive interface.
>
> However, for lists with a larger number of elements the transformation to
> a data.frame can be pretty slow. In the toy example, we created only a
> three-element list. Let's increase it a little bit.
>
> ---
>
> # this is not even that large
> datlen <- 1e2
> listlen <- 1e5
>
> # create a toy list
> mylist <- matrix(seq_len(datlen * listlen),
>                  nrow = datlen, ncol = listlen)
> mylist <- lapply(1:ncol(mylist), function(i) mylist[, i])
> names(mylist) <- paste0("V", seq_len(listlen))
>
>
> # define the more efficient function ---
> # note that I put class(x) first so that setattr does not
> # modify the attributes of the original input (see ?setattr,
> # you have to be careful)
> setAttrib <- function(x) {
>   class(x) <- "data.frame"
>   data.table::setattr(x, "row.names", seq_along(x[[1]]))
>   x
> }
>
> # benchmarking
> # (we do not need microbenchmark here, the differences are
> # extremely large) - on my machine, 9.4 sec, 8.1 sec vs 0.15 sec
> gc(reset=TRUE)
> system.time(df1 <- do.call(data.frame, mylist))
> gc()
> system.time(df2 <- as.data.frame(mylist))
> gc()
> system.time(df3 <- setAttrib(mylist))
> gc()
>
> # check results
> identical(df1, df2)
> identical(df1, df3)
>
> ----
>
> Of course for small datasets, one should use the built-in and safe
> functions (either do.call or as.data.frame). BTW, for the original
> three-element list, these are even faster than the workaround.
>
> All the best,
> Denes
>
>> # No copying... well, you do end up with a new list in out3, but the
>> # data itself doesn't get copied.
>> gc()
>>
>>
>> On Tue, 16 Dec 2014, Dénes Tóth wrote:
>>
>>> On 12/16/2014 06:06 PM, SH wrote:
>>>
>>>> Dear List,
>>>>
>>>> I hope this posting is not redundant. I have several list outputs
>>>> with the same components. I ran a function with three different
>>>> scenarios below (e.g., scen1, scen2, and scen3,...,scenN). I would
>>>> like to extract the same components and group them as a data frame.
>>>> For example,
>>>> pop.inf.r1 <- scen1[['pop.inf.r']]
>>>> pop.inf.r2 <- scen2[['pop.inf.r']]
>>>> pop.inf.r3 <- scen3[['pop.inf.r']]
>>>> ...
>>>> pop.inf.rN <- scenN[['pop.inf.r']]
>>>> new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)
>>>>
>>>> My final output would be 'new.df'. Could you help me do that
>>>> efficiently?
>>>
>>> If efficiency is of concern, do not use data.frame() but create a list
>>> and add the required attributes with data.table::setattr (the setattr
>>> function of the data.table package). (You can also consider creating a
>>> data.table instead of a data.frame.)
>>>
>>> # some largish lists
>>> a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>> a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>> a3 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>>
>>> # amount of memory allocated
>>> gc(reset=TRUE)
>>>
>>> # get names of the objects
>>> out_names <- ls(pattern="a[[:digit:]]$")
>>>
>>> # create a list
>>> out <- lapply(lapply(out_names, get), "[[", "x")
>>>
>>> # note that no copying occurred
>>> gc()
>>>
>>> # decorate the list
>>> data.table::setattr(out, "names", out_names)
>>> data.table::setattr(out, "row.names", seq_along(out[[1]]))
>>> class(out) <- "data.frame"
>>>
>>> # still no copy
>>> gc()
>>>
>>> # output
>>> head(out)
>>>
>>>
>>> HTH,
>>> Denes
>>>
>>>> Thanks in advance,
>>>>
>>>> Steve
>>>>
>>>> P.S.: Below are some examples of summary outputs.
>>>>
>>>> > summary(scen1)
>>>>                 Length Class  Mode
>>>> aql                1   -none- numeric
>>>> rql                1   -none- numeric
>>>> alpha              1   -none- numeric
>>>> beta               1   -none- numeric
>>>> n.sim              1   -none- numeric
>>>> N                  1   -none- numeric
>>>> n.sample           1   -none- numeric
>>>> n.acc              1   -none- numeric
>>>> lot.inf.r          1   -none- numeric
>>>> pop.inf.n       2000   -none- list
>>>> pop.inf.r       2000   -none- list
>>>> pop.decision.t1 2000   -none- list
>>>> pop.decision.t2 2000   -none- list
>>>> sp.inf.n        2000   -none- list
>>>> sp.inf.r        2000   -none- list
>>>> sp.decision     2000   -none- list
>>>>
>>>> > summary(scen2)
>>>>                 Length Class  Mode
>>>> aql                1   -none- numeric
>>>> rql                1   -none- numeric
>>>> alpha              1   -none- numeric
>>>> beta               1   -none- numeric
>>>> n.sim              1   -none- numeric
>>>> N                  1   -none- numeric
>>>> n.sample           1   -none- numeric
>>>> n.acc              1   -none- numeric
>>>> lot.inf.r          1   -none- numeric
>>>> pop.inf.n       2000   -none- list
>>>> pop.inf.r       2000   -none- list
>>>> pop.decision.t1 2000   -none- list
>>>> pop.decision.t2 2000   -none- list
>>>> sp.inf.n        2000   -none- list
>>>> sp.inf.r        2000   -none- list
>>>> sp.decision     2000   -none- list
>>>>
>>>> > summary(scen3)
>>>>                 Length Class  Mode
>>>> aql                1   -none- numeric
>>>> rql                1   -none- numeric
>>>> alpha              1   -none- numeric
>>>> beta               1   -none- numeric
>>>> n.sim              1   -none- numeric
>>>> N                  1   -none- numeric
>>>> n.sample           1   -none- numeric
>>>> n.acc              1   -none- numeric
>>>> lot.inf.r          1   -none- numeric
>>>> pop.inf.n       2000   -none- list
>>>> pop.inf.r       2000   -none- list
>>>> pop.decision.t1 2000   -none- list
>>>> pop.decision.t2 2000   -none- list
>>>> sp.inf.n        2000   -none- list
>>>> sp.inf.r        2000   -none- list
>>>> sp.decision     2000   -none- list
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained,
>>>> reproducible code.
>>
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go Live...
>> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>                                       Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> ---------------------------------------------------------------------------
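
For the scenario objects in the original question, the simple built-in route discussed in this thread might look roughly like the sketch below. It rests on assumptions that are not in the thread: scen1 to scen3 here are made-up stand-ins built by a hypothetical helper (make_scen), and each pop.inf.r is assumed to be a list of single numbers (the summary output only shows that it is a list of length 2000).

```r
# Hypothetical stand-ins for the poster's objects. Assumption: each
# scenario's pop.inf.r component is a list holding one number per element.
make_scen <- function(n) list(aql = 0.01, pop.inf.r = as.list(runif(n)))
set.seed(1)
scen1 <- make_scen(5)
scen2 <- make_scen(5)
scen3 <- make_scen(5)

# Collect the scenario objects by name instead of typing each one out.
scen_names <- paste0("scen", 1:3)
scens <- mget(scen_names)

# Pull the same component out of every scenario and flatten it to a vector.
cols <- lapply(scens, function(s) unlist(s[["pop.inf.r"]], use.names = FALSE))
names(cols) <- paste0("pop.inf.r", seq_along(cols))

# The built-in, safe conversion suggested in the thread for small data.
new.df <- as.data.frame(cols)
str(new.df)
```

If pop.inf.r holds something other than single numbers per element, the unlist() step would need to change; the setattr workaround from the thread only seems worth considering when the list has very many elements.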