I was going to suggest

> AB <- df[c("A","B")]
> # lapply() rather than sapply() for dimnames: sapply() would return a
> # matrix, not the required list, if A and B had equally many levels
> ls2 <- array(split(df$C, AB), dim=sapply(AB, nlevels),
+              dimnames=lapply(AB, levels))

which produces a matrix very similar to what Duncan's by() call produces:

> ls1 <- by(df$C, df[,1:2], identity)

E.g.,

> ls2[["a","X"]]
[1] 1 2
> ls1[["a","X"]]
[1] 1 2
> ls1[["a","Y"]] # by assigns NULL to unoccupied slots
NULL
> ls2[["a","Y"]] # split gives the same type to all slots, copied from input
numeric(0)
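For anyone trying the split()/array() version on a current R, here is a self-contained sketch of it. It assumes A and B are factors: since R 4.0.0, data.frame() only makes them factors when given stringsAsFactors = TRUE, and with character columns nlevels() is 0. It also uses lapply() for the dimnames so they stay a list whatever the numbers of levels. The intermediate name `cells` is just for illustration.

```r
# Sketch of the split()/array() idiom; assumes factor columns A and B.
df <- data.frame(A = c("a", "a", "b", "b"),
                 B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4),
                 stringsAsFactors = TRUE)

AB <- df[c("A", "B")]
# split() on a list of factors groups C by every A:B combination (empty
# ones included), with A varying fastest; that matches the column-major
# order in which array() fills a levels(A) x levels(B) matrix.
cells <- split(df$C, AB)
ls2 <- array(cells,
             dim = sapply(AB, nlevels),
             dimnames = lapply(AB, levels))

ls2[["a", "X"]]   # C values of the rows with A == "a" and B == "X"
ls2[["a", "Y"]]   # an unoccupied combination: numeric(0)
```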
They are both quick because they use split() to avoid the repeated
evaluations of bigVector[anotherBigVector == scalar] that your nested
(not imbricated) loops do.  If you really need to convert the matrix
to a list of lists, that will probably be a quick transformation.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Duncan Murdoch
> Sent: Wednesday, August 10, 2011 9:43 AM
> To: Frederic F
> Cc: r-help@r-project.org
> Subject: Re: [R] How to quickly convert a data.frame into a structure of lists
>
> On 10/08/2011 10:30 AM, Frederic F wrote:
> > Hello Duncan,
> >
> > Here is a small example to illustrate what I am trying to do.
> >
> > # Example data.frame
> > df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4))
> > #   A B C
> > # 1 a X 1
> > # 2 a X 2
> > # 3 b Y 3
> > # 4 b Z 4
> >
> > ### First way of getting the list structure (ls1) using imbricated lapply loops:
> > # Get the structure and populate it:
> > ls1<-lapply(levels(df$A), function(levelA) {
> >   lapply(levels(df$B), function(levelB) {df$C[df$A==levelA & df$B==levelB]})
> > })
> > # Apply the names:
> > names(ls1)<-levels(df$A)
> > for (i in 1:length(ls1))
> > {names(ls1[[i]])<-levels(df$B)}
> >
> > # Result:
> > ls1$a$X
> > # [1] 1 2
> > ls1$b$Z
> > # [1] 4
> >
> > The data.frame will always be 'complete', i.e., there will be a value in
> > every row for every column.
> > I want to produce a structure like this one quickly (I aim at something
> > below 10 seconds) for a dataset containing between 1 and 2 million rows.
> >
>
> I don't know what the timing would be like for your real data, but this
> does look like by() would work:
>
> ls1 <- by(df$C, df[,1:2], identity)
>
> When I repeat the rows of df a million times each, this finishes in a
> few seconds.
> It would definitely be slower if there were more levels of
> A or B.
>
> Now ls1 will be a matrix whose entries are the subsets of C that you
> want, so you can see your two results with slightly different syntax:
>
> > ls1[["a", "X"]]
> [1] 1 2
> > ls1[["b","Z"]]
> [1] 4
>
> Duncan Murdoch
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
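If the nested-list form from the original question (ls1$a$X) is really needed, here is a minimal sketch of the matrix-to-list-of-lists conversion mentioned in the reply. It rebuilds ls2 with the split()/array() idiom (again assuming factor columns) and then takes one row of the matrix per level of A. The name ls1 only mirrors Frederic's example.

```r
# Sketch: convert the levels(A) x levels(B) matrix of subsets into nested lists.
df <- data.frame(A = c("a", "a", "b", "b"),
                 B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4),
                 stringsAsFactors = TRUE)   # factor columns are assumed
AB <- df[c("A", "B")]
ls2 <- array(split(df$C, AB),
             dim = sapply(AB, nlevels),
             dimnames = lapply(AB, levels))

# One inner list per level of A: row levelA of the matrix, whose elements
# keep the levels of B as their names.
ls1 <- lapply(setNames(levels(df$A), levels(df$A)),
              function(levelA) as.list(ls2[levelA, ]))

ls1$a$X   # the rows with A == "a" and B == "X"
ls1$b$Z   # the row with A == "b" and B == "Z"
```

The conversion only loops over the levels of A and never rescans C, so it should stay cheap next to the split() itself. Unoccupied combinations come out as numeric(0), as in the lapply version.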