On 10/08/2011 10:30 AM, Frederic F wrote:
Hello Duncan,

Here is a small example to illustrate what I am trying to do.

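# Example data.frame
# (stringsAsFactors=TRUE is needed in R >= 4.0.0 so that levels() works below)
df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4), stringsAsFactors=TRUE)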
#   A B C
# 1 a X 1
# 2 a X 2
# 3 b Y 3
# 4 b Z 4

### First way of getting the list structure (ls1), using nested lapply loops:
# Get the structure and populate it:
ls1 <- lapply(levels(df$A), function(levelA) {
  lapply(levels(df$B), function(levelB) {
    df$C[df$A == levelA & df$B == levelB]
  })
})
# Apply the names:
names(ls1) <- levels(df$A)
for (i in seq_along(ls1)) {
  names(ls1[[i]]) <- levels(df$B)
}

# Result:
ls1$a$X
# [1] 1 2
ls1$b$Z
# [1] 4

The data.frame will always be 'complete', i.e., there will be a value in
every row for every column.
I want to produce a structure like this one quickly (I am aiming for
under 10 seconds) for a dataset containing between 1 and 2 million rows.


I don't know what the timing would be like for your real data, but this does look like by() would work:

ls1 <- by(df$C, df[,1:2], identity)

When I repeat the rows of df a million times each, this finishes in a few seconds. It would definitely be slower if there were more levels of A or B.
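
A rough way to check the timing yourself might look like the sketch below. The names n_rep, big and res are made up for illustration, and n_rep is set to 1e5 only to keep the run short; raising it to 1e6 should correspond to the test described above. The elapsed time will depend on your machine and on the number of levels of A and B.

```r
# Toy data from the original post; factors are requested explicitly
# because R >= 4.0.0 no longer creates them by default.
df <- data.frame(A = c("a", "a", "b", "b"),
                 B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4),
                 stringsAsFactors = TRUE)

# Hypothetical replication factor (an assumption, not from the post).
n_rep <- 1e5

# Repeat each row n_rep times, column by column.
big <- data.frame(A = rep(df$A, each = n_rep),
                  B = rep(df$B, each = n_rep),
                  C = rep(df$C, each = n_rep))

# Time the by() call; system.time() reports user, system and elapsed seconds.
timing <- system.time(res <- by(big$C, big[, 1:2], identity))
print(timing)

# Sanity check: the a/X cell should hold 2 * n_rep values.
length(res[["a", "X"]])
```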

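Now ls1 will be a matrix of mode list (an object of class "by") whose entries are the subsets of C that you want; combinations of A and B that do not occur in the data are NULL. You can see your two results with slightly different syntax: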

> ls1[["a", "X"]]
[1] 1 2
> ls1[["b","Z"]]
[1] 4
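
If you would rather keep the ls1$a$X style of access, one possible alternative (not what I timed above, so treat it as a sketch) is to split twice: first the data frame by A, then C by B within each piece. The name nested is just for illustration. Because B is a factor, every level of B appears in each sub-list, and combinations with no rows come out as numeric(0) rather than NULL.

```r
# Toy data from the original post, with factors requested explicitly.
df <- data.frame(A = c("a", "a", "b", "b"),
                 B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4),
                 stringsAsFactors = TRUE)

# Outer split by A gives one data frame per level of A;
# inner split by B gives one vector of C per level of B.
nested <- lapply(split(df, df$A), function(d) split(d$C, d$B))

nested$a$X
# [1] 1 2
nested$b$Z
# [1] 4
```

I have not checked how this scales to a couple of million rows, so it would be worth timing it against by() on your real data.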

Duncan Murdoch

