On 10/08/2011 10:30 AM, Frederic F wrote:
Hello Duncan,
Here is a small example to illustrate what I am trying to do.
# Example data.frame
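# stringsAsFactors = TRUE is needed since R 4.0.0, where strings are no longer
# converted to factors by default; levels() below relies on A and B being factors.
df <- data.frame(A = c("a", "a", "b", "b"), B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4), stringsAsFactors = TRUE)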
#   A B C
# 1 a X 1
# 2 a X 2
# 3 b Y 3
# 4 b Z 4
### First way of getting the list structure (ls1), using nested lapply loops:
# Get the structure and populate it:
ls1 <- lapply(levels(df$A), function(levelA) {
  lapply(levels(df$B), function(levelB) {
    df$C[df$A == levelA & df$B == levelB]
  })
})
# Apply the names:
names(ls1) <- levels(df$A)
for (i in seq_along(ls1)) {
  names(ls1[[i]]) <- levels(df$B)
}
# Result:
ls1$a$X
# [1] 1 2
ls1$b$Z
# [1] 4
The data.frame will always be 'complete', i.e., there will be a value in
every row for every column.
I want to produce a structure like this one quickly (I am aiming for under
10 seconds) for a dataset containing between 1 and 2 million rows.
I don't know what the timing would be like for your real data, but this
does look like by() would work:
ls1 <- by(df$C, df[,1:2], identity)
When I repeat the rows of df a million times each, this finishes in a
few seconds. It would definitely be slower if there were more levels of
A or B.
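A rough way to reproduce that timing check is sketched below. The replication factor n_rep and the names big, res and timing are only illustrative; n_rep is kept at 1e5 here so the sketch runs quickly, and can be raised to 1e6 to match the "million times each" figure. Actual timings will depend on the machine and on the number of levels.

```r
# Same toy data as in the question; stringsAsFactors = TRUE keeps A and B
# as factors on R >= 4.0.0.
df <- data.frame(A = c("a", "a", "b", "b"),
                 B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4),
                 stringsAsFactors = TRUE)

# Hypothetical replication factor: each row of df is repeated n_rep times.
n_rep <- 1e5
big <- df[rep(seq_len(nrow(df)), each = n_rep), ]

# Time the by() call on the enlarged data.
timing <- system.time(res <- by(big$C, big[, 1:2], identity))
print(timing)

# The (a, X) cell collects the two original rows, each repeated n_rep times.
length(res[["a", "X"]])
```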
Now ls1 will be a matrix of mode list (an object of class "by"), indexed by
the levels of A and B, whose entries are the subsets of C that you want, so
you can see your two results with slightly different syntax:
> ls1[["a", "X"]]
[1] 1 2
> ls1[["b","Z"]]
[1] 4
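If the nested ls1$a$X form from the original question is needed rather than matrix-style indexing, one possible sketch is to call split() twice. This is an alternative not timed in this thread, so whether it meets the 10-second target on the real data is an assumption to check; the names by_A and nested are illustrative.

```r
# Same toy data as in the question; A and B must be factors so that
# every level of B appears in each inner list.
df <- data.frame(A = c("a", "a", "b", "b"),
                 B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4),
                 stringsAsFactors = TRUE)

# First split the rows by A, then split each piece's C values by B.
by_A <- split(df, df$A)
nested <- lapply(by_A, function(d) split(d$C, d$B))

nested$a$X
nested$b$Z

# Combinations that do not occur come back as zero-length vectors,
# as they do with the nested lapply version.
nested$a$Y
```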
Duncan Murdoch