Dear All,
I have a large data frame (2,700,000 rows and 14 columns), and I would like to
extract information from it in the way illustrated below.
Given a data frame "df":
> col1 = sample(c(0,1), 10, replace=TRUE)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
names col1
1 A 1
2 A 0
3 A 1
4 A 0
5 A 1
6 B 0
7 B 0
8 B 1
9 B 0
10 B 0
I would like to transform it into the form:
> index = c("A","B")
> col1 = list()
> col1[[1]] = df$col1[which(df$names=="A")]
> col1[[2]] = df$col1[which(df$names=="B")]
My problem is that the command: *** which(df$names=="A") ***
takes about 1 second because df is so big.
I was thinking that the rows belonging to a factor "level" could perhaps be
accessed directly, but I am not sure how to do it.
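To make the idea concrete, here is a minimal sketch of the kind of thing I have in mind. It assumes that base R's split() can group the column by the factor levels in one call, rather than running one which() scan per level. The fixed toy values and the name "col1_by_name" are only for illustration, and I have not timed this on the full data frame.

```r
# Toy data with the same layout as df above (values fixed for illustration)
df <- data.frame(names = factor(c(rep("A", 5), rep("B", 5))),
                 col1  = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0))

# split() returns a list with one element per level of the factor,
# so the grouping is done in a single call instead of one which() per level
col1_by_name <- split(df$col1, df$names)

# The list names are the factor levels, which gives the "index" vector
index <- names(col1_by_name)

# Values belonging to level "A"
col1_by_name[["A"]]
```

The result is a named list, so col1_by_name[["B"]] or col1_by_name[[2]] would give the values for the second level.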
I would be very grateful for any advice that would allow me to speed this up.
Best wishes,
Emmanuel