Emmanuel,

On Tue, Aug 12, 2008 at 4:35 PM, Emmanuel Levy <[EMAIL PROTECTED]> wrote:
> Dear All,
>
> I have a large data frame (2700000 lines and 14 columns), and I would like to
> extract the information in a particular way illustrated below:
>
>
> Given a data frame "df":
>
>> col1=sample(c(0,1),10, rep=T)
>> names = factor(c(rep("A",5),rep("B",5)))
>> df = data.frame(names,col1)
>> df
>    names col1
> 1      A    1
> 2      A    0
> 3      A    1
> 4      A    0
> 5      A    1
> 6      B    0
> 7      B    0
> 8      B    1
> 9      B    0
> 10     B    0
>
> I would like to transform it in the form:
>
>> index = c("A","B")
>> col1[[1]]=df$col1[which(df$name=="A")]
>> col1[[2]]=df$col1[which(df$name=="B")]

I'm not sure I fully understand your problem; your example would not run for me.
You could get a small speedup by omitting which(), since you can subset
directly with a logical vector.

> n <- 2700000
> foo <- data.frame(
+   one = sample(c(0,1), n, rep = T),
+   two = factor(c(rep("A", n/2 ),rep("B", n/2 )))
+ )
> system.time(out <- which(foo$two=="A"))
   user  system elapsed
  0.566   0.146   0.761
> system.time(out <- foo$two=="A")
   user  system elapsed
  0.429   0.075   0.588

You might also find unstack() useful, though I didn't see a speedup with it.

> system.time(out <- unstack(foo))
   user  system elapsed
  1.068   0.697   2.004

HTH

Peter

> My problem is that the command: *** which(df$name=="A") ***
> takes about 1 second because df is so big.
>
> I was thinking that a "level" could maybe be accessed instantly but I am not
> sure about how to do it.
>
> I would be very grateful for any advice that would allow me to speed this up.
>
> Best wishes,
>
> Emmanuel

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
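
A possible alternative that is not part of the exchange above: if the goal is one vector of col1 values per level of names, base R's split() builds the whole list in a single call instead of one comparison per level. The sketch below assumes the column names from the original post and uses fixed values rather than sample() so it is reproducible; the variable names by_name, a_direct and b_direct are made up for illustration. Whether this is actually faster on 2.7 million rows would need to be timed with system.time() on the real data.

```r
# Small stand-in for the data frame from the original post
# (fixed values instead of sample() so the result is reproducible).
df <- data.frame(
  names = factor(c(rep("A", 5), rep("B", 5))),
  col1  = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0)
)

# split() groups col1 by the levels of the factor in one call and
# returns a named list with one element per level.
by_name <- split(df$col1, df$names)

# For comparison: the per-level logical subsetting discussed in the reply.
a_direct <- df$col1[df$names == "A"]
b_direct <- df$col1[df$names == "B"]
```

The elements can then be accessed by level name, for example by_name[["A"]], which may be close to the "index" list the original post was trying to build.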