Wow, great! split() was exactly what was needed. It takes about 1 second for the whole operation :D
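For readers landing on this thread, here is a minimal sketch of the split() approach. It assumes a small toy data frame shaped like the one in the original question; the data and the object names `by_loop` and `by_split` are illustrative. It contrasts split() with the per-level subsetting the question started from:

```r
# Toy data shaped like the example in the thread (values are illustrative).
set.seed(42)
df <- data.frame(
  names = factor(c(rep("A", 5), rep("B", 5))),
  col1  = sample(c(0, 1), 10, replace = TRUE)
)

# Per-level approach from the question: one full scan of df$names per level,
# so the work grows with (number of rows) x (number of levels).
by_loop <- lapply(levels(df$names), function(lv) df$col1[df$names == lv])
names(by_loop) <- levels(df$names)

# split(): a single pass that groups col1 by the levels of df$names and
# returns a named list with one element per level.
by_split <- split(df$col1, df$names)

# Both give the same named list.
stopifnot(identical(by_loop, by_split))

# After splitting once, each group is retrieved by its level name.
by_split[["B"]]
```

The loop rescans every row once per level, which is presumably why ~6000 levels made it so slow. split() groups all rows in one pass, and the resulting named list then gives the "hash key" style access asked about further down the thread.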
Thanks again - I can't believe I never used this function in the past.

All the best,

Emmanuel

2008/8/13 Erik Iverson <[EMAIL PROTECTED]>:
> I still don't understand what you are doing. Can you make a small example
> that shows what you have and what you want?
>
> Is ?split what you are after?
>
> Emmanuel Levy wrote:
>>
>> Dear Peter and Henrik,
>>
>> Thanks for your replies - this helps speed up a bit, but I thought
>> there would be something much faster.
>>
>> What I mean is that I thought that a particular value of a level
>> could be accessed instantly, similarly to a "hash" key.
>>
>> Since I've got about 6000 levels in that data frame, it means that
>> making a list L of the form
>> L[[1]] = values of name "1"
>> L[[2]] = values of name "2"
>> L[[3]] = values of name "3"
>> ...
>> would take ~1 hour.
>>
>> Best,
>>
>> Emmanuel
>>
>>
>> 2008/8/12 Henrik Bengtsson <[EMAIL PROTECTED]>:
>>>
>>> To simplify:
>>>
>>> n <- 2.7e6;
>>> x <- factor(c(rep("A", n/2), rep("B", n/2)));
>>>
>>> # Identify 'A':s
>>> t1 <- system.time(res <- which(x == "A"));
>>>
>>> # To compare a factor to a string, the factor is in practice
>>> # coerced to a character vector.
>>> t2 <- system.time(res <- which(as.character(x) == "A"));
>>>
>>> # Interestingly enough, this seems to be faster (repeated many times)
>>> # Don't know why.
>>> print(t2/t1);
>>>     user   system  elapsed
>>> 0.632653 1.600000 0.754717
>>>
>>> # Avoid coercing the factor, but instead coerce the level compared to
>>> t3 <- system.time(res <- which(x == match("A", levels(x))));
>>>
>>> # ...but gives no speed up
>>> print(t3/t1);
>>>     user   system  elapsed
>>> 1.041667 1.000000 1.018182
>>>
>>> # But coercing the factor to integers does
>>> t4 <- system.time(res <- which(as.integer(x) == match("A", levels(x))))
>>> print(t4/t1);
>>>      user    system   elapsed
>>> 0.4166667 0.0000000 0.3636364
>>>
>>> So, the latter seems to be the fastest way to identify those elements.
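A sketch of the integer-code comparison from Henrik's message, reusing his `x`; the names `code_A`, `idx_int` and `idx_chr` are illustrative. One caveat, based on how `==` dispatches on factors (`Ops.factor` replaces the factor by its level labels before comparing): `x == match("A", levels(x))` compares the labels "A"/"B" against the number 1 and matches nothing. So the t3 timing probably does not measure a useful lookup, while comparing `as.integer(x)` to the code does:

```r
n <- 2.7e6
x <- factor(c(rep("A", n / 2), rep("B", n / 2)))

# Integer code of level "A", i.e. its position in levels(x).
code_A <- match("A", levels(x))

# Compare the factor's underlying integer codes directly; no character
# vector of labels has to be built.
idx_int <- which(as.integer(x) == code_A)

# Reference result using the label comparison.
idx_chr <- which(x == "A")
stopifnot(identical(idx_int, idx_chr))

# Caveat: comparing the factor itself to a number compares the level
# *labels* ("A", "B") with "1", so nothing matches here.
length(which(x == code_A))  # 0
```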
>>>
>>> My $.02
>>>
>>> /Henrik
>>>
>>> On Tue, Aug 12, 2008 at 7:31 PM, Peter Cowan <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Emmanuel,
>>>>
>>>> On Tue, Aug 12, 2008 at 4:35 PM, Emmanuel Levy <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Dear All,
>>>>>
>>>>> I have a large data frame (2700000 lines and 14 columns), and I would
>>>>> like to extract the information in a particular way illustrated below:
>>>>>
>>>>> Given a data frame "df":
>>>>>
>>>>> > col1=sample(c(0,1),10, rep=T)
>>>>> > names = factor(c(rep("A",5),rep("B",5)))
>>>>> > df = data.frame(names,col1)
>>>>> > df
>>>>>    names col1
>>>>> 1      A    1
>>>>> 2      A    0
>>>>> 3      A    1
>>>>> 4      A    0
>>>>> 5      A    1
>>>>> 6      B    0
>>>>> 7      B    0
>>>>> 8      B    1
>>>>> 9      B    0
>>>>> 10     B    0
>>>>>
>>>>> I would like to transform it in the form:
>>>>>
>>>>> > index = c("A","B")
>>>>> > col1[[1]]=df$col1[which(df$name=="A")]
>>>>> > col1[[2]]=df$col1[which(df$name=="B")]
>>>>
>>>> I'm not sure I fully understand your problem, your example would not run
>>>> for me.
>>>>
>>>> You could get a small speedup by omitting which(), you can subset by a
>>>> logical vector also which gives a small speedup.
>>>>
>>>> > n <- 2700000
>>>> > foo <- data.frame(
>>>> + one = sample(c(0,1), n, rep = T),
>>>> + two = factor(c(rep("A", n/2 ),rep("B", n/2 )))
>>>> + )
>>>> > system.time(out <- which(foo$two=="A"))
>>>>    user  system elapsed
>>>>   0.566   0.146   0.761
>>>> > system.time(out <- foo$two=="A")
>>>>    user  system elapsed
>>>>   0.429   0.075   0.588
>>>>
>>>> You might also find use for unstack(), though I didn't see a speedup.
>>>>
>>>> > system.time(out <- unstack(foo))
>>>>    user  system elapsed
>>>>   1.068   0.697   2.004
>>>>
>>>> HTH
>>>>
>>>> Peter
>>>>
>>>>> My problem is that the command: *** which(df$name=="A") ***
>>>>> takes about 1 second because df is so big.
>>>>>
>>>>> I was thinking that a "level" could maybe be accessed instantly but I
>>>>> am not sure about how to do it.
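A sketch of Peter's two suggestions on a data frame like his `foo`: indexing with the logical vector directly instead of wrapping it in which(), and unstack(), which by default takes its formula from the data frame (here `one ~ two`) and groups `one` by the levels of `two`. The object names `via_logical`, `via_which` and `u` are illustrative:

```r
set.seed(1)
n <- 2700000
foo <- data.frame(
  one = sample(c(0, 1), n, replace = TRUE),
  two = factor(c(rep("A", n / 2), rep("B", n / 2)))
)

# Index with the logical vector directly; which() would first convert it
# to integer positions. The two agree here because foo$two has no NAs
# (with NAs, logical indexing yields NA elements, which() drops them).
via_logical <- foo$one[foo$two == "A"]
via_which   <- foo$one[which(foo$two == "A")]
stopifnot(identical(via_logical, via_which))

# unstack() groups 'one' by the levels of 'two' (formula one ~ two).
u <- unstack(foo)
names(u)
```

With equal-sized groups, as here, unstack() returns a data frame with one column per level; with unequal group sizes it returns a named list instead, much like split(foo$one, foo$two).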
>>>>>
>>>>> I would be very grateful for any advice that would allow me to speed
>>>>> this up.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Emmanuel
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.