On Wed, Dec 28, 2011 at 8:14 AM, Simon Urbanek <simon.urba...@r-project.org> wrote:
> Hadley,
>
> there was a whole discussion about subsetting and subassigning data frames
> (and general efficiency issues) some time ago (I can't find it in a hurry but
> others might)

Yep, a rather lengthy discussion at that:
http://r.789695.n4.nabble.com/speeding-up-perception-td3640920.html
IIRC, there was also some off-list discussion about what it would take to
push it to C, which I may have in my inbox if anyone wants it.

Cheers,

Josh

> -- just look at the `[.data.frame` code to see why it's so slow. It would
> need to be pushed into C code to allow certain optimizations, but it's
> quite complex code, so I don't think there were volunteers. So the advice
> is don't do it ;). Treating DFs as lists is always faster since you get to
> the fast internal code.
>
> Cheers,
> S
>
>
> On Dec 28, 2011, at 10:37 AM, Hadley Wickham wrote:
>
>> Hi all,
>>
>> There seems to be rather a large speed disparity in subsetting when
>> working with a whole data frame vs. working with just columns
>> individually:
>>
>> df <- as.data.frame(replicate(10, runif(1e5)))
>> ord <- order(df[[1]])
>>
>> system.time(df[ord, ])
>> #    user  system elapsed
>> #   0.043   0.007   0.059
>> system.time(lapply(df, function(x) x[ord]))
>> #    user  system elapsed
>> #   0.022   0.008   0.029
>>
>> What's going on?
>>
>> I realise this isn't quite a fair example because the second case
>> makes a list, not a data frame, but I thought it would be a quick
>> operation to turn a list into a data frame if you don't do any
>> checking:
>>
>> list_to_df <- function(list) {
>>   n <- length(list[[1]])
>>   structure(list,
>>             class = "data.frame",
>>             row.names = c(NA, -n))
>> }
>> system.time(list_to_df(lapply(df, function(x) x[ord])))
>> #    user  system elapsed
>> #   0.031   0.017   0.048
>>
>> So I guess this is slow because it has to make a copy of the whole
>> data frame to modify the structure. But couldn't [.data.frame avoid
>> that?
>>
>> Hadley
>>
>> --
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>> http://had.co.nz/
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
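
To make the "treat DFs as lists" advice from the thread concrete, here is a minimal sketch in R. It assumes every column is a plain atomic vector of the same length and that automatic row names are acceptable. `reorder_rows` is a hypothetical helper name, not a base R function, and skipping the usual data frame validity checks is only safe under those assumptions. Timings will vary by machine and R version, so none are shown.

```r
# Sketch only: reorder rows by subsetting each column as a plain vector,
# then rebuild the data frame without going through `[.data.frame`.
# `reorder_rows` is a hypothetical name introduced for illustration.
reorder_rows <- function(df, ord) {
  # lapply() sees the data frame as a list of columns, so each x[ord]
  # uses the internal vector subsetting code.
  cols <- lapply(df, function(x) x[ord])
  # Assumption: columns are atomic vectors of equal length. No checks are
  # done here, and row names are reset to the automatic 1..n form.
  structure(cols,
            class = "data.frame",
            row.names = c(NA_integer_, -length(ord)))
}

set.seed(1)
df <- as.data.frame(replicate(10, runif(1e5)))
ord <- order(df[[1]])

system.time(slow <- df[ord, ])             # whole-data-frame subsetting
system.time(fast <- reorder_rows(df, ord)) # column-wise subsetting

# The column contents agree. The row names differ: `[.data.frame` keeps the
# original row labels, while the sketch renumbers the rows.
stopifnot(identical(as.list(fast), as.list(slow)))
```

If the original row labels matter, the sketch is not a drop-in replacement for `df[ord, ]`, since it discards them.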