I was surprised to find that df$a[1] is an order of magnitude faster than df[1,"a"]:
> df <- data.frame(a=1:10)
> system.time(replicate(100000, df$a[3]))
   user  system elapsed
   0.36    0.00    0.36
> system.time(replicate(100000, df[3,"a"]))
   user  system elapsed
   4.09    0.00    4.09

A priori, I would have expected that combining the row and column selections into a single operation would be at worst equally fast, and at best faster, since it has fewer intermediate results and avoids redundant operations.

I thought this might be because df[,] builds a data frame before simplifying it to a vector, but with drop=FALSE it is even slower, so that doesn't seem to be the problem:

> system.time(replicate(100000, df[3,"a",drop=FALSE]))
   user  system elapsed
  15.00    0.00   14.99

I then wondered if it might be because '[' allows multiple columns and handles rownames. Sure enough, '[[,]]', which allows only one column and does not handle rownames, is almost 3x faster:

> system.time(replicate(100000, df[[3,"a"]]))
   user  system elapsed
   1.48    0.00    1.48

...but it is still 4x slower than $[].

The timings are not sensitive to the number of rows in df (except for the drop=FALSE case, which is much slower for large dfs).

I will be avoiding [,] and [[,]] when I don't need their functionality, but I still wonder why they should be so much slower than $[].

            -s

R 2.13.1 on Windows 7, i7-860 (2.8GHz) 8GB RAM

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel