[Rd] Speed difference between df$a[1] and df[1,"a"]

Stavros Macrakis Wed, 19 Oct 2011 14:36:53 -0700

I was surprised to find that df$a[1] is an order of magnitude faster than
df[1,"a"]:


> df <- data.frame(a=1:10)

> system.time(replicate(100000, df$a[3]))
   user  system elapsed
   0.36    0.00    0.36

> system.time(replicate(100000, df[3,"a"]))
   user  system elapsed
   4.09    0.00    4.09


A priori, I'd have thought that combining the row and column selections into
a single operation would at worst be equally fast, at best would be faster
by having fewer intermediate results and avoiding redundant operations.

I thought this might be because df[,] builds a data frame before simplifying
it to a vector, but with drop=FALSE it is even slower, so that does not seem
to be the problem:

> system.time(replicate(100000, df[3,"a",drop=FALSE]))
   user  system elapsed
  15.00    0.00   14.99


I then wondered if it might be because '[' allows multiple columns and
handles rownames. Sure enough, '[[,]]', which allows only one column, and
does not handle rownames, is almost 3x faster:

> system.time(replicate(100000, df[[3,"a"]]))
   user  system elapsed
   1.48    0.00    1.48


...but it is still 4x slower than $[].
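
The comparison above can be repeated in one pass with a small harness. This
is only a sketch: the names forms and time_form are hypothetical helpers
introduced here, not part of R, and wrapping each expression in a function
adds the same call overhead to every form, so the absolute numbers will
differ from the ones quoted above.

```r
df <- data.frame(a = 1:10)

# The four access forms from this message, wrapped as zero-argument
# functions so they can be timed uniformly.
forms <- list(
  dollar      = function() df$a[3],
  bracket     = function() df[3, "a"],
  bracket_df  = function() df[3, "a", drop = FALSE],
  dbl_bracket = function() df[[3, "a"]]
)

# Evaluate each form once to confirm they select the same cell
# (the drop = FALSE form returns a one-cell data frame, not a scalar).
values <- lapply(forms, function(f) f())

# time_form (hypothetical helper): elapsed seconds for n calls of f.
time_form <- function(f, n = 10000) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

timings <- vapply(forms, time_form, numeric(1))
print(timings)
```

Timings depend on the R version and machine, so no expected output is
shown here.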

The timings are not sensitive to the number of rows in df (except for the
drop=FALSE case, which is much slower for large dfs).  I will be avoiding
[,] and [[,]] when I don't need their functionality, but I still wonder why
they should be so much slower than $[].
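
One plausible contributor, offered as an assumption rather than something
established in this message: `$` is a primitive, while the data-frame
methods for `[` and `[[` are ordinary R functions that run interpreted code
on every call. The sketch below only inspects those objects and shows
.subset2(), the non-dispatching form of `[[`, as a lower-overhead way to
reach a single cell. It does not measure where the time actually goes.

```r
df <- data.frame(a = 1:10)

# `$` is a primitive, implemented in C.
is.primitive(`$`)            # TRUE

# The data-frame methods for `[` and `[[` are R-level closures.
typeof(`[.data.frame`)       # "closure"
typeof(`[[.data.frame`)      # "closure"

# Rough indication of how much R code `[.data.frame` contains
# (the line count varies across R versions).
length(deparse(`[.data.frame`))

# .subset2() skips method dispatch; it extracts the column as a plain
# vector, which is then indexed like any other vector.
fast <- .subset2(df, "a")[3]
fast
```

Note that .subset2() ignores any class-specific behaviour, so it is only a
substitute when the object is known to be a plain data frame.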

            -s

R 2.13.1 on Windows 7, i7-860 (2.8GHz) 8GB RAM


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
