Liaw, Andy wrote:
A colleague and I were trying to understand all the possible things one
can do with for loops in R, and found some surprises.  I think we've
done sufficient detective work to have a good guess as to what's going
on "underneath", but it would be nice to get some confirmation, and
better yet, perhaps documentation in the R-lang manual.  Basically, the
question is, how/what does R do with the loop index variable?  Below are
some examples:

I think the ?Control help topic documents that a copy of the seq argument (the 1:2 in your first example) is made at the start of the loop, and that altering var (your i) inside the body doesn't affect the loop. One thing you didn't investigate is the value of an expression like

loopval <- for (i in 1:2) { i }

This sets loopval to 2, but in R-devel (2.10.0 to be) this has changed: loops now have NULL as their value.
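
A minimal sketch for checking this on your own installation; the expected results in the comments assume R 2.10.0 or later, as described above, and loopval is just the illustrative name from the example:

```r
# Capture the value of the for loop itself.
loopval <- for (i in 1:2) { i }

# In R >= 2.10.0 the loop's value is NULL, regardless of the body.
is.null(loopval)   # TRUE

# The index variable still holds the last element after the loop ends.
i                  # 2
```

If you need the last value computed in the body, assign it to a variable inside the loop rather than relying on the value of the loop.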


R> for (i in 1:2) { i <- 17; print(i) }
[1] 17
[1] 17
R> print(i)
[1] 17
R> x <- 1:2
R> for (i in x) { print(i); rm(i) }
[1] 1
[1] 2
R> i
Error: object 'i' not found
R> for (i in x) { print(i); rm(x) }
[1] 1
[1] 2
Warning message:
In rm(x) : object 'x' not found
R> i
[1] 2
R> x <- 1:2
R> for (i in x) { print(i); i <- 17; print(i) }
[1] 1
[1] 17
[1] 2
[1] 17

The guess is that at the beginning of the loop, R makes a copy of the
object that's being looped over ("x" in the examples above) somewhere
"under cover", and at the beginning of each iteration, assigns the
"current" element to the index variable ("i" in the examples above).
This is the only logical explanation I can come up with given the
behavior observed above.  Can anyone confirm/deny this?  If this is
true, one thing to consider is not to loop over a large object (e.g.,
the columns of a very large data frame).

It is uncommon to modify seq (your x) in the loop. In the usual case where you don't modify it, the fact that the loop has made a copy should not matter: R won't actually copy the complete object until one version of it is changed.

So this sequence

seq <- data.frame(a=1:1000000, b=1:1000000)
for (var in seq) { print(var[1]) }

hardly uses any more memory during the loop than it used in creating seq, but this sequence

for (var in seq) { seq$b[1] <- -1; print(var[1]) }

uses a lot more: seq is modified so a copy is made, and seq$b is modified after var is set to it, so a copy is made of that too. Both of the loops print two 1's, by the way.
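
The second loop can be checked on a small scale. This is only a sketch: it uses a 3-row stand-in for the data frame above, and first_vals is a hypothetical helper vector that collects what print(var[1]) would have shown. It does not measure memory use; it only confirms that the loop keeps iterating over the original columns while seq itself is changed.

```r
# Small stand-in for the large data frame above (3 rows instead of 1e6).
seq <- data.frame(a = 1:3, b = 1:3)

first_vals <- integer(0)   # hypothetical collector for var[1]
for (var in seq) {
  seq$b[1] <- -1L                        # changes the variable seq, not the loop's copy
  first_vals <- c(first_vals, var[1])    # record the first element of the current column
}

first_vals   # 1 1  (the loop still saw the original column b)
seq$b[1]     # -1   (the variable seq was modified)
```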

Duncan Murdoch
Andy

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
