On 25/08/2019 12:08 a.m., Cyclic Group Z_1 via R-devel wrote:
In R scripts (as opposed to packages), even in reproducible scripts, it seems 
fairly conventional to use the global workspace as a sort of main function, and 
thus R scripts often populate the global environment with many variables, which 
may be mutated. Although this makes sense given that R has historically been used 
interactively, and the practice is common in scripting languages, it appears 
to conflict with the software-engineering principle of avoiding mutable 
global state. That principle is only a rule of thumb, but in R scripts the frequent 
use of global variables is much more pronounced than in other languages.

On the other hand, in Python it is common to use a main function (through the 
`def main():` and `if __name__ == "__main__":` idioms). This is mentioned both in 
the documentation and in the writing of Python's main creator. A main function is 
more beneficial in Python than in R because Python code is structured into modules, 
which serve as both scripts and packages, whereas R separates the two conceptually. 
Even so, a similar practice of creating a main function in R would help avoid the 
issues that come with mutating global state, which are common to other languages too, 
and would facilitate maintainability, especially for longer scripts.
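
A minimal sketch of what such a main function could look like in an R script. The names `summarise_values` and `main` are hypothetical, and the `sys.nframe() == 0L` guard is a community idiom rather than a documented equivalent of Python's `__name__` check, so treat it as an assumption:

```r
# Hypothetical helper: depends only on its argument.
summarise_values <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# All script-level state lives in main()'s evaluation frame,
# not in the global environment.
main <- function() {
  values <- c(2, 4, 4, 4, 5, 5, 7, 9)
  stats <- summarise_values(values)
  print(stats)
  invisible(stats)
}

# Assumed idiom: call main() only when the file is run at top level
# (e.g. via Rscript), not when it is source()d from another script.
if (sys.nframe() == 0L) {
  main()
}
```

After the script runs, the global environment holds only the two functions; `values` and `stats` disappear when `main()` returns.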

Although many great R texts (Advanced R, Art of R Programming, etc.) caution against 
assignment in an enclosing (parent) environment (e.g., using `<<-` or `assign`), I have 
not seen many promote the use of a main function and the avoidance of mutating global 
variables from top level.

Would it be a good idea to promote use of main functions and limiting 
global-state mutation for longer scripts and dedicated applications (not 
one-off scripts)? Should these practices be mentioned in the standard 
documentation?

Lexical scoping means that all of the problems of global variables are available to writers who use main(). You could treat the evaluation frame of your main function exactly like the global workspace: define functions within it, read and modify local variables from those functions, etc.
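
As an illustration of that point, here is a small sketch (the names `counter` and `increment` are made up for this example). A function defined inside main() has main()'s frame as its enclosing environment, so `<<-` modifies main()'s local variable in the same way it would modify a global one:

```r
main <- function() {
  counter <- 0  # local to main(), but visible to functions defined below

  # Defined inside main(), so its enclosing environment is main()'s frame.
  increment <- function() {
    counter <<- counter + 1  # modifies main()'s variable, not a local copy
    invisible(NULL)
  }

  increment()
  increment()
  counter
}

print(main())  # prints 2
```

Nothing leaks into the global workspace here, but within main() the variable `counter` behaves as shared mutable state.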

The benefit of using main(), if you avoid defining all the other functions within it, is that those other functions normally operate on their arguments with few side effects. You achieve the same thing in R by putting those other functions in packages and calling them from short scripts. That's how I've always recommended large projects be organized. You don't want a long script for anything, and you don't want multiple source files unless they're in a package.
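
A rough sketch of that layout, with hypothetical function names. In a real project the two functions would live in a package (say, in its R/ directory) and the script would load it with library(); they are defined inline here only so that the example runs on its own:

```r
# --- would live in a package in practice (hypothetical functions) ---

# Returns a new vector without missing values; the input is not modified.
drop_missing <- function(x) {
  x[!is.na(x)]
}

# Rescales a numeric vector to the interval [0, 1].
rescale <- function(x) {
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1])
}

# --- the short script that uses them ---
raw <- c(10, NA, 20, 30)
scaled <- rescale(drop_missing(raw))
print(scaled)
```

Each function depends only on its argument and returns a new value, so the script stays a few lines long and `raw` is left unchanged.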

Duncan Murdoch


This question was motivated largely by this discussion on Reddit: 
https://www.reddit.com/r/rstats/comments/cp3kva/is_mutating_global_state_acceptable_in_r/
Apologies in advance if any of these (partially subjective) assessments are 
in error.

Best,
CG

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

