Re: [Rd] Conventions: Use of globals and main functions

2019-08-27 Thread Martin Maechler
> Duncan Murdoch 
> on Mon, 26 Aug 2019 14:19:36 -0400 writes:

> On 26/08/2019 1:58 p.m., William Dunlap wrote:
>> Duncan Murdoch wrote:
>> > Scripts are for throwaways, not for anything worth keeping.
>> 
>> I totally agree and have a tangentially relevant question about the <<- 
>> operator.  Currently 'name <<- value' means to look up the environment 
>> stack until you find 'name'  and (a) if you find 'name' in some frame 
>> bind it to a new value in that frame and (b) if you do not find it make 
>> a new entry for it in .GlobalEnv.
>> 
>> Should R deprecate the second part of that and give an error if 'name' 
>> is not already present in the environment stack?  This would catch 
>> misspelling errors in functions that collect results from recursive 
>> calls.  E.g.,

> I like that suggestion.  Package tests have been complaining about 
> packages writing to .GlobalEnv for a while now, so there probably aren't 
> many instances of b) in CRAN packages; that change might be relatively 
> painless.

> Duncan Murdoch

I don't agree currently: AFAICS, there's no other case (in S or) R where an
assignment only works if there is already an object with that name.

In addition: If I wanted such functionality, I'd rather have it in a
function that has several arguments, with this behavior
switchable via an argument set to TRUE/FALSE, rather than with
`<<-`, which always has exactly 2 arguments.
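
As a rough illustration of the function-based alternative described above, here is a minimal sketch. The name assignUp(), its create argument, and the default create = FALSE are hypothetical and not part of R; the search is assumed to start in the enclosure of the calling frame, as `<<-` does.

```r
# assignUp() is a hypothetical helper, not part of R.
# It rebinds `name` in the first enclosing environment that already has it;
# the fall-back to the global environment is switchable via `create`.
assignUp <- function(name, value, create = FALSE, env = parent.frame()) {
    e <- parent.env(env)                 # like `<<-`, start in the enclosure
    while (!identical(e, emptyenv())) {
        if (exists(name, envir = e, inherits = FALSE)) {
            assign(name, value, envir = e)
            return(invisible(value))
        }
        e <- parent.env(e)
    }
    if (!create) {
        stop("no existing binding for '", name, "' in the enclosing environments")
    }
    assign(name, value, envir = globalenv())
    invisible(value)
}

collect <- function() {
    strings <- character()
    add <- function(x) assignUp("strings", c(strings, x))
    add("One")
    add("Two")
    strings
}
collect()
```

With create = FALSE a misspelled name raises an error instead of silently creating a global; with create = TRUE the helper falls back to the global environment, as `<<-` does today.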

[This is my personal opinion only; other R Core members may well
 think differently about this]

Martin

>> collectStrings <- function(list) {
>>     strings <- character() # to be populated by .collect
>>     .collect <- function(x) {
>>         if (is.list(x)) {
>>             lapply(x, .collect)
>>         } else if (is.character(x)) {
>>             strings <<- c(strings, x)
>>         }
>>         # oops, would like to be told about this error
>>         misspelledStrings <<- c(strings, names(x))
>>         NULL
>>     }
>>     .collect(list)
>>     strings
>> }
>> 
>> This gives the incorrect:
>> > collectStrings(list(i="One", ii=list(a=1, b="Two")))
>> [1] "One" "Two"
>> > misspelledStrings
>> [1] "One" "Two" "i"   "ii"
>> 
>> instead of what we would get if 'misspelledStrings' were 'strings'.
>> > collectStrings(list(i="One", ii=list(a=1, b="Two")))
>> [1] "One" "Two" "a"   "b"   "i"   "ii"
>> 
>> If someone really wanted to assign into .GlobalEnv the assign() function 
>> is available.
>> 
>> In S '<<-' only had meaning (b) and R added meaning (a).  Perhaps it is 
>> time to drop meaning (b).  We could start by triggering a warning about 
>> it if some environment variable were set, as is being done for 
>> non-scalar && and ||.
>> 
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com 
>> 
>> 
>> On Sun, Aug 25, 2019 at 5:09 PM Duncan Murdoch wrote:
>> 
>> On 25/08/2019 7:09 p.m., Cyclic Group Z_1 wrote:
>> >
>> >
>> > This is a fair point; structuring functions into packages is
>> > probably ultimately the gold standard for code organization in R.
>> > However, lexical scoping in R is really not much different than in
>> > other languages, such as Python, in which use of main functions and
>> > defining other named functions outside of main are encouraged. For
>> > example, in Scheme, from which R derives its scoping rules, the
>> > community generally organizes code with almost exclusively functions
>> > and few non-function global variables at top level. The common use
>> > of globals in R seems to be mostly a consequence of historical
>> > interactive use and, relatedly, an inherited practice from S.
>> >
>> > It is true, though, that since anonymous functions (such as in
>> > lapply) play a large part in idiomatic R code, as you put it,
>> > "[l]exical scoping means that all of the problems of global
>> > variables are available to writers who use main()." Nevertheless,
>> > using a main function with other functions defined outside it seems
>> > like a good quick alternative that offers similar advantages to
>> > making a package when functions are tightly coupled to the script
>> > and the project may not be large or generalizable enough to warrant
>> > making a package.
>> >
>> 
>> I think the idea that making a package is too hard is just wrong.
>> Packages in R have lots of requirements, but nowadays there are tools
>> that make them easy.  Eleven years ago at UseR in Dortmund I wrote a
>> package during a 45 minute presentation, and things are much easier now.
>> 
>> If you make a complex project without putting most of the code into a
>> package, you don't have something that you will be able to modify in a
>> year or two, beca

Re: [Rd] Conventions: Use of globals and main functions

2019-08-27 Thread Peter Meissner
Hey,

I always found it a strength of R compared to many other languages that
simple things (running a script, doing something interactive, writing a
function, using lambdas, installing packages, getting help, ...) are very
very simple.

R is a commandline statistics program that happens to be a very elegant,
simple and consistent programming language too.

That being said, I think the main task of scripts is to get things done by
running them end to end in a fresh session. Now, it very well may happen
that a lot of stuff has to be done. Then splitting up scripts into
subscripts and sourcing them from a meta script is a straightforward
solution. It might also be that some functionality is put into functions to
be reused in other places. This can be done by putting those function
definitions into separate files. Then one can use source() wherever those
functions are needed. Now, putting stuff that runs code and scripts that
define/provide functions into the same script is a bad idea. Using the
main() idioms described might prevent the problems stemming from
mixing functions and function execution. But it would also encourage this
mixing, which is, I think, a bad idea anyway.
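
The separation described above (function definitions in one file, application code in another) might look roughly like this. The file name helpers.R and the function square() are made up for illustration, and the helper file is written to a temporary directory only so that the sketch is self-contained.

```r
# Stand-in for a file that only defines functions (hypothetical name).
helper_file <- file.path(tempdir(), "helpers.R")
writeLines("square <- function(x) x^2", helper_file)

# The application script sources the definitions where it needs them.
# Sourcing into a dedicated environment is optional; a plain
# source(helper_file) at top level would define square() in the
# global environment instead.
helpers <- new.env()
source(helper_file, local = helpers)

result <- helpers$square(4)
print(result)
```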

Therefore, I am against fostering a main()-idiom - it adds complexity and
encourages bad code structuring (putting application code and function
definition code into one file).

If one needs code to behave differently in interactive sessions than in
non-interactive sessions, if (interactive()) { } is one way to solve this.
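
A minimal sketch of that interactive() switch, assuming a hypothetical helper named notify() that is not part of R:

```r
# notify() is a hypothetical helper used only for illustration.
notify <- function(msg) {
    if (interactive()) {
        # At the console: signal a message, which the user can suppress.
        message(msg)
    } else {
        # Under Rscript or R CMD BATCH: write a timestamped line to stderr.
        cat(format(Sys.time()), msg, "\n", file = stderr())
    }
    invisible(msg)
}

notify("analysis finished")
```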

If more solid software development is needed, packages are the way to go.


Best, Peter


On Sun, 25 Aug 2019 at 06:11, Cyclic Group Z_1 via R-devel <
r-devel@r-project.org> wrote:

> In R scripts (as opposed to packages), even in reproducible scripts, it
> seems fairly conventional to use the global workspace as a sort of main
> function, and thus R scripts often populate the global environment with
> many variables, which may be mutated. Although this makes sense given R has
> historically been used interactively and this practice is common for
> scripting languages, this appears to disagree with the software-engineering
> principle of avoiding a mutating global state. Although this is just a rule
> of thumb, in R scripts, the frequent use of global variables is much more
> pronounced than in other languages.
>
> On the other hand, in Python, it is common to use a main function (through
> the `def main():` and  `if __name__ == "__main__":` idioms). This is
> mentioned both in the documentation as well as in the writing of Python's
> main creator. Although this is more beneficial in Python than in R because
> Python code is structured into modules, which serve as both scripts and
> packages, whereas R separates these conceptually, a similar practice of
> creating a main function would help avoid the issues from mutating global
> state common to other languages and facilitate maintainability, especially
> for longer scripts.
>
> Although many great R texts (Advanced R, Art of R Programming, etc.)
> caution against assignment in a parent enclosure (e.g., using `<<-`, or
> `assign`), I have not seen many promote the use of a main function and
> avoiding mutating global variables from top level.
>
> Would it be a good idea to promote use of main functions and limiting
> global-state mutation for longer scripts and dedicated applications (not
> one-off scripts)? Should these practices be mentioned in the standard
> documentation?
>
> This question was motivated largely by this discussion on Reddit:
> https://www.reddit.com/r/rstats/comments/cp3kva/is_mutating_global_state_acceptable_in_r/
>  .
> Apologies beforehand if any of these (partially subjective) assessments are
> in error.
>
> Best,
> CG
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Conventions: Use of globals and main functions

2019-08-27 Thread Abby Spurdle
> this appears to disagree with the software-engineering principle of avoiding 
> a mutating global state

I disagree.
In embedded systems engineering, for example, it's customary to use
global variables to represent ports.

Also, I note that the use of global variables is similar to using pen
and paper to do mathematics and statistics.
(Which is good).
Whether that's consistent with software engineering principles or not,
I don't know.

However, I partly agree with you.
Given that there's interest from various parties in running R in
various ways, it may be good to document some of the options
available.

"Running R" (in "R Installation and Administration") links to
"Appendix B Invoking R" (in "An Introduction to R").
However, these sections do not cover the topics in this thread.



Re: [Rd] Conventions: Use of globals and main functions

2019-08-27 Thread Abby Spurdle
> "Running R" (in "R Installation and Administration") links to
> "Appendix B Invoking R" (in "An Introduction to R").
> However, these sections do not cover the topics in this thread.

Sorry, I made a mistake.
It is in the documentation (B.4 Scripting with R)
e.g.

(excerpts only)
R CMD BATCH "--args arg1 arg2" foo.R &
args <- commandArgs(TRUE)
Rscript foo.R arg1 arg2
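
For context, a script such as foo.R might consume those arguments along these lines. The helper parse_args() and the field names first and second are hypothetical; only commandArgs() comes from the documentation excerpt above.

```r
# Hypothetical contents of foo.R; parse_args() is not an R function.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
    if (length(args) < 2) {
        stop("usage: Rscript foo.R arg1 arg2")
    }
    list(first = args[[1]], second = args[[2]])
}

# Passing the vector explicitly makes the function easy to try out
# without launching a new R process.
parsed <- parse_args(c("arg1", "arg2"))
str(parsed)
```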



Re: [Rd] Conventions: Use of globals and main functions

2019-08-27 Thread Henrik Bengtsson
FWIW, one could imagine introducing a helper function global();

global <- function(expr) { eval(substitute(expr), envir = globalenv(),
enclos = baseenv()) }

to make it explicit that any assignments (and evaluation in general)
take place in the global environment, e.g.

> local({ global(a <- 2) })
> a
[1] 2

That "looks" nicer than assign("a", 2, envir = globalenv()) and it's
safer than assuming a <<- 2 will "reach" the global environment.
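
The difference can be sketched as follows, reusing the global() definition above. The function f() and the variables a and b are illustrative only.

```r
global <- function(expr) {
    eval(substitute(expr), envir = globalenv(), enclos = baseenv())
}

f <- function() {
    a <- 1
    g <- function() {
        a <<- 2          # rebinds f's `a`; it never reaches the global environment
        global(b <- 3)   # always evaluated in the global environment
    }
    g()
    a
}

f()
get("b", envir = globalenv())
```

One caveat: because the whole expression is evaluated in the global environment, local variables of the calling function are not visible inside global().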

/Henrik

On Tue, Aug 27, 2019 at 3:19 PM Abby Spurdle wrote:
>
> > this appears to disagree with the software-engineering principle of 
> > avoiding a mutating global state
>
> I disagree.
> In embedded systems engineering, for example, it's customary to use
> global variables to represent ports.
>
> Also, I note that the use of global variables is similar to using pen
> and paper to do mathematics and statistics.
> (Which is good).
> Whether that's consistent with software engineering principles or not,
> I don't know.
>
> However, I partly agree with you.
> Given that there's interest from various parties in running R in
> various ways, it may be good to document some of the options
> available.
>
> "Running R" (in "R Installation and Administration") links to
> "Appendix B Invoking R" (in "An Introduction to R").
> However, these sections do not cover the topics in this thread.


[Rd] What is the best way to loop over an ALTREP vector?

2019-08-27 Thread Wang Jiefei
Hi devel team,

I'm working on C/C++ level ALTREP compatibility for a package. The package
previously used pointers to access the data of a SEXP, so it would not work
for some ALTREP objects which do not have a pointer. I plan to rewrite the
code and use functions like get_elt, get_region, and get_subset to access
the values of a vector, so I have a few questions about ALTREP:

1. Since an ALTREP class does not have to define all of the above
functions (element, region, subset), is there any way to check which
functions have been defined for an ALTREP class? I searched
Rinternals.h and altrep.c but did not find a solution. If there is none, will
one be added in the future?

2. Given the diversity of ALTREP classes, what is the best way to loop over
an ALTREP object? I was hoping for an all-in-one function that can get
the values from a vector as long as at least one of the above functions has
been defined, so that package developers would not need tons of
`if-else` statements to make their packages work with ALTREP. Since
no such function seems to exist, what would be the best way
to write the loop under the current R version?

Best,
Jiefei



Re: [Rd] Conventions: Use of globals and main functions

2019-08-27 Thread Cyclic Group Z_1 via R-devel
Definitely, I agree that global variables have a place in programming. They 
play an especially important role in low-level software, such as embedded 
programming, as you mentioned, and systems programming. I generally would 
disagree with anyone who says global variables should never be used, and they 
may be the best implementation option when something is "truly global."

However, in R scripting conventions, they are the default. I don't think it is 
controversial to say that in software engineering culture, there is a generally 
held principle that global variables should be minimized because they can be 
dangerous (granted, the original "Globals considered harmful" article is quite 
old, and many of the criticisms not applicable to modern languages). I do think 
it is equally important, though, to understand when to break this rule.

I like your suggestion of documenting this as an alternative option, though it 
seems the general sentiment is against this, which I respect.

Best,
CG



Re: [Rd] Conventions: Use of globals and main functions

2019-08-27 Thread Cyclic Group Z_1 via R-devel
> That being said, I think the main task of scripts is to get things done by
> running them end to end in a fresh session. Now, it very well may happen that
> a lot of stuff has to be done. Then splitting up scripts into subscripts and
> sourcing them from a meta script is a straightforward solution. It might also
> be that some functionality is put into functions to be reused in other places.
> This can be done by putting those function definitions into separate files.
> Then one can use source() wherever those functions are needed. Now, putting
> stuff that runs code and scripts that define/provide functions into the same
> script is a bad idea. Using the main() idioms described might prevent the
> problems stemming from mixing functions and function execution. But it would
> also encourage this mixing, which is, I think, a bad idea anyway.

I actually would agree entirely that files should not serve both as source 
files for re-used functions and as application code. The suggestion for a 
main() idiom is merely to reduce variable scope and bring R practices more in 
line with generally recommended programming practices, not so that scripts can act 
as packages/modules/libraries. When I compared R scripts containing main 
functions to packages, I only meant it in the sense that both help manage scope 
(the latter through package namespaces). Any other named functions besides main 
would be functions specifically tied to the script. 
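
To make the idiom concrete, here is a minimal sketch of a script organized this way. The function names summarise_values() and main() and the data are illustrative only, and the sys.nframe() guard is one commonly used approximation of Python's `if __name__ == "__main__":`, not an official R convention.

```r
# Helper tied to this script, defined at top level.
summarise_values <- function(x) {
    c(mean = mean(x), sd = stats::sd(x))
}

main <- function() {
    values <- c(1, 2, 3, 4)    # local to main(), so no global is created
    stats <- summarise_values(values)
    print(stats)
    invisible(stats)
}

# Run main() only when the file is executed at top level (e.g. via Rscript);
# when the file is source()d from other code, sys.nframe() is greater than 0.
if (sys.nframe() == 0L) {
    main()
}
```

Only the two functions end up in the global environment; the intermediate variables live and die inside main().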

I do see your point, though, that this could result in bad practice, namely the 
usage mixing you described. 

Best,
CG
