Re: [Rd] New pipe operator

2020-12-07 Thread Duncan Murdoch

On 06/12/2020 8:22 p.m., Bravington, Mark (Data61, Hobart) wrote:

> Seems like this *could* be a good thing, and thanks to R core for considering
> it. But, FWIW:
>
>   - I agree with Gabor G that consistency of "syntax" should be paramount here.
> Enough problems have been caused by earlier superficially-convenient non-standard
> features in R.  In particular:
>
>   -- there should not be any discrepancy between an in-place
> function-definition, and a predefined function attached to a symbol (as per
> Gabor's point).
>
>   -- Hence, the ability to say x |> foo, i.e. without parentheses, seems bound to lead to inconsistency, because x |> foo is allowed, x |> base::foo isn't allowed without tricks, but x |> function( y) foo( y) isn't... So, x |> foo is not worth keeping. Parentheses are a price well worth paying.
>
>   -- it is still inconsistent and confusing to (apparently) invoke a function in some places--- normally--- via 'foo(x)', yet in others--- pipily--- via 'foo()'. Especially if 'foo' already has a default value for its first argument.
>
>
>   - I don't see the problem with a placeholder--- doesn't it remove all
> ambiguity? Sure there needs to be a standard unclashable name and people can
> argue about what that should be, but the following seems clear and flexible...
> to me, anyway:
>
>   thing |>
>     foo( _PIPE_) |>   # standard
>     bah( arg1, _PIPE_) |>   # multi-arg function
>     _ANON_({ x <- sum( _PIPE_); _PIPE_/x + x/_PIPE_ })   # anon function
>
> where '_PIPE_' is the ordained name of the placeholder, and '_ANON_' constructs-and-calls a function with single argument '_PIPE_'. There is just one rule (I think...): each pipe-stage must be a *call* involving the argument '_PIPE_'.


I believe there's no ambiguity if the placeholder is *only* allowed in  
the RHS of a pipe expression.  I think the ambiguity arises if you allow  
the same syntax to be used to generate anonymous functions.  We can't  
use _PIPE_ as the placeholder, because it's a legal name.  But we could  
use _.  Then


  x |> (_ + 1) + mean(_)

could expand unambiguously to

  (function(_) (_  + 1) + mean(_))(x)

but

  (_ + 1) + mean(_)

shouldn't be taken to be an anonymous function declaration, otherwise  
things like


  mean(_ |> _)

do become ambiguous:  is the second placeholder the argument to the anon  
function, or is it the placeholder for the embedded pipe?


However, implementing this makes the parser pretty ugly:  its handling  
of _ depends on the outer context.  I now agree that leaving out  
placeholder syntax was the right decision.






>   - The proposed anonymous-function syntax looks quite ugly to me, diminishing
> readability and inviting errors. The new pipe symbol |> already looks scarily
> like quantum mechanics; adding \( just puts fishbones into the symbolic soup.
>
>   - IMO it's not worth going too far to try to lure magrittr-etc fans to swap
> to the new; my experience is that many people keep using older inferior R
> syntax for years after better replacements become available (even if they are
> aware of replacements), for various reasons. Just provide a good framework, and
> let nature take its course.
>
>   - Disclaimer: personally I'm not much of a pipehead anyway, so maybe I'm not the audience. But if I was to consider piping, I wouldn't be very tempted by the current proposal. OTOH, I might even be tempted to write--- and use!--- my own version of '%|>%' as above (maybe someone already has). And if R did it for me, that'd be great :)


Yours would suffer one of the same problems as magrittr's:  it has the  
wrong operator precedence.  The current precedence ordering (from  
?Syntax) is, from highest to lowest:



:: :::            access variables in a namespace
$ @               component / slot extraction
[ [[              indexing
^                 exponentiation (right to left)
- +               unary minus and plus
:                 sequence operator
%any%             special operators (including %% and %/%)
* /               multiply, divide
+ -               (binary) add, subtract
< > <= >= == !=   ordering and comparison
!                 negation
& &&              and
| ||              or
~                 as in formulae
-> ->>            rightwards assignment
<- <<-            assignment (right to left)
=                 assignment (right to left)
?                 help (unary and binary)


The %>% operator has higher precedence than the arithmetic operators, so

x*y %>% f()

is equivalent to x*f(y), not

f(x*y)

as it should "obviously" be.  I believe the new |> operator falls  
between "| ||" and "~", so


x || y |> f()

is the same as f(x || y), and

x ~ y |> f()

is x ~ f(y).   There could be arguments about where the new one appears  
(and there probably have been), but *clearly* magrittr's precedence is  
wrong, and yours would be too, because they are both fixed at the quite  
high precedence given to %any%.
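
The grouping described above can be checked by inspecting the parse tree instead of evaluating anything. This is only a sketch: quote() parses without evaluating, so %>% does not have to be defined and magrittr does not have to be loaded. The helper show_pipe_parse is a hypothetical name introduced here; it just reports how the running R version groups a native-pipe expression, since the precedence of |> in a released R may not match the development build discussed in this thread.

```r
# quote() only parses, so %>% need not exist as a function.
e <- quote(x * y %>% f())

# The top-level call is `*`: the %any% operator bound more tightly,
# so the expression groups as x * (y %>% f()).
top   <- as.character(e[[1]])        # name of the outermost call
inner <- as.character(e[[3]][[1]])   # name of the call in the 2nd operand

# Hypothetical helper: show how this R version groups a |> expression,
# or return the parse error message if |> is unavailable (R < 4.1).
show_pipe_parse <- function(txt) {
  tryCatch(deparse(str2lang(txt)),
           error = function(err) conditionMessage(err))
}

show_pipe_parse("x || y |> f()")
show_pipe_parse("x ~ y |> f()")
```

Printing the two show_pipe_parse() results on a given installation shows directly where |> sits relative to || and ~ there.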


Duncan Murdoch

  
> [*] Definition of _ANON_ could be something like this--- almost certainly won't work as-is, this is just to point out that it could be done in standard R.
>
>
> `_ANON_` <- function( expr) {
>    #1. Construct a function with arg '_PIPE_

Re: [Rd] New pipe operator

2020-12-07 Thread Duncan Murdoch

On 06/12/2020 9:23 p.m., Gabriel Becker wrote:

Hi Gabor,

On Sun, Dec 6, 2020 at 3:22 PM Gabor Grothendieck 
wrote:


I understand very well that it is implemented at the syntax level;
however, in any case the implementation is irrelevant to the principles.

Here a similar example to the one I gave before but this time written out:

This works:

   3 |> function(x) x + 1

but this does not:

   foo <- function(x) x + 1
   3 |> foo

so it breaks the principle of functions being first class objects.  foo
and its
definition are not interchangeable.



I understood what you meant as well.

The issue is that neither foo nor its definition are being operated on, or
even exist within the scope of what |> is defined to do. You are used to
magrittr's %>% where arguably what you are saying would be true. But it's
not here, in my view.

Again, I think the issue is that |>, in as much as it "operates" on
anything at all (it not being a function, regardless of appearances),
operates on call expression objects, NOT on functions, ever.

function(x) x *parses to a call expression* as does RHSfun(), while RHSfun does
not, it parses to a name, *regardless of whether that symbol will
eventually evaluate to a closure or not.*

So in fact, it seems to me that, technically, all name symbols are being
treated exactly the same (none are allowed, including those which will
lookup to functions during evaluation), while all* call expressions are
also being treated the same. And again, there are no functions anywhere in
either case.


I agree it's all about call expressions, but they aren't all being 
treated equally:


x |> f(...)

expands to f(x, ...), while

x |> `function`(...)

expands to `function`(...)(x).  This is an exception to the rule for 
other calls, but I think it's a justified one.
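
The general rule for ordinary calls can be seen from the parse tree. A minimal sketch, assuming R >= 4.1 so that |> parses at all; f, x, y and z are placeholder names and nothing is evaluated. The `function` special case described above is not exercised here, because its treatment may differ between the development build under discussion and later releases.

```r
# The pipe is rewritten by the parser, so the quoted expression is
# already an ordinary call with the left-hand side as first argument.
piped  <- quote(x |> f(y, z = 2))
direct <- quote(f(x, y, z = 2))
same_call <- identical(piped, direct)

# No `|>` symbol survives in the parse tree; the head of the call is f.
head_symbol <- as.character(piped[[1]])
```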


Duncan Murdoch



* except those that the parser flags as syntactically special.



You have
to write 3 |> foo() but don't have to write 3 |> (function(x) x + 1)().



I think you should probably be careful what you wish for here. I'm not
involved with this work and do not speak for any of those who were, but the
principled way to make that consistent while remaining entirely in the
parser seems very likely to be to require the latter, rather than not
require the former.



This isn't just a matter of notation, i.e. foo vs foo(), but is a
matter of breaking
the way R works as a functional language with first class functions.



I don't agree. Consider `+`

Having

foo <- get("+") ## note no `` here
foo(x,y)

parse and work correctly while

+(x,y)

does not, does not mean + isn't a function or that it is a "second class
citizen", it simply means that the parser has constraints on the syntax for
writing code that calls it that calls to other functions are not subject to.
The fact that such *syntactic* constraints can exist proves that there is
not some overarching inviolable principle being violated here, I think. Now
you may say "well that's just the parser, it has to parse + specially
because it's an operator with specific precedence etc". Well, the same exact
thing is true of |> I think.
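
The `+` comparison can be tried directly in base R. A small sketch with illustrative values for x and y; the unquoted prefix form is passed to parse() as text so that the failure can be caught instead of stopping the script.

```r
x <- 1
y <- 2

foo <- get("+")             # an ordinary binding to the same primitive
via_name <- foo(x, y)       # prefix call through a regular name

via_backticks <- `+`(x, y)  # backticks let the parser read + as a name

# The unquoted prefix form is rejected at parse time, before any
# evaluation: +( ... ) is read as unary plus applied to "(x, y)",
# and a comma is not valid inside plain parentheses.
parse_result <- tryCatch(
  parse(text = "+(x, y)"),
  error = function(e) "parse error"
)
```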

Best,
~G



On Sun, Dec 6, 2020 at 4:06 PM Gabriel Becker 
wrote:


Hi Gabor,

On Sun, Dec 6, 2020 at 12:52 PM Gabor Grothendieck <

ggrothendi...@gmail.com> wrote:


I think the real issue here is that functions are supposed to be
first class objects in R and |> would break that if it is possible
to write function(x) x + 1 on the RHS but not foo (assuming foo
was defined as that function).

I don't think getting experience with using it can change that
inconsistency which seems serious to me and needs to
be addressed even if it complicates the implementation
since it drives to the heart of what R is.



With respect I think this is a misunderstanding of what is happening
here.

Functions are first class citizens. |> is, for all intents and purposes,
a macro.

LHS |> RHS(arg2=5)

parses to

RHS(LHS, arg2 = 5)

There are no functions at the point in time when the pipe transformation
happens, because no code has been evaluated. To know if a symbol is going
to evaluate to a function requires evaluation which is a step entirely
after the one where the |> pipe is implemented.

Another way to think about it is that

LHS |> RHS(arg2 = 5)

is another way of writing RHS(LHS, arg2 = 5), NOT R code that is (or
even can be) evaluated.

Now this is a subtle point that only really has implications in as much
as it is not the case for magrittr pipes, but it's relevant for discussions
like this, I think.


~G


On Sat, Dec 5, 2020 at 1:08 PM Gabor Grothendieck
 wrote:


The construct utils::head  is not that common but bare functions are
very common and to make it harder to use the common case so that
the uncommon case is slightly easier is not desirable.

Also it is trivial to write this which does work:

mtcars %>% (utils::head)

On Sat, Dec 5, 2020 at 11:59 AM Hugh Parsonage <

hugh.parson...@gmail.com> wrote:


I'm surpris

Re: [Rd] New pipe operator

2020-12-07 Thread Gabor Grothendieck
On Mon, Dec 7, 2020 at 5:41 AM Duncan Murdoch  wrote:
> I agree it's all about call expressions, but they aren't all being
> treated equally:
>
> x |> f(...)
>
> expands to f(x, ...), while
>
> x |> `function`(...)
>
> expands to `function`(...)(x).  This is an exception to the rule for
> other calls, but I think it's a justified one.

This admitted inconsistency is justified by what?  No argument has been
presented.  The justification seems to be implicitly driven by implementation
concerns at the expense of usability and language consistency.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread Gabor Grothendieck
On Sat, Dec 5, 2020 at 1:19 PM  wrote:
> Let's get some experience

Here is my last SO post using dplyr rewritten to use R 4.1 devel.  Seems
not too bad.  Was able to work around the placeholder for gsub by specifying
the arg names and used \(...)... elsewhere.  This does not address the
inconsistency discussed though.  I have indented by 2 spaces in case the
email wraps around.  The objective is to read myfile.csv including columns that
contain c(...) and integer(0), parsing and evaluating them.


  # taken from:
  # https://stackoverflow.com/questions/65174764/reading-in-a-csv-that-contains-vectors-cx-y-in-r/65175172#65175172

  # create input file for testing
  Lines <- "\"col1\",\"col2\",\"col3\"\n\"a\",1,integer(0)\n\"c\",c(3,4),5\n\"e\",6,7\n"
  cat(Lines, file = "myfile.csv")

  #
  # base R 4.1 (devel)
  DF <- "myfile.csv" |>
    readLines() |>
    gsub(pattern = r'{(c\(.*?\)|integer\(0\))}', replacement = r'{"\1"}') |>
    \(.) read.csv(text = .) |>
    \(.) replace(., 2:3, lapply(.[2:3], \(col) lapply(col, \(x)
      eval(parse(text = x)))))

  #
  # dplyr/magrittr
  library(dplyr)

  DF <- "myfile.csv" %>%
    readLines %>%
    gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) %>%
    { read.csv(text = .) } %>%
    mutate(across(2:3, ~ lapply(., function(x) eval(parse(text = x)))))



Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread luke-tierney

Or, keeping dplyr but with R-devel pipe and function shorthand:

DF <- "myfile.csv" %>%
   readLines() |>
   \(.) gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) |>
   \(.) read.csv(text = .) |>
   mutate(across(2:3, \(col) lapply(col, \(x) eval(parse(text = x)))))

Using named arguments to redirect to the implicit first does work,
also in magrittr, but for me at least it is the kind of thing I would
probably regret a month later when trying to figure out the code.
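
For reference, the named-argument redirection mentioned above works through R's ordinary argument matching. A minimal sketch, assuming R >= 4.1 for |>; the input string and the replacement character are made up for illustration.

```r
# gsub's signature is gsub(pattern, replacement, x, ...). Naming
# pattern and replacement leaves the piped value as the only unnamed
# argument, so it is matched to x.
cleaned <- "a-b-c" |> gsub(pattern = "-", replacement = "_")

# The same call written without the pipe, with x named explicitly.
direct <- gsub(pattern = "-", replacement = "_", x = "a-b-c")
```

The piped form gives the same result as the explicit one, but a reader has to know the signature of gsub to see which argument the piped value fills, which is the readability cost noted above.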

Best,

luke




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu



Re: [Rd] New pipe operator

2020-12-07 Thread Deepayan Sarkar
On Mon, Dec 7, 2020 at 6:53 PM Gabor Grothendieck
 wrote:
>
> On Mon, Dec 7, 2020 at 5:41 AM Duncan Murdoch  
> wrote:
> > I agree it's all about call expressions, but they aren't all being
> > treated equally:
> >
> > x |> f(...)
> >
> > expands to f(x, ...), while
> >
> > x |> `function`(...)
> >
> > expands to `function`(...)(x).  This is an exception to the rule for
> > other calls, but I think it's a justified one.
>
> This admitted inconsistency is justified by what?  No argument has been
> presented.  The justification seems to be implicitly driven by implementation
> concerns at the expense of usability and language consistency.

Sorry if I have missed something, but is your consistency argument
basically that if

foo <- function(x) x + 1

then

x |> foo
x |> function(x) x + 1

should both work the same? Suppose it did. Would you then be OK if

x |> foo()

no longer worked as it does now, and produced foo()(x) instead of foo(x)?

If you are not OK with that and want to retain the current behaviour,
what would you want to happen with the following?

bar <- function(x) function(n) rnorm(n, mean = x)

10 |> bar(runif(1))() # works 'as expected' ~ bar(runif(1))(10)
10 |> bar(runif(1)) # currently bar(10, runif(1))

both of which you probably want. But then

baz <-  bar(runif(1))
10 |> baz

(not currently allowed) will not be the same as what you would want from

10 |> bar(runif(1))

which leads to a different kind of inconsistency, doesn't it?
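
The two expansions in the bar example can be checked from the parse tree. A sketch assuming R >= 4.1 for |>, using bar exactly as defined above. Only the first form is evaluated: with this one-argument bar, evaluating bar(10, runif(1)) would stop with an unused-argument error, so the second form is only parsed.

```r
bar <- function(x) function(n) rnorm(n, mean = x)

# RHS is a call whose function part is itself the call bar(runif(1)):
# the LHS becomes the first argument of the outer call.
outer_form <- identical(quote(10 |> bar(runif(1))()),
                        quote(bar(runif(1))(10)))

# RHS is a plain call to bar: the LHS is inserted ahead of runif(1).
inner_form <- identical(quote(10 |> bar(runif(1))),
                        quote(bar(10, runif(1))))

# Evaluating the first form draws 10 normal variates.
draws <- 10 |> bar(runif(1))()
length(draws)
```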

-Deepayan



Re: [Rd] New pipe operator

2020-12-07 Thread Gabor Grothendieck
One could examine how magrittr works as a reference implementation if
there is a question on how something should function.  It's in
widespread use and seems to work well.




-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] New pipe operator

2020-12-07 Thread Deepayan Sarkar
On Mon, Dec 7, 2020 at 9:23 PM Gabor Grothendieck
 wrote:
>
> One could examine how magrittr works as a reference implementation if
> there is a question on how something should function.  It's in
> widespread use and seems to work well.

Yes, but it has many inconsistencies (including for the example I
gave). Do you want a magrittr clone, or do you want consistency? It's
OK to want either, but I don't think you can get both.

What we actually end up with is another matter, depending on many
other factors. I was just trying to understand your consistency
argument.

-Deepayan



Re: [Rd] New pipe operator

2020-12-07 Thread peter dalgaard
Hmm,

I feel a bit bad coming late to this, but I think I am beginning to side with 
those who want  "... |> head" to work. And yes, that has to happen at the 
expense of |> head().

As (I think it was) Gabor points out, the current structure goes down a 
nonstandard evaluation route, which may be difficult to explain and departs 
from usual operator evaluation paradigms by being an odd mix of syntax and 
semantics. R lets you do this sort of thing, witness ggplot and tidyverse, 
but the transparency of the language tends to suffer. 

It would be neater if it was simply so that the class/type of the object on the 
right hand side decided what should happen. So we could have a rule that we 
could have an object, an expression, and possibly an unevaluated call on the 
RHS. Or maybe a formula, i.e., we could have

... |> head

but not  

... |> head() 

because head() does not evaluate to anything useful. Instead, we could have 
some of these

... |> quote(head())
... |> expression(head())
... |> ~ head()
... |> \(_) head(_)

possibly also using a placeholder mechanism for the first three. I kind of 
like the idea that the ~ could be equivalent to \(_).

(And yes, I am kicking myself a bit for not using ~ in the NSE arguments in 
subset() and transform())

-pd


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread Gabor Grothendieck
On Mon, Dec 7, 2020 at 10:11 AM  wrote:
> Or, keeping dplyr but with R-devel pipe and function shorthand:
>
> DF <- "myfile.csv" %>%
> readLines() |>
> \(.) gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', .) |>
> \(.) read.csv(text = .) |>
> mutate(across(2:3, \(col) lapply(col, \(x) eval(parse(text = x)
>
> Using named arguments to redirect to the implicit first does work,
> also in magrittr, but for me at least it is the kind of thing I would
> probably regret a month later when trying to figure out the code.

The gsub issue suggests that if one were to start afresh,
the arguments to gsub (and many other R functions)
should be rearranged.  Of course, that is precisely what
the tidyverse did.



Re: [Rd] New pipe operator

2020-12-07 Thread Duncan Murdoch

On 07/12/2020 11:18 a.m., peter dalgaard wrote:

> Hmm,
>
> I feel a bit bad coming late to this, but I think I am beginning to side with those who want
> "... |> head" to work. And yes, that has to happen at the expense of |> head().


Just curious, how would you express head(df, 10)?  Currently it is

 df |> head(10)

Would I have to write it as

 df |> function(d) head(d, 10)



> As (I think it was) Gabor points out, the current structure goes down a
> nonstandard evaluation route, which may be difficult to explain and departs
> from usual operator evaluation paradigms by being an odd mix of syntax and
> semantics. R lets you do this sort of thing, witness ggplot and tidyverse,
> but the transparency of the language tends to suffer.


I wouldn't call it non-standard evaluation.  There is no function 
corresponding to |>, so there's no evaluation at all.  It is more like 
the way "x -> y" is parsed as "y <- x", or "if (x) y" is transformed to 
`if`(x, y).
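
Both parser rewrites mentioned here can be confirmed with quote(), which returns the parsed expression without evaluating it. A short sketch in base R; x and y are just symbols and never need to exist.

```r
# Rightward assignment is turned into leftward assignment by the parser.
arrow_same <- identical(quote(x -> y), quote(y <- x))

# An if statement is stored as a call to the function named `if`.
if_call <- quote(if (x) y)
if_head <- as.character(if_call[[1]])
if_same <- identical(if_call, quote(`if`(x, y)))
```

One difference is that `if` and `<-` still exist as functions that the rewritten call invokes, whereas no function named `|>` is involved after parsing.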


Duncan Murdoch




[Rd] anonymous functions

2020-12-07 Thread Therneau, Terry M., Ph.D. via R-devel
“The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be helpful in making 
code containing simple function expressions more readable.”
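
The equivalence stated in that passage can be checked as follows. A minimal sketch, assuming R >= 4.1 for the \(x) shorthand; the names short and long are illustrative only.

```r
short <- \(x) x + 1
long  <- function(x) x + 1

# Both are ordinary closures with the same formals and body.
same_formals <- identical(formals(short), formals(long))
same_body    <- identical(body(short), body(long))

# In the parse tree the shorthand is a call to `function`,
# exactly like the long form.
head_symbol <- as.character(quote(\(x) x + 1)[[1]])

short(1)
```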


Color me unimpressed.
Over the decades I've seen several "who can write the shortest code" threads: in Fortran, 
in C, in Splus, ...   The same old idea that "short" is a synonym for either elegant, 
readable, or efficient is now being recycled in the tidyverse.   The truth is that "short" 
is actually an antonym for all of these things, at least for anyone else reading the code; 
or for the original coder 30-60 minutes after the "clever" lines were written.  Minimal 
use of the spacebar and/or the return key isn't usually held up as a goal, but creeps into 
many practitioners' code as well.


People are excited by replacing "function(" with "\("?  Really?   Are people typing code 
with their thumbs?
I am ambivalent about pipes: I think it is a great concept, but too many of my colleagues 
think that using pipes = no need for any comments.


As time goes on, I find my goal is to make my code less compact and more readable.  Every 
bug fix or new feature in the survival package now adds more lines of comments or other 
documentation than lines of code.  If I have to puzzle out what a line does, what about 
the poor sod who inherits the maintenance?



--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"



Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread Gregory Warnes
My vote is for the consistency of function calls always having parentheses,
including in pipes.  Making them optional only saves two keystrokes, but
will add yet another inconsistency to confuse or trip folks up.

As for the new anonymous function syntax, I would prefer something more
human friendly, perhaps provide “fun” as a shortcut for “function”,
enabling:

DF <- "myfile.csv" %>%
  readLines() |>
  fun(x) gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', x) |>
  fun(x) read.csv(text = x) |>
  mutate(
    across(2:3,
      fun(col) lapply(col,
        fun(x) eval(parse(text = x))
      )
    )
  )

which seems much easier to read and understand, at the cost of only a few
extra characters.

-G

-- 
"Whereas true religion and good morals are the only solid foundations of
public liberty and happiness . . . it is hereby earnestly recommended to
the several States to take the most effectual measures for the
encouragement thereof." Continental Congress, 1778



Re: [Rd] New pipe operator

2020-12-07 Thread Peter Dalgaard



> On 7 Dec 2020, at 17:35 , Duncan Murdoch  wrote:
> 
> On 07/12/2020 11:18 a.m., peter dalgaard wrote:
>> Hmm,
>> I feel a bit bad coming late to this, but I think I am beginning to side 
>> with those who want  "... |> head" to work. And yes, that has to happen at 
>> the expense of |> head().
> 
> Just curious, how would you express head(df, 10)?  Currently it is
> 
> df |> head(10)
> 
> Would I have to write it as
> 
> df |> function(d) head(d, 10)

It could be 

df |> ~ head(_, 10)

which in a sense is "yes" to your question.

> 
>> As I think it was Gabor points out, the current structure goes down a 
>> nonstandard evaluation route, which may be difficult to explain and departs 
>> from usual operator evaluation paradigms by being an odd mix of syntax and 
>> semantics. R lets you do these sorts of thing, witness ggplot and tidyverse, 
>> but the transparency of the language tends to suffer.
> 
> I wouldn't call it non-standard evaluation.  There is no function 
> corresponding to |>, so there's no evaluation at all.  It is more like the 
> way "x -> y" is parsed as "y <- x", or "if (x) y" is transformed to `if`(x, 
> y).

That's a point, but maybe also my point. Currently, the parser is inserting the 
LHS as the 1st argument of the RHS, right? Things might be simpler if it was 
more like a simple binop.
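
The parse-time rewriting discussed in this exchange can be inspected with quote(). What follows is only a minimal sketch, assuming R >= 4.1.0 (the version in which `|>` was eventually released); the symbols x, y and df are placeholders that are never evaluated.

```r
# quote() returns the parsed expression without evaluating it, so the
# placeholder symbols x, y and df do not need to exist.
arrow_form <- quote(x -> y)          # the parser stores this as y <- x
if_form    <- quote(if (x) y)        # the parser stores this as `if`(x, y)
pipe_form  <- quote(df |> head(10))  # the parser stores this as head(df, 10)

print(arrow_form)
print(if_form[[1]])
print(pipe_form)

# There is no function object behind the pipe, unlike `<-`.
pipe_is_function   <- exists("|>")
assign_is_function <- exists("<-")
```

Because the pipe leaves no trace in the parsed call, anything that inspects the call afterwards (substitute(), match.call(), deparse()) sees an ordinary function call.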

-pd

> Duncan Murdoch
> 
>> It would be neater if it was simply so that the class/type of the object on 
>> the right hand side decided what should happen. So we could have a rule that 
>> we could have an object, an expression, and possibly an unevaluated call on 
>> the RHS. Or maybe a formula, I.e., we could hav
>> ... |> head
>> but not
>> ... |> head()
>> because head() does not evaluate to anything useful. Instead, we could have 
>> some of these
>> ... |> quote(head())
>> ... |> expression(head())
>> ... |> ~ head()
>> ... |> \(_) head(_)
>> possibly also using a placeholder mechanism for the three first ones. I kind 
>> of like the idea that the ~ could be equivalent to \(_).
>> (And yes, I am kicking myself a bit for not using ~ in the NSE arguments in 
>> subset() and transform())
>> -pd
>>> On 7 Dec 2020, at 16:20 , Deepayan Sarkar  wrote:
>>> 
>>> On Mon, Dec 7, 2020 at 6:53 PM Gabor Grothendieck
>>>  wrote:
 
>>>> On Mon, Dec 7, 2020 at 5:41 AM Duncan Murdoch wrote:
>>>>> I agree it's all about call expressions, but they aren't all being
>>>>> treated equally:
>>>>>
>>>>> x |> f(...)
>>>>>
>>>>> expands to f(x, ...), while
>>>>>
>>>>> x |> `function`(...)
>>>>>
>>>>> expands to `function`(...)(x).  This is an exception to the rule for
>>>>> other calls, but I think it's a justified one.
>>>>
>>>> This admitted inconsistency is justified by what?  No argument has been
>>>> presented.  The justification seems to be implicitly driven by
>>>> implementation concerns at the expense of usability and language
>>>> consistency.
>>> 
>>> Sorry if I have missed something, but is your consistency argument
>>> basically that if
>>> 
>>> foo <- function(x) x + 1
>>> 
>>> then
>>> 
>>> x |> foo
>>> x |> function(x) x + 1
>>> 
>>> should both work the same? Suppose it did. Would you then be OK if
>>> 
>>> x |> foo()
>>> 
>>> no longer worked as it does now, and produced foo()(x) instead of foo(x)?
>>> 
>>> If you are not OK with that and want to retain the current behaviour,
>>> what would you want to happen with the following?
>>> 
>>> bar <- function(x) function(n) rnorm(n, mean = x)
>>> 
>>> 10 |> bar(runif(1))() # works 'as expected' ~ bar(runif(1))(10)
>>> 10 |> bar(runif(1)) # currently bar(10, runif(1))
>>> 
>>> both of which you probably want. But then
>>> 
>>> baz <-  bar(runif(1))
>>> 10 |> baz
>>> 
>>> (not currently allowed) will not be the same as what you would want from
>>> 
>>> 10 |> bar(runif(1))
>>> 
>>> which leads to a different kind of inconsistency, doesn't it?
>>> 
>>> -Deepayan
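
The expansions in Deepayan's example can be checked without running rnorm() by quoting the piped expressions. This is a hedged sketch, assuming the pipe as later released in R >= 4.1.0 (the R-devel build under discussion may have differed); bar is the function from the quoted message.

```r
# Requires R >= 4.1.0. quote() shows the parser's rewrite without running it.
bar <- function(x) function(n) rnorm(n, mean = x)

first  <- quote(10 |> bar(runif(1))())  # rewritten to bar(runif(1))(10)
second <- quote(10 |> bar(runif(1)))    # rewritten to bar(10, runif(1))

# The second form fails when evaluated, because bar() takes one argument.
second_result <- tryCatch(eval(second), error = function(e) "error")

# A bare symbol on the right-hand side is rejected by the parser.
bare_symbol <- tryCatch(parse(text = "10 |> baz"), error = function(e) "error")
```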

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] anonymous functions

2020-12-07 Thread Gregory Warnes
Thanks for expressing this eloquently. I heartily agree.

On Mon, Dec 7, 2020 at 12:04 PM Therneau, Terry M., Ph.D. via R-devel <
r-devel@r-project.org> wrote:

> “The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be
> helpful in making code containing simple function expressions more
> readable.”
>
> Color me unimpressed.
> Over the decades I've seen several "who can write the shortest code"
> threads: in Fortran, in C, in Splus, ...   The same old idea that "short"
> is a synonym for either elegant, readable, or efficient is now being
> recycled in the tidyverse.   The truth is that "short" is actually an
> antonym for all of these things, at least for anyone else reading the
> code; or for the original coder 30-60 minutes after the "clever" lines
> were written.  Minimal use of the spacebar and/or the return key isn't
> usually held up as a goal, but creeps into many practitioners' code as
> well.
>
> People are excited by replacing "function(" with "\("?  Really?   Are
> people typing code with their thumbs?
> I am ambivalent about pipes: I think it is a great concept, but too many
> of my colleagues think that using pipes = no need for any comments.
>
> As time goes on, I find my goal is to make my code less compact and more
> readable.  Every bug fix or new feature in the survival package now adds
> more lines of comments or other documentation than lines of code.  If I
> have to puzzle out what a line does, what about the poor sod who
> inherits the maintenance?
>
>
> --
> Terry M Therneau, PhD
> Department of Health Science Research
> Mayo Clinic
> thern...@mayo.edu
>
> "TERR-ree THUR-noh"
>


Re: [Rd] [External] anonymous functions

2020-12-07 Thread luke-tierney

I don't disagree in principle, but the reality is users want shortcuts
and as a result various packages, in particular tidyverse, have been
providing them. Mostly based on formulas, mostly with significant
issues since formulas weren't designed for this, and mostly
incompatible (tidyverse ones are compatible within tidyverse but not
with others). And of course none work in sapply or lapply. Providing a
shorthand in base may help to improve this. You don't have to use it
if you don't want to, and you can establish coding standards that
disallow it if you like.
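
As a small illustration of the point about sapply and lapply, the base shorthand is an ordinary function expression and so can be passed anywhere a function is accepted. A minimal sketch, assuming R >= 4.1.0:

```r
# Requires R >= 4.1.0 for the \(x) shorthand.
long_form  <- sapply(1:3, function(x) x + 1)
short_form <- sapply(1:3, \(x) x + 1)

# Both spellings create the same kind of closure, so the results agree.
same <- identical(long_form, short_form)
print(short_form)
print(same)
```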

Best,

luke

On Mon, 7 Dec 2020, Therneau, Terry M., Ph.D. via R-devel wrote:

> “The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be
> helpful in making code containing simple function expressions more
> readable.”
>
> Color me unimpressed.
> Over the decades I've seen several "who can write the shortest code"
> threads: in Fortran, in C, in Splus, ...   The same old idea that "short"
> is a synonym for either elegant, readable, or efficient is now being
> recycled in the tidyverse.   The truth is that "short" is actually an
> antonym for all of these things, at least for anyone else reading the
> code; or for the original coder 30-60 minutes after the "clever" lines
> were written.  Minimal use of the spacebar and/or the return key isn't
> usually held up as a goal, but creeps into many practitioners' code as
> well.
>
> People are excited by replacing "function(" with "\("?  Really?   Are
> people typing code with their thumbs?
> I am ambivalent about pipes: I think it is a great concept, but too many
> of my colleagues think that using pipes = no need for any comments.
>
> As time goes on, I find my goal is to make my code less compact and more
> readable.  Every bug fix or new feature in the survival package now adds
> more lines of comments or other documentation than lines of code.  If I
> have to puzzle out what a line does, what about the poor sod who
> inherits the maintenance?

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu


Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread Duncan Murdoch

On 07/12/2020 12:03 p.m., Gregory Warnes wrote:

> My vote is for the consistency of function calls always having parentheses,
> including in pipes.  Making them optional only saves two keystrokes, but
> will add yet another inconsistency to confuse or trip folks up.
>
> As for the new anonymous function syntax, I would prefer something more
> human friendly, perhaps provide “fun” as a shortcut for “function”,
> enabling:
>
> DF <- "myfile.csv" %>%
>    readLines() |>
>    fun(x) gsub(r'{(c\(.*?\)|integer\(0\))}', r'{"\1"}', x) |>
>    fun(x) read.csv(text = x) |>
>    mutate(
>       across(2:3,
>          fun(col) lapply(col,
>             fun(x) eval(parse(text = x))
>          )
>       )
>    )
>
> which seems much easier to read and understand, at the cost of only a few
> extra characters.


But you didn't "always" include parentheses, you skipped them on the 
calls to the anonymous functions.  I think that's the one place I'd make 
the exception, so maybe we agree:  parens are almost always needed, with 
the sole exception being anonymous functions.


As to using "fun", I think that's a bad idea.  I haven't checked, but I 
wouldn't be too surprised if "fun" has been used thousands of times in 
CRAN packages as the name of a function.  So


 x |> fun(y)

would mean "fun(x, y)", whereas

 x |> fun(y) y+1

would mean (function(y) y+1)(x).
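
The name clash can be made concrete with a user-defined function that happens to be called fun. This is a sketch under stated assumptions: the two-argument fun below is hypothetical, and the syntax is that of the pipe as released in R >= 4.1.0, where an anonymous function on the right-hand side has to be parenthesised and called.

```r
# Hypothetical user-defined function that happens to be called `fun`,
# standing in for the many existing uses of that name.
fun <- function(a, b) a + b

# Native pipe: the left-hand side becomes the first argument, i.e. fun(1, 2).
piped_call <- 1 |> fun(2)

# Released R requires an anonymous function on the right-hand side to be
# parenthesised and called explicitly.
piped_lambda <- 1 |> (function(y) y + 1)()

print(piped_call)
print(piped_lambda)
```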

Duncan Murdoch



Re: [Rd] New pipe operator

2020-12-07 Thread Duncan Murdoch

On 07/12/2020 12:09 p.m., Peter Dalgaard wrote:

>> On 7 Dec 2020, at 17:35 , Duncan Murdoch wrote:
>>
>> On 07/12/2020 11:18 a.m., peter dalgaard wrote:
>>> Hmm,
>>> I feel a bit bad coming late to this, but I think I am beginning to side
>>> with those who want  "... |> head" to work. And yes, that has to happen at
>>> the expense of |> head().
>>
>> Just curious, how would you express head(df, 10)?  Currently it is
>>
>> df |> head(10)
>>
>> Would I have to write it as
>>
>> df |> function(d) head(d, 10)
>
> It could be
>
> df |> ~ head(_, 10)
>
> which in a sense is "yes" to your question.

I think that's doing too much weird stuff.  I wouldn't want to have to
teach it to beginners, whereas I think I could teach "df |> head(10)".
That's doing one weird thing, but I'd count about three things I'd
consider weird in yours.

>>> As I think it was Gabor points out, the current structure goes down a
>>> nonstandard evaluation route, which may be difficult to explain and departs
>>> from usual operator evaluation paradigms by being an odd mix of syntax and
>>> semantics. R lets you do these sorts of thing, witness ggplot and tidyverse,
>>> but the transparency of the language tends to suffer.
>>
>> I wouldn't call it non-standard evaluation.  There is no function
>> corresponding to |>, so there's no evaluation at all.  It is more like the
>> way "x -> y" is parsed as "y <- x", or "if (x) y" is transformed to
>> `if`(x, y).
>
> That's a point, but maybe also my point. Currently, the parser is inserting the
> LHS as the 1st argument of the RHS, right? Things might be simpler if it was
> more like a simple binop.

An advantage of the current implementation is that it's simple and easy
to understand.  Once you make it a user-modifiable binary operator,
things will go kind of nuts.

For example, I doubt if there are many users of magrittr's pipe who
really understand its subtleties, e.g. the example in Luke's paper where
1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2). (And
I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
continue the fun.)


Duncan Murdoch






Re: [Rd] [External] Re: New pipe operator

2020-12-07 Thread luke-tierney

On Mon, 7 Dec 2020, Peter Dalgaard wrote:

>> On 7 Dec 2020, at 17:35 , Duncan Murdoch wrote:
>>
>> On 07/12/2020 11:18 a.m., peter dalgaard wrote:
>>> Hmm,
>>> I feel a bit bad coming late to this, but I think I am beginning to side
>>> with those who want  "... |> head" to work. And yes, that has to happen at
>>> the expense of |> head().
>>
>> Just curious, how would you express head(df, 10)?  Currently it is
>>
>> df |> head(10)
>>
>> Would I have to write it as
>>
>> df |> function(d) head(d, 10)
>
> It could be
>
> df |> ~ head(_, 10)
>
> which in a sense is "yes" to your question.
>
>>> As I think it was Gabor points out, the current structure goes down a
>>> nonstandard evaluation route, which may be difficult to explain and departs
>>> from usual operator evaluation paradigms by being an odd mix of syntax and
>>> semantics. R lets you do these sorts of thing, witness ggplot and tidyverse,
>>> but the transparency of the language tends to suffer.
>>
>> I wouldn't call it non-standard evaluation.  There is no function
>> corresponding to |>, so there's no evaluation at all.  It is more like the
>> way "x -> y" is parsed as "y <- x", or "if (x) y" is transformed to
>> `if`(x, y).
>
> That's a point, but maybe also my point. Currently, the parser is inserting the
> LHS as the 1st argument of the RHS, right? Things might be simpler if it was
> more like a simple binop.


It can only be a simple binop if you only allow RHS functions of one argument.
Which would require currying along the lines Duncan showed. Something like:

`%>>%` <- function(x, f) f(x)
C1 <- function(f, ...) function(x) f(x, ...)

mtcars %>>% head
mtcars %>>% C1(head, 2)
mtcars %>>% C1(subset, cyl == 4) %>>% \(d) lm(mpg ~ disp, data = d)

This might fly if we lived in a world where most RHS functions take
one argument and only a few needed currying. That is the case in many
functional languages, but not for R. Making the common case of
multiple arguments easy means you have to work at the source level,
either in the parser or with some form of NSE.
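
To see the trade-off side by side, the curried binop can be compared with the source-level pipe on the same call. A minimal sketch, reusing the `%>>%` and C1 definitions from this message and assuming R >= 4.1.0 for `|>`; the variable names curried, native and same_rows are illustrative only.

```r
# Ordinary binary operator and currying helper, as defined in the message.
`%>>%` <- function(x, f) f(x)
C1 <- function(f, ...) function(x) f(x, ...)

# Explicit currying: C1(head, 2) builds a one-argument function first.
curried <- mtcars %>>% C1(head, 2)

# Source-level pipe: the parser rewrites this to head(mtcars, 2).
native <- mtcars |> head(2)

same_rows <- identical(curried, native)
print(same_rows)
```

Both routes reach the same result; the difference is whether the extra argument is bound by a helper at run time or spliced in by the parser.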

Best,

luke



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu



Re: [Rd] anonymous functions

2020-12-07 Thread Gabor Grothendieck
It is easier to understand a function if you can see the entire
function body at once on a page or screen and excessive verbosity
interferes with that.

On Mon, Dec 7, 2020 at 12:04 PM Therneau, Terry M., Ph.D. via R-devel
 wrote:
>
> “The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be
> helpful in making code containing simple function expressions more
> readable.”
>
> Color me unimpressed.
> Over the decades I've seen several "who can write the shortest code"
> threads: in Fortran, in C, in Splus, ...   The same old idea that "short"
> is a synonym for either elegant, readable, or efficient is now being
> recycled in the tidyverse.   The truth is that "short" is actually an
> antonym for all of these things, at least for anyone else reading the
> code; or for the original coder 30-60 minutes after the "clever" lines
> were written.  Minimal use of the spacebar and/or the return key isn't
> usually held up as a goal, but creeps into many practitioners' code as
> well.
>
> People are excited by replacing "function(" with "\("?  Really?   Are
> people typing code with their thumbs?
> I am ambivalent about pipes: I think it is a great concept, but too many
> of my colleagues think that using pipes = no need for any comments.
>
> As time goes on, I find my goal is to make my code less compact and more
> readable.  Every bug fix or new feature in the survival package now adds
> more lines of comments or other documentation than lines of code.  If I
> have to puzzle out what a line does, what about the poor sod who
> inherits the maintenance?
>
>
> --
> Terry M Therneau, PhD
> Department of Health Science Research
> Mayo Clinic
> thern...@mayo.edu
>
> "TERR-ree THUR-noh"
>



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Rd] [R/S-PLUS] [EXTERNAL] Re: [External] anonymous functions

2020-12-07 Thread Therneau, Terry M., Ph.D. via R-devel
Luke,
   Mostly an aside.  I think that pipes are a good addition, and it is clear
that you and others in R-core thought through many of the details.
Congratulations on what appears to be solid work. I've used Unix since '79,
so it is almost guaranteed that I like the basic idiom, and I expect to make
use of it.  That some users think pipes -- or any other code -- are so clear
that comments are superfluous is no reflection on R core; it is also a bit
of a hobby horse for me.

I am a bit bemused by the flood of change suggestions, before people have
had a chance to fully exercise the new code.   I'd suggest waiting several
months, or a year, before major updates, straight-up bugs excepted.   The
same advice holds when moving into a new house.
One experience with the survival package has been that most new ideas have
been implemented locally, and we run with them for half a year before
submission to CRAN.  I've had a few "really great" modifications that,
thankfully, were never inflicted on the rest of the R community.

Terry T.

On 12/7/20 11:26 AM, luke-tier...@uiowa.edu wrote:
> I don't disagree in principle, but the reality is users want shortcuts
> and as a result various packages, in particular tidyverse, have been
> providing them. Mostly based on formulas, mostly with significant
> issues since formulas weren't designed for this, and mostly
> incompatible (tidyverse ones are compatible within tidyverse but not
> with others). And of course none work in sapply or lapply. Providing a
> shorthand in base may help to improve this. You don't have to use it
> if you don't want to, and you can establish coding standards that
> disallow it if you like.
>
> Best,
>
> luke
>
> On Mon, 7 Dec 2020, Therneau, Terry M., Ph.D. via R-devel wrote:
>
>> “The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be
>> helpful in making code containing simple function expressions more
>> readable.”
>>
>> Color me unimpressed.
>> Over the decades I've seen several "who can write the shortest code"
>> threads: in Fortran, in C, in Splus, ...   The same old idea that "short"
>> is a synonym for either elegant, readable, or efficient is now being
>> recycled in the tidyverse.   The truth is that "short" is actually an
>> antonym for all of these things, at least for anyone else reading the
>> code; or for the original coder 30-60 minutes after the "clever" lines
>> were written.  Minimal use of the spacebar and/or the return key isn't
>> usually held up as a goal, but creeps into many practitioners' code as
>> well.
>>
>> People are excited by replacing "function(" with "\("?  Really?   Are
>> people typing code with their thumbs?
>> I am ambivalent about pipes: I think it is a great concept, but too many
>> of my colleagues think that using pipes = no need for any comments.
>>
>> As time goes on, I find my goal is to make my code less compact and more
>> readable.  Every bug fix or new feature in the survival package now adds
>> more lines of comments or other documentation than lines of code.  If I
>> have to puzzle out what a line does, what about the poor sod who
>> inherits the maintenance?
>>
>>
>>
>




Re: [Rd] New pipe operator

2020-12-07 Thread Gabor Grothendieck
On Mon, Dec 7, 2020 at 12:54 PM Duncan Murdoch  wrote:
> An advantage of the current implementation is that it's simple and easy
> to understand.  Once you make it a user-modifiable binary operator,
> things will go kind of nuts.
>
> For example, I doubt if there are many users of magrittr's pipe who
> really understand its subtleties, e.g. the example in Luke's paper where
> 1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2). (And
> I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
> continue the fun.)

The rule is not so complicated.  Automatic insertion is done unless
you use dot in the top level function or if you surround it with
{...}.  It really makes sense since if you use gsub(pattern,
replacement, .) then surely you don't want automatic insertion and if
you surround it with { ... } then you are explicitly telling it not
to.
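
The rule just stated can be tried on the expressions from the quoted message. This is only a sketch and assumes the magrittr package is installed (the guard skips the example otherwise); the names r1 to r4 are illustrative.

```r
# Assumes the magrittr package is installed; the block is skipped otherwise.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`

  r1 <- 1 %>% c(., 2)          # dot is a top-level argument: no insertion
  r2 <- 1 %>% c(c(.), 2)       # dot only nested: 1 is also inserted first
  r3 <- 1 %>% c(c(.), 2, .)    # a top-level dot is present: no insertion
  r4 <- 1 %>% { c(c(.), 2) }   # braces: no insertion

  print(list(r1 = r1, r2 = r2, r3 = r3, r4 = r4))
}
```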

Assuming the existence of placeholders a possible simplification would
be to NOT do automatic insertion if { ... } is used and to use it
otherwise although personally having used it for some time I find the
existing rule in magrittr generally does what you want.



Re: [Rd] New pipe operator

2020-12-07 Thread Kevin Ushey
IMHO the use of anonymous functions is a very clean solution to the
placeholder problem, and the shorthand lambda syntax makes it much
more ergonomic to use. Pipe implementations that crawl the RHS for
usages of `.` are going to be more expensive than the alternatives. It
is nice that the `|>` operator is effectively the same as a regular R
function call, and given the identical semantics could then also be
reasoned about the same way regular R function calls are.

I also agree usages of the `.` placeholder can make the code more
challenging to read, since understanding the behavior of a piped
expression then requires scouring the RHS for usages of `.`, which can
be challenging in dense code. Piping to an anonymous function makes
the intent clear to the reader: the programmer is likely piping to an
anonymous function because they care where the argument is used in the
call, and so the reader of code should be aware of that.
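
As an illustration of piping into an anonymous function to control where the value lands, here is a minimal sketch assuming the syntax of released R >= 4.1.0, in which the lambda must be parenthesised and called; the input string and the name cleaned are made up for the example.

```r
# Requires R >= 4.1.0. The piped value is needed as gsub()'s third argument,
# so an anonymous function names it explicitly.
cleaned <- "a.b.c" |> (\(s) gsub(".", "-", s, fixed = TRUE))()
print(cleaned)
```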

Best,
Kevin



On Mon, Dec 7, 2020 at 10:35 AM Gabor Grothendieck
 wrote:
>
> On Mon, Dec 7, 2020 at 12:54 PM Duncan Murdoch  
> wrote:
> > An advantage of the current implementation is that it's simple and easy
> > to understand.  Once you make it a user-modifiable binary operator,
> > things will go kind of nuts.
> >
> > For example, I doubt if there are many users of magrittr's pipe who
> > really understand its subtleties, e.g. the example in Luke's paper where
> > 1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2). (And
> > I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
> > continue the fun.)
>
> The rule is not so complicated.  Automatic insertion is done unless
> you use dot in the top level function or if you surround it with
> {...}.  It really makes sense since if you use gsub(pattern,
> replacement, .) then surely you don't want automatic insertion and if
> you surround it with { ... } then you are explicitly telling it not
> to.
>
> Assuming the existence of placeholders a possible simplification would
> be to NOT do automatic insertion if { ... } is used and to use it
> otherwise although personally having used it for some time I find the
> existing rule in magrittr generally does what you want.
>


[Rd] sequential chained operator thoughts

2020-12-07 Thread Avi Gross via R-devel
It has been very enlightening watching the discussion not only about the
existing and proposed variations of a data "pipe" operator in R but also
cognates in many other languages.

So I am throwing out a QUESTION that just asks if the pipeline as done is
pretty much what could also be done without the need for an operator, using a
sort of one-time bracketed construct where you call a function with a
sequence of operations you want performed and just have it handle the
in-between parts.

I mean something like:

return_val <- do_chain_sequence( { initial_data,
function1(_VAL_);
function2(_VAL_, more_args);
function3(args, 2 * _VAL_, more_args);
...
function_n(_VAL_)
})

The above is not meant to be taken literally. I don't care if the symbol is
_VAL_ or you use semi-colon characters between statements. There are many
possible variants such as each step being in its own curly braces. The idea
is to hand over one or more unevaluated blocks of code. There are such
functions in use in R already.

And yes, it can be written with explicit BEFORE/AFTER clauses to handle
things but those are implementation details and I want to focus on a
concept.

The point is you can potentially write a function that given such a series
of arguments, delays evaluation of them until each is needed or used. About
all it might need to do is set the value of something like _VAL_ from the
first argument if present and then take the text of each subsequent argument
and run it while saving the result back into _VAL_ and at the end, return
the last _VAL_. Along the way, of course, the temporary values stored each
time in _VAL_ would disappear.
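
One way the idea might look in base R is sketched below. It is an illustration under stated assumptions, not a proposal: the placeholder is spelled .VAL because _VAL_ is not a syntactic name in R, the steps are passed as separate arguments rather than inside one braced block, and do_chain_sequence itself is a hypothetical helper.

```r
# Hypothetical helper: evaluate each step in turn, with .VAL bound to the
# result of the previous step.
do_chain_sequence <- function(initial, ...) {
  steps  <- as.list(substitute(list(...)))[-1]  # capture steps unevaluated
  caller <- parent.frame()                      # where other names are looked up
  val    <- initial
  for (step in steps) {
    env <- new.env(parent = caller)             # fresh scope holding only .VAL
    assign(".VAL", val, envir = env)
    val <- eval(step, envir = env)
  }
  val                                           # result of the last step
}

result <- do_chain_sequence(c(3, 1, 2), sort(.VAL), rev(.VAL), .VAL * 2)
print(result)
```

The intermediate values live only in the throwaway environments, so nothing is left behind in the caller's workspace.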

Is something like this any improvement over the following, written out by
hand by the user:

initial <- whatever
Temp1 <- function1(initial)
Temp2 <- function2(Temp1, ...)
rm(Temp1)
...

Well, maybe not much. But it does hide some details and allows you to insert
or delete steps without worrying about pesky details like variable names
being in the right sequence or not over-riding other things in your
namespace. It makes your intent clear.

Now obviously being evaluated inside a function is not necessarily the same
as remaining in the original environment so having something like this as a
built-in running in place might be a better idea.

I admit the details of how to get one piece at a time as some unevaluated
form and recognize clearly what each piece is takes some careful thought. If
you want to automatically throw in a first argument of _VAL_ after the first
parenthesis found or inserted in new parens if just the name of a function
was presented,  or other such manipulations as already seem to happen with
the magrittr pipe where a period is the placeholder, that can be delicate
work and also fail for some lines of code.  There may be many reasons
various versions of this proposal can fail for some cases. But functionally,
it would be a way to specify in a linear fashion that a sequence of steps is
to be connected with data being passed along as it changes.

I can also imagine how this kind of method might allow twists like asking
for _VAL_$second or other changes such as sorted(_VAL_) or minmax(_VAL_)
that would shrink the sequence.

This general idea looks like something that some programming languages may
already do in some form, and functionally it is a bit like the pipe idea,
albeit with different overhead.

And do note many languages already support this in subtle ways. R has a
variable called ".Last.value" that holds the value of the last top-level
expression evaluated. If the template above is used properly, that alone
might work, albeit be a bit wordy. But it may be more transient in some
cases such as a multi-part statement where it ends up being reset within the
statement.

I am NOT asking for a new feature in R, or any language. I am just asking if
the various pipeline ideas in use could be done in a general way like I
describe as a sequence where the statements are chained as described and
intermediate results are transient. But, yes, some implementations might
require some changes to the language to be implemented properly and it might
not satisfy people used to thinking a certain way.

I end by saying that R is a language that only returns one (sometimes
complex) return value. Other languages allow multiple return values and
pipelines there might be hard to implement or have weird features that allow
various of the returns to be captured or even a more general graph of
command sequences  rather than just a linear pipeline. My thoughts here are
for R alone. And I shudder at what happens if you allow exceptions and other
kinds of breaks/returns out of such a sequential grouping in mid-stride. I
view most such additions and changes as needing careful thought to make sure
they have the functionality most people want, are as

Re: [Rd] New pipe operator

2020-12-07 Thread Gabor Grothendieck
On Mon, Dec 7, 2020 at 2:02 PM Kevin Ushey  wrote:
>
> IMHO the use of anonymous functions is a very clean solution to the
> placeholder problem, and the shorthand lambda syntax makes it much
> more ergonomic to use. Pipe implementations that crawl the RHS for
> usages of `.` are going to be more expensive than the alternatives. It

You wouldn't have to crawl the expression.  This does it at the syntax level.

  e <- quote( { gsub("x", "y", .) } )
  c(e[[1]], quote(. <- LHS), e[-1])
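
A runnable variant of the same idea, offered as a sketch rather than as how magrittr actually works: the braced expression is split into its parts, an assignment to the placeholder is spliced in as the first statement, and the result is rebuilt into a call. The names parts and wrapped are made up for the illustration.

```r
e <- quote({ gsub("x", "y", .) })

# Split the `{` call into list(`{`, <statement>), splice in `. <- LHS`
# as the first statement, and turn the list back into a call.
parts <- as.list(e)
wrapped <- as.call(c(parts[1], list(quote(. <- LHS)), parts[-1]))

# Evaluate with LHS supplied in a throwaway environment, so the
# assignment to `.` does not leak into the workspace.
result <- eval(wrapped, list(LHS = "xax"))
```

No walk over the expression tree is needed; only the top-level `{` call is touched.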

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R/S-PLUS] [EXTERNAL] Re: [External] anonymous functions

2020-12-07 Thread Bill Dunlap
One advantage of the new pipe operator over magrittr's is that the former
works with substitute().
> f <- function(x, xlab=deparse1(substitute(x))) paste(sep="", xlab, ": ",
paste(collapse=", ",x))
> 2^(1:4)  |> f()
[1] "2^(1:4): 2, 4, 8, 16"
> 2^(1:4)  %>% f()
[1] ".: 2, 4, 8, 16"

This is because the new one is at the parser level, so f() sees an ordinary
function call.
> dput(quote(2^(1:4)  |> f()))
f(2^(1:4))


On Mon, Dec 7, 2020 at 10:35 AM Therneau, Terry M., Ph.D. via R-devel <
r-devel@r-project.org> wrote:

> Luke,
>    Mostly an aside.  I think that pipes are a good addition, and it is
> clear that you and other R-core thought through many of the details.
> Congratulations on what appears to be solid work. I've used Unix since
> '79, so it is almost guaranteed that I like the basic idiom, and I expect
> to make use of it.  That some users think pipes -- or any other code --
> are so clear that comments are superfluous is no reflection on R core,
> and also a bit of a hobby horse for me.
>
> I am a bit bemused by the flood of change suggestions, before people have
> had a chance to fully exercise the new code.   I'd suggest waiting several
> months, or a year, before major updates, straight up bugs excepted.   The
> same advice holds when moving into a new house. One experience with the
> survival package has been that most new ideas have been implemented
> locally, and we run with them for half a year before submission to CRAN.
> I've had a few "really great" modifications that, thankfully, were never
> inflicted on the rest of the R community.
>
> Terry T.
>
> On 12/7/20 11:26 AM, luke-tier...@uiowa.edu wrote:
> > I don't disagree in principle, but the reality is users want shortcuts
> > and as a result various packages, in particular tidyverse, have been
> > providing them. Mostly based on formulas, mostly with significant
> > issues since formulas weren't designed for this, and mostly
> > incompatible (tidyverse ones are compatible within tidyverse but not
> > with others). And of course none work in sapply or lapply. Providing a
> > shorthand in base may help to improve this. You don't have to use it
> > if you don't want to, and you can establish coding standards that
> > disallow it if you like.
> >
> > Best,
> >
> > luke
> >
> > On Mon, 7 Dec 2020, Therneau, Terry M., Ph.D. via R-devel wrote:
> >
> >> “The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may
> >> be helpful in making code containing simple function expressions more
> >> readable.”
> >>
> >> Color me unimpressed.
> >> Over the decades I've seen several "who can write the shortest code"
> >> threads: in Fortran, in C, in Splus, ...   The same old idea that
> >> "short" is a synonym for either elegant, readable, or efficient is now
> >> being recycled in the tidyverse.   The truth is that "short" is actually
> >> an antonym for all of these things, at least for anyone else reading the
> >> code; or for the original coder 30-60 minutes after the "clever" lines
> >> were written. Minimal use of the spacebar and/or the return key isn't
> >> usually held up as a goal, but creeps into many practitioners' code as
> >> well.
> >>
> >> People are excited by replacing "function(" with "\("? Really?   Are
> >> people typing code with their thumbs?
> >> I am ambivalent about pipes: I think it is a great concept, but too
> >> many of my colleagues think that using pipes = no need for any comments.
> >>
> >> As time goes on, I find my goal is to make my code less compact and
> >> more readable. Every bug fix or new feature in the survival package now
> >> adds more lines of comments or other documentation than lines of code.
> >> If I have to puzzle out what a line does, what about the poor sod who
> >> inherits the maintenance?


Re: [Rd] New pipe operator

2020-12-07 Thread Gabriel Becker
On Mon, Dec 7, 2020 at 10:35 AM Gabor Grothendieck 
wrote:

> On Mon, Dec 7, 2020 at 12:54 PM Duncan Murdoch 
> wrote:
> > An advantage of the current implementation is that it's simple and easy
> > to understand.  Once you make it a user-modifiable binary operator,
> > things will go kind of nuts.
> >
> > For example, I doubt if there are many users of magrittr's pipe who
> > really understand its subtleties, e.g. the example in Luke's paper where
> > 1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2). (And
> > I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
> > continue the fun.)
>
> The rule is not so complicated.  Automatic insertion is done unless
> you use dot in the top level function or if you surround it with
> {...}.  It really makes sense since if you use gsub(pattern,
> replacement, .) then surely you don't want automatic insertion and if
> you surround it with { ... } then you are explicitly telling it not
> to.
>
>
This is the point that I believe Duncan is trying to make (and I agree
with) though. Consider the question "after piping LHS into RHS, what is the
first argument in the resulting call?".

For the base pipe, the answer, completely unambiguously, is LHS. Full stop.
That is easy to understand.

For magrittr the answer is "Well, it depends, let me see your RHS
expression, is it wrapped in braces? If not, are you using the placeholder?
If you are using the placeholder, where/how are you using it?".

That is inherently much more complicated. Yes, you understand how the
magrittr pipe behaves, and yes you find it very convenient. That's great,
but neither of those things equates to simplicity. They just mean that you,
a very experienced pipe user, carry around the cognitive load necessary to
have that understanding.

More concretely, the current base pipe is extremely simple; all it does is:


   1. Figure out the RHS call expression
      1. If RHS is an anonymous function declaration, construct a call
         to it for a new RHS
   2. Insert the LHS expression into the first argument position of the RHS
      call expression


Done. And (1) would be removed if anonymous functions required () after
them, which would be consistent, and even simpler, but kind of annoying. I
think it is a good compromise which is guaranteed to be safe because
anonymous functions are something the parser recognizes.  Either way, if
that was dropped, what |> does would be *entirely* trivial to understand
and explain. With a single sentence.
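
A small check of that claim, assuming R 4.1.0 or later, where |> is available; f, x, y and z are placeholder names that are never evaluated:

```r
# The parser rewrites the pipe before anything is evaluated, so the
# quoted pipe expression is the same call object as the direct call.
piped  <- quote(x |> f(y, z = 2))
direct <- quote(f(x, y, z = 2))

same <- identical(piped, direct)
```

Here same is TRUE: the LHS lands in the first argument position and the remaining arguments are left exactly as written.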

I had the equivalent pseudocode for the magrittr pipe written out here but
honestly it felt like overkill that came across as mean, so I'll leave that as
an exercise to interested readers.

~G

> Assuming the existence of placeholders a possible simplification would
> be to NOT do automatic insertion if { ... } is used and to use it
> otherwise although personally having used it for some time I find the
> existing rule in magrittr generally does what you want.


Re: [Rd] New pipe operator

2020-12-07 Thread Gabriel Becker
On Mon, Dec 7, 2020 at 11:05 AM Kevin Ushey  wrote:

> IMHO the use of anonymous functions is a very clean solution to the
> placeholder problem, and the shorthand lambda syntax makes it much
> more ergonomic to use. Pipe implementations that crawl the RHS for
> usages of `.` are going to be more expensive than the alternatives. It
> is nice that the `|>` operator is effectively the same as a regular R
> function call, and given the identical semantics could then also be
> reasoned about the same way regular R function calls are.
>

I agree. That said, one thing that maybe could be done, though I'm not
super convinced it's needed, is to make a "curry-stuffed pipe", where
something like

LHS |^pipearg^> RHS(arg1 = 5, arg3 = 7)

Would parse to

RHS(pipearg = LHS, arg1 = 5, arg3 = 7)


(Assuming we could get the parser to handle |^bla^> correctly)

For argument position issues, that would be sufficient. For more complicated
expressions, e.g., those that would use the placeholder multiple times or
inside compound expressions, requiring anonymous functions seems quite
reasonable to me. And honestly, while I kind of like it, I'm not sure if
that "stuffed pipe" expression (assuming we could get the parser to capture
it correctly) reads to me as nicer than the following, anyway.

LHS |> \(x) RHS(arg1 = 5, pipearg = x, arg3 = 7)
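
A runnable version of that last form, as a sketch with a made-up RHS() that just pastes its arguments together. One assumption to flag: as far as I know, released versions of R (4.1.0 and later) reject a bare function expression on the right-hand side of |>, so the lambda is wrapped in parentheses and called explicitly.

```r
# Stand-in for RHS; the argument names follow the example above.
RHS <- function(arg1, pipearg, arg3) paste(arg1, pipearg, arg3)
LHS <- "middle"

# The lambda names the piped value x and places it in the second
# argument position; the trailing () makes the RHS a call.
out <- LHS |> (\(x) RHS(arg1 = 5, pipearg = x, arg3 = 7))()
```

The reader can see from the lambda alone where the piped value ends up, without knowing the order of RHS's formals.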

~G

>
> I also agree usages of the `.` placeholder can make the code more
> challenging to read, since understanding the behavior of a piped
> expression then requires scouring the RHS for usages of `.`, which can
> be challenging in dense code. Piping to an anonymous function makes
> the intent clear to the reader: the programmer is likely piping to an
> anonymous function because they care where the argument is used in the
> call, and so the reader of code should be aware of that.
>
> Best,
> Kevin


Re: [Rd] New pipe operator

2020-12-07 Thread Dénes Tóth




On 12/7/20 11:09 PM, Gabriel Becker wrote:
> On Mon, Dec 7, 2020 at 11:05 AM Kevin Ushey  wrote:
>
>> IMHO the use of anonymous functions is a very clean solution to the
>> placeholder problem, and the shorthand lambda syntax makes it much
>> more ergonomic to use. Pipe implementations that crawl the RHS for
>> usages of `.` are going to be more expensive than the alternatives. It
>> is nice that the `|>` operator is effectively the same as a regular R
>> function call, and given the identical semantics could then also be
>> reasoned about the same way regular R function calls are.
>
> I agree. That said, one thing that maybe could be done, though I'm not
> super convinced it's needed, is to make a "curry-stuffed pipe", where
> something like
>
> LHS |^pipearg^> RHS(arg1 = 5, arg3 = 7)
>
> Would parse to
>
> RHS(pipearg = LHS, arg1 = 5, arg3 = 7)


This gave me the idea that naming the arguments can be used to skip the 
placeholder issue:


"funny" |> sub(pattern = "f", replacement = "b")

Of course this breaks if the maintainer changes the order of the 
function arguments (which is not a nice practice but happens).
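
What happens here can be checked directly, assuming R 4.1.0 or later. The pipe still inserts the LHS as the first argument; it is R's ordinary argument matching that then assigns it to the first formal not already matched by name, which for sub() is x:

```r
# Named arguments are matched first, so the piped string falls
# through to the remaining formal, x.
out <- "funny" |> sub(pattern = "f", replacement = "b")

# What the parser actually produced: the LHS is still first.
parsed <- quote("funny" |> sub(pattern = "f", replacement = "b"))
```

Here out is "bunny", and parsed is the call sub("funny", pattern = "f", replacement = "b").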


An option could be to allow for missing argument in the first position, 
but this might add further undesired complexity, so probably not worth 
the effort:


"funny" |> sub(x =, "f", "b")

So basically the parsing rule would be:

LHS |> RHS(arg=, ...) -> RHS(arg=LHS, ...)






Re: [Rd] anonymous functions

2020-12-07 Thread Abby Spurdle
I mostly agree with your comments on anonymous functions.

However, I think the main problem is cryptic-ness, rather than succinct-ness.
The backslash is a relatively universal symbol within programming
languages with C-like (ALGOL-like?) syntax, where it denotes escape
sequences within strings.

Using the leading character for escape sequences, to define functions,
is like using integers to define floating point numbers:

my.integer <- as.integer (2) * pi

Arguably, the motive is more to be ultra-succinct than cryptic.
But either way, we get syntax which is difficult to read, from a
mathematical and statistical perspective.
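
For reference, a short sketch of the two roles the backslash plays, assuming R 4.1.0 or later for the shorthand; inc_short and inc_long are illustrative names only:

```r
# Shorthand and long form define the same function.
inc_short <- \(x) x + 1
inc_long  <- function(x) x + 1
same_result <- identical(inc_short(1:3), inc_long(1:3))

# Inside a string the backslash still introduces an escape sequence:
# "a\tb" is three characters (a, tab, b).
escaped <- nchar("a\tb")
```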


On Tue, Dec 8, 2020 at 6:04 AM Therneau, Terry M., Ph.D. via R-devel
 wrote:
>
> “The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be
> helpful in making code containing simple function expressions more
> readable.”
>
> Color me unimpressed.
> Over the decades I've seen several "who can write the shortest code"
> threads: in Fortran, in C, in Splus, ...   The same old idea that "short"
> is a synonym for either elegant, readable, or efficient is now being
> recycled in the tidyverse.   The truth is that "short" is actually an
> antonym for all of these things, at least for anyone else reading the
> code; or for the original coder 30-60 minutes after the "clever" lines
> were written.  Minimal use of the spacebar and/or the return key isn't
> usually held up as a goal, but creeps into many practitioners' code as
> well.
>
> People are excited by replacing "function(" with "\("?  Really?   Are
> people typing code with their thumbs?
> I am ambivalent about pipes: I think it is a great concept, but too many
> of my colleagues think that using pipes = no need for any comments.
>
> As time goes on, I find my goal is to make my code less compact and more
> readable.  Every bug fix or new feature in the survival package now adds
> more lines of comments or other documentation than lines of code.  If I
> have to puzzle out what a line does, what about the poor sod who inherits
> the maintenance?
>
>
> --
> Terry M Therneau, PhD
> Department of Health Science Research
> Mayo Clinic
> thern...@mayo.edu
>
> "TERR-ree THUR-noh"


Re: [Rd] New pipe operator

2020-12-07 Thread Gabriel Becker
Hi Denes,

On Mon, Dec 7, 2020 at 2:52 PM Dénes Tóth  wrote:

>
>
> This gave me the idea that naming the arguments can be used to skip the
> placeholder issue:
>
> "funny" |> sub(pattern = "f", replacement = "b")
>
> Of course this breaks if the maintainer changes the order of the
> function arguments (which is not a nice practice but happens).
>

This is true, but only if you are specifying all arguments that appear
before the one you want explicitly. In practice that may often be true? But
I don't really have a strong intuition about that as a non-pipe user. It
would require zero changes to the pipe by the R-core team though, so in
that sense it could be a solution in the cases it does work. It does make
the code subtler to read though, which is a pretty big downside, imho.


> An option could be to allow for missing argument in the first position,
> but this might add further undesired complexity, so probably not worth
> the effort:
>
> "funny" |> sub(x =, "f", "b")
>
> So basically the parsing rule would be:
>
> LHS |> RHS(arg=, ...) -> RHS(arg=LHS, ...)
>

The problem here is that it's ambiguous, because myfun(x, y=, z) is
technically syntactically valid, so this would make code that parses now
into valid syntax change its meaning, and would prevent existing,
syntactically valid (though hopefully quite rare) code in the pipe context.
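
The syntactic validity is easy to confirm with the parser alone; nothing is evaluated in this sketch, and myfun does not need to exist:

```r
# An argument written as `y = ` with nothing after it parses fine:
# the call simply carries an empty argument under that name.
e <- parse(text = "myfun(x, y = , z)")[[1]]

n_parts   <- length(e)   # function name plus three argument slots
arg_names <- names(e)    # only the empty slot carries a name
```

Since the parser already assigns a meaning to `y = `, a pipe that treated it as "insert LHS here" would silently reinterpret such calls.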

~G




Re: [Rd] anonymous functions

2020-12-07 Thread Abby Spurdle
Sorry, I should replace "cryptic-ness" from my last post, with
"unnecessary cryptic-ness".
Sometimes short symbolic expressions are necessary.


P.S.
Often, I wish I could write: f (x) = x^2.
But that's replacement function syntax.
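
A quick illustration of why, as a sketch: R parses f (x) = x^2 as an assignment whose target is the call f(x), and evaluating it looks for a replacement function named `f<-`. The value x = 3 is supplied only so that the evaluation gets far enough to show the lookup.

```r
e <- parse(text = "f(x) = x^2")[[1]]

op     <- as.character(e[[1]])   # the assignment operator "="
target <- e[[2]]                 # the call f(x), not a definition of f

# With no `f<-` defined, evaluation fails while looking it up.
msg <- tryCatch(eval(e, list(x = 3)),
                error = function(err) conditionMessage(err))
```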




Re: [Rd] anonymous functions

2020-12-07 Thread David Hugh-Jones
I will stick my oar in here as a user to say that I find the \(x) syntax a bit 
line-noise-ish. 

David
