I'm thrilled to hear it! Thank you! - Tim
P.S. I re-added the r-devel list, since Kevin's reply was sent just to me, but I thought there might be others interested in knowing about those work items. (I hope that's OK, email-etiquette-wise.)

On Wed, Dec 9, 2020 at 1:10 PM Kevin Ushey <kevinus...@gmail.com> wrote:

> You might be surprised to learn that the RStudio IDE engineers might
> be receptive to such a feature request. :-)
>
> https://github.com/rstudio/rstudio/issues/8589
> https://github.com/rstudio/rstudio/issues/8590
>
> (Spoiler alert: I am one of the RStudio IDE engineers, and I think
> this would be worth doing.)
>
> Best,
> Kevin
>
> On Wed, Dec 9, 2020 at 12:16 PM Timothy Goodman <timsgood...@gmail.com> wrote:
> >
> > Since my larger concern is being able to conveniently select and re-run
> > part of a multiline pipeline, I don't think wrapping in parentheses will
> > help. I'd have to add a closing paren at the end of the selection, which
> > is no more convenient than having to highlight all but the last pipe.
> > (Admittedly, wrapping in parens would allow my preferred syntax of having
> > pipes at the start of the line, but I don't think that's worth the cost of
> > having to constantly move the trailing paren around.)
> >
> > My back-up plan if I fail to persuade you all is indeed to beg the
> > developers of RStudio to add an option to do the transformation I would
> > want when executing notebook code, but I'm anticipating the objection of
> > "R Notebooks shouldn't transform invalid R code into valid R code." I was
> > hoping "Let's make this new pipe |> work differently in a case that's
> > currently an error" would be an easier sell.
> >
> > Also, just to reiterate: Only one of my two suggestions really requires
> > caring about newlines. (That's my preferred solution, but I understand
> > it'd be the bigger change.) The other suggestion just amounts to ignoring
> > a final |> when code is submitted for execution.
> >
> > -Tim
> >
> > On Wed, Dec 9, 2020 at 11:58 AM Kevin Ushey <kevinus...@gmail.com> wrote:
> >>
> >> I agree with Duncan that the right solution is to wrap the pipe
> >> expression with parentheses. Having the parser treat newlines
> >> differently based on whether the session is interactive, or on what
> >> type of operator happens to follow a newline, feels like a pretty big
> >> can of worms.
> >>
> >> I think this (or something similar) would accomplish what you want
> >> while still retaining the nice aesthetics of the pipe expression, with
> >> a minimal amount of syntax "noise":
> >>
> >>     result <- (
> >>       data
> >>         |> op1()
> >>         |> op2()
> >>     )
> >>
> >> For interactive sessions where you wanted to execute only parts of the
> >> pipeline at a time, I could see that being accomplished by the editor
> >> -- it could transform the expression so that it could be handled by R,
> >> either by hoisting the pipe operator(s) up a line, or by wrapping the
> >> to-be-executed expression in parentheses for you. If such a style of
> >> coding became popular enough, I'm sure the developers of such editors
> >> would be interested and willing to support this ...
> >>
> >> Perhaps more importantly, it would be much easier to accomplish than a
> >> change to the behavior of the R parser, and it would be work that
> >> wouldn't have to be maintained by the R Core team.
> >>
> >> Best,
> >> Kevin
> >>
> >> On Wed, Dec 9, 2020 at 11:34 AM Timothy Goodman <timsgood...@gmail.com> wrote:
> >> >
> >> > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> >> > command in the Notebook environment I'm using) I certainly *would*
> >> > expect R to treat it as a complete statement.
> >> >
> >> > But what I'm talking about is a different case, where I highlight a
> >> > multi-line statement in my notebook:
> >> >
> >> >     my_data_frame1
> >> >         |> filter(some_conditions_1)
> >> >
> >> > and then press Ctrl+Enter.
> >> > Or, I suppose the equivalent would be to run an R script containing
> >> > those two lines of code, or to run a multi-line statement like that
> >> > from the console (which in RStudio I can do by pressing Shift+Enter
> >> > between the lines).
> >> >
> >> > In those cases, R could either (1) give an error message [the current
> >> > behavior], or (2) understand that the first line is meant to be piped
> >> > to the second. The second option would be significantly more useful,
> >> > and is almost certainly what the user intended.
> >> >
> >> > (For what it's worth, there are some languages, such as JavaScript,
> >> > that consider the first token of the next line when determining if the
> >> > previous line was complete. JavaScript's rules around this are overly
> >> > complicated, but a rule like "a pipe following a line break is treated
> >> > as continuing the previous line" would be much simpler. And while it
> >> > might be objectionable to treat the operator %>% differently from other
> >> > operators, the addition of |>, which isn't truly an operator at all,
> >> > seems like the right time to consider it.)
> >> >
> >> > -Tim
> >> >
> >> > On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> >> >
> >> > > The requirement for operators at the end of the line comes from the
> >> > > interactive nature of R. If you type
> >> > >
> >> > >     my_data_frame_1
> >> > >
> >> > > how could R know that you are not done, and are planning to type the
> >> > > rest of the expression
> >> > >
> >> > >     %>% filter(some_conditions_1)
> >> > >     ...
> >> > >
> >> > > before it should consider the expression complete? The way languages
> >> > > like C do this is by requiring a statement terminator at the end. You
> >> > > can also do it by wrapping the entire thing in parentheses ().
> >> > >
> >> > > However, be careful: Don't use braces: they don't work.
> >> > > And parens have the side effect of removing invisibility from the
> >> > > result (which is a design flaw or bonus, depending on your point of
> >> > > view). So I actually wouldn't advise this workaround.
> >> > >
> >> > > Duncan Murdoch
> >> > >
> >> > >
> >> > > On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I'm a data scientist who routinely uses R in my day-to-day work, for
> >> > > > tasks such as cleaning and transforming data, exploratory data
> >> > > > analysis, etc. This includes frequent use of the pipe operator from
> >> > > > the magrittr and dplyr libraries, %>%. So, I was pleased to hear
> >> > > > about the recent work on a native pipe operator, |>.
> >> > > >
> >> > > > This seems like a good time to bring up the main pain point I
> >> > > > encounter when using pipes in R, and some suggestions on what could
> >> > > > be done about it. The issue is that the pipe operator can't be
> >> > > > placed at the start of a line of code (except in parentheses).
> >> > > > That's no different than any binary operator in R, but I find it's a
> >> > > > source of difficulty for the pipe because of how pipes are often
> >> > > > used.
> >> > > >
> >> > > > [I'm assuming here that my usage is fairly typical of a lot of
> >> > > > users; at any rate, I don't think I'm *too* unusual.]
> >> > > >
> >> > > > === Why this is a problem ===
> >> > > >
> >> > > > It's very common (for me, and I suspect for many users of dplyr) to
> >> > > > write multi-step pipelines and put each step on its own line for
> >> > > > readability.
> >> > > > Something like this:
> >> > > >
> >> > > >   ### Example 1 ###
> >> > > >   my_data_frame_1 %>%
> >> > > >     filter(some_conditions_1) %>%
> >> > > >     inner_join(my_data_frame_2, by = some_columns_1) %>%
> >> > > >     group_by(some_columns_2) %>%
> >> > > >     summarize(some_aggregate_functions_1) %>%
> >> > > >     filter(some_conditions_2) %>%
> >> > > >     left_join(my_data_frame_3, by = some_columns_3) %>%
> >> > > >     group_by(some_columns_4) %>%
> >> > > >     summarize(some_aggregate_functions_2) %>%
> >> > > >     arrange(some_columns_5)
> >> > > >
> >> > > > [I guess some might consider this an overly long pipeline; for me
> >> > > > it's pretty typical. I *could* split it up by assigning intermediate
> >> > > > results to variables, but much of the value I get from the pipe is
> >> > > > that it lets my code communicate which results are temporary, and
> >> > > > which will be used again later. Assigning variables for single-use
> >> > > > results would remove that expressiveness.]
> >> > > >
> >> > > > I would prefer (for reasons I'll explain) to be able to write the
> >> > > > above example like this, which isn't valid R:
> >> > > >
> >> > > >   ### Example 2 (not valid R) ###
> >> > > >   my_data_frame_1
> >> > > >     %>% filter(some_conditions_1)
> >> > > >     %>% inner_join(my_data_frame_2, by = some_columns_1)
> >> > > >     %>% group_by(some_columns_2)
> >> > > >     %>% summarize(some_aggregate_functions_1)
> >> > > >     %>% filter(some_conditions_2)
> >> > > >     %>% left_join(my_data_frame_3, by = some_columns_3)
> >> > > >     %>% group_by(some_columns_4)
> >> > > >     %>% summarize(some_aggregate_functions_2)
> >> > > >     %>% arrange(some_columns_5)
> >> > > >
> >> > > > One (minor) advantage is obvious: It lets you easily line up the
> >> > > > pipes, which means that you can see at a glance that the whole block
> >> > > > is a single pipeline, and you'd immediately notice if you
> >> > > > inadvertently omitted a pipe, which otherwise can lead to confusing
> >> > > > output. [It's also aesthetically pleasing, especially when %>% is
> >> > > > replaced with |>, but that's subjective.]
> >> > > >
> >> > > > But the bigger issue happens when I want to re-run just *part* of
> >> > > > the pipeline. I do this often when debugging: if the output of the
> >> > > > pipeline seems wrong, I re-run the first few steps and check the
> >> > > > output, then include a little more and re-run again, etc., until I
> >> > > > locate my mistake. Working in an interactive notebook environment,
> >> > > > this involves using the cursor to select just the part of the code I
> >> > > > want to re-run.
> >> > > >
> >> > > > It's fast and easy to select *entire* lines of code, but
> >> > > > unfortunately with the pipes placed at the end of the line I must
> >> > > > instead select everything *except* the last three characters of the
> >> > > > line (the last two characters for the new pipe).
> >> > > > Then when I want to re-run the same partial pipeline with the next
> >> > > > line of code included, I can't just press SHIFT+Down to select it as
> >> > > > I otherwise would, but instead must move the cursor horizontally to
> >> > > > a position three characters before the end of *that* line (which is
> >> > > > generally different due to varying line lengths). And so forth each
> >> > > > time I want to include an additional line.
> >> > > >
> >> > > > Moreover, with the staggered positions of the pipes at the end of
> >> > > > each line, it's very easy to accidentally select the final pipe on a
> >> > > > line, and then sit there for a moment wondering if the environment
> >> > > > has stopped responding before realizing it's just waiting for
> >> > > > further input (i.e., for the right-hand side). These small delays
> >> > > > and disruptions add up over the course of a day.
> >> > > >
> >> > > > This desire to select and re-run the first part of a pipeline is
> >> > > > also the reason why it doesn't suffice to achieve syntax like my
> >> > > > "Example 2" by wrapping the entire pipeline in parentheses. That's
> >> > > > of no use if I want to re-run a selection that doesn't include the
> >> > > > final close-paren.
> >> > > >
> >> > > > === Possible Solutions ===
> >> > > >
> >> > > > I can think of two, but maybe there are others. The first would make
> >> > > > "Example 2" into valid code, and the second would allow you to run a
> >> > > > selection that included a trailing pipe.
> >> > > >
> >> > > > Solution 1: Add a special case to how R is parsed, so if the first
> >> > > > (non-whitespace) token after an end-line is a pipe, that pipe gets
> >> > > > moved to before the end-line.
> >> > > >   - Argument for: This lets you write code like example 2, which
> >> > > >     addresses the pain point around re-running part of a pipeline,
> >> > > >     and has advantages for readability.
> >> > > >     Also, since starting a line with a pipe operator is currently
> >> > > >     invalid, the change wouldn't break any working code.
> >> > > >   - Argument against: It would make the behavior of %>% inconsistent
> >> > > >     with that of other binary operators in R. (However, this
> >> > > >     objection might not apply to the new pipe, |>, which I
> >> > > >     understand is being implemented as a syntax transformation
> >> > > >     rather than a binary operator.)
> >> > > >
> >> > > > Solution 2: Ignore the pipe operator if it occurs as the final token
> >> > > > of the code being executed.
> >> > > >   - Argument for: This would mean the user could select and re-run
> >> > > >     the first few lines of a longer pipeline (selecting *entire*
> >> > > >     lines), avoiding the difficulties described above.
> >> > > >   - Argument against: This means that %>% would be valid even if it
> >> > > >     occurred without a right-hand side, which is inconsistent with
> >> > > >     other operators in R. (But, as above, this objection might not
> >> > > >     apply to |>.) Also, this solution still doesn't enable the
> >> > > >     syntax of "Example 2", with its readability benefit.
> >> > > >
> >> > > > Thanks for reading this and considering it.
> >> > > >
> >> > > > - Tim Goodman
> >> > > >
> >> > > >         [[alternative HTML version deleted]]
> >> > > >
> >> > > > ______________________________________________
> >> > > > R-devel@r-project.org mailing list
> >> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
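
The editor-side approach discussed in this thread (dropping a pipe left dangling at the end of a selection, then wrapping the selection in parentheses so that pipes at the start of a line parse) could be sketched in R roughly as follows. This is only an illustration under stated assumptions: `prepare_selection` is a hypothetical helper, not a function of R or RStudio; it assumes R 4.1 or later for `|>`, a selection that holds a single expression, and no pipe-like text at the very end of a trailing comment or string.

```r
# Hypothetical editor-side helper (not part of R or RStudio): make a selected
# chunk of a pipeline parseable before handing it to R.
prepare_selection <- function(code) {
  # "Solution 2" from the thread: drop a pipe left dangling at the very end
  # of the selection (either the native pipe or magrittr's %>%).
  code <- sub("(\\|>|%>%)\\s*$", "", code, perl = TRUE)
  # Wrap in parentheses: inside them a newline does not end the expression,
  # so a pipe at the start of a line continues the previous line.
  paste0("(\n", code, "\n)")
}

# Whole lines selected: pipes lead each line and one trails at the end.
selection <- "c(1, 4, 9)\n  |> sqrt()\n  |> sum()\n  |>"
result <- eval(parse(text = prepare_selection(selection)))
print(result)  # [1] 6
```

As noted earlier in the thread, the added parentheses also make an otherwise invisible result print, so a real editor feature might prefer hoisting the pipes up a line instead of wrapping.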