Regarding special treatment for |>, isn't it getting special treatment anyway, because it's implemented as a syntax transformation from x |> f(y) to f(x, y), rather than as an operator?
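To make that distinction concrete, here is a minimal sketch, assuming R 4.1.0 or later (the first release that shipped the native pipe). It only inspects how the two forms parse, using quote(), so nothing is evaluated and magrittr does not need to be installed. The names x, f, and y are placeholders.

```r
# Assumes R >= 4.1.0. quote() captures the parsed expression without
# evaluating it, so x, f, and y (placeholder names) need not exist.

# The native pipe is rewritten by the parser: no call to a pipe function remains.
native <- quote(x |> f(y))
deparse(native)                     # "f(x, y)"
as.character(native[[1]])           # "f"

# %>% parses as an ordinary call to the function `%>%`, like any %op% operator.
magrittr_style <- quote(x %>% f(y))
as.character(magrittr_style[[1]])   # "%>%"
```

So by the time the expression reaches the evaluator, the native pipe has already disappeared from the parse tree, whereas %>% is still there as a function call.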
That said, the point about wanting a block of code submitted line-by-line to work the same as a block of code submitted all at once is a fair one. Maybe the better solution would be if there were a way to say "Submit the selected code as a single expression, ignoring line-breaks". Then I could run any number of lines with pipes at the start and no special character at the end, and have it treated as a single pipeline. I suppose that'd need to be a feature offered by the environment (RStudio's RNotebooks in my case). I could wrap my pipelines in parentheses (to make the "pipes at start of line" syntax valid R code), and then could use the hypothetical "submit selected code ignoring line-breaks" feature when running just the first part of the pipeline -- i.e., selecting full lines, but starting after the opening paren so as not to need to insert a closing paren.

- Tim

On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> > command in the Notebook environment I'm using) I certainly *would*
> > expect R to treat it as a complete statement.
> >
> > But what I'm talking about is a different case, where I highlight a
> > multi-line statement in my notebook:
> >
> >     my_data_frame1
> >         |> filter(some_conditions_1)
> >
> > and then press Ctrl+Enter.
>
> I don't think I'd like it if parsing changed between passing one line at
> a time and passing a block of lines. I'd like to be able to highlight a
> few lines and pass those, then type one, then highlight some more and
> pass those: and have it act as though I just passed the whole combined
> block, or typed everything one line at a time.
>
> > Or, I suppose the equivalent would be to run
> > an R script containing those two lines of code, or to run a multi-line
> > statement like that from the console (which in RStudio I can do by
> > pressing Shift+Enter between the lines.)
> >
> > In those cases, R could either (1) Give an error message [the current
> > behavior], or (2) understand that the first line is meant to be piped to
> > the second. The second option would be significantly more useful, and
> > is almost certainly what the user intended.
> >
> > (For what it's worth, there are some languages, such as Javascript, that
> > consider the first token of the next line when determining if the
> > previous line was complete. JavaScript's rules around this are overly
> > complicated, but a rule like "a pipe following a line break is treated
> > as continuing the previous line" would be much simpler. And while it
> > might be objectionable to treat the operator %>% different from other
> > operators, the addition of |>, which isn't truly an operator at all,
> > seems like the right time to consider it.)
>
> I think this would be hard to implement with R's current parser, but
> possible. I think it could be done by distinguishing between EOL
> markers within a block of text and "end of block" marks. If it applied
> only to the |> operator it would be *really* ugly.
>
> My strongest objection to it is the one at the top, though. If I have a
> block of lines sitting in my editor that I just finished executing, with
> the cursor pointing at the next line, I'd like to know that it didn't
> matter whether the lines were passed one at a time, as a block, or some
> combination of those.
>
> Duncan Murdoch
>
> >
> > -Tim
> >
> > On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> >
> > The requirement for operators at the end of the line comes from the
> > interactive nature of R.
> > If you type
> >
> >     my_data_frame_1
> >
> > how could R know that you are not done, and are planning to type the
> > rest of the expression
> >
> >     %>% filter(some_conditions_1)
> >     ...
> >
> > before it should consider the expression complete? The way languages
> > like C do this is by requiring a statement terminator at the end. You
> > can also do it by wrapping the entire thing in parentheses ().
> >
> > However, be careful: Don't use braces: they don't work. And parens
> > have the side effect of removing invisibility from the result (which is
> > a design flaw or bonus, depending on your point of view). So I actually
> > wouldn't advise this workaround.
> >
> > Duncan Murdoch
> >
> >
> > On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> > > Hi,
> > >
> > > I'm a data scientist who routinely uses R in my day-to-day work, for tasks
> > > such as cleaning and transforming data, exploratory data analysis, etc.
> > > This includes frequent use of the pipe operator from the magrittr and dplyr
> > > libraries, %>%. So, I was pleased to hear about the recent work on a
> > > native pipe operator, |>.
> > >
> > > This seems like a good time to bring up the main pain point I encounter
> > > when using pipes in R, and some suggestions on what could be done about
> > > it. The issue is that the pipe operator can't be placed at the start of a
> > > line of code (except in parentheses). That's no different than any binary
> > > operator in R, but I find it's a source of difficulty for the pipe because
> > > of how pipes are often used.
> > >
> > > [I'm assuming here that my usage is fairly typical of a lot of users; at
> > > any rate, I don't think I'm *too* unusual.]
> > >
> > > === Why this is a problem ===
> > >
> > > It's very common (for me, and I suspect for many users of dplyr) to write
> > > multi-step pipelines and put each step on its own line for readability.
> > > Something like this:
> > >
> > > ### Example 1 ###
> > > my_data_frame_1 %>%
> > >     filter(some_conditions_1) %>%
> > >     inner_join(my_data_frame_2, by = some_columns_1) %>%
> > >     group_by(some_columns_2) %>%
> > >     summarize(some_aggregate_functions_1) %>%
> > >     filter(some_conditions_2) %>%
> > >     left_join(my_data_frame_3, by = some_columns_3) %>%
> > >     group_by(some_columns_4) %>%
> > >     summarize(some_aggregate_functions_2) %>%
> > >     arrange(some_columns_5)
> > >
> > > [I guess some might consider this an overly long pipeline; for me it's
> > > pretty typical. I *could* split it up by assigning intermediate results to
> > > variables, but much of the value I get from the pipe is that it lets my
> > > code communicate which results are temporary, and which will be used again
> > > later. Assigning variables for single-use results would remove that
> > > expressiveness.]
> > >
> > > I would prefer (for reasons I'll explain) to be able to write the above
> > > example like this, which isn't valid R:
> > >
> > > ### Example 2 (not valid R) ###
> > > my_data_frame_1
> > >     %>% filter(some_conditions_1)
> > >     %>% inner_join(my_data_frame_2, by = some_columns_1)
> > >     %>% group_by(some_columns_2)
> > >     %>% summarize(some_aggregate_functions_1)
> > >     %>% filter(some_conditions_2)
> > >     %>% left_join(my_data_frame_3, by = some_columns_3)
> > >     %>% group_by(some_columns_4)
> > >     %>% summarize(some_aggregate_functions_2)
> > >     %>% arrange(some_columns_5)
> > >
> > > One (minor) advantage is obvious: It lets you easily line up the pipes,
> > > which means that you can see at a glance that the whole block is a single
> > > pipeline, and you'd immediately notice if you inadvertently omitted a pipe,
> > > which otherwise can lead to confusing output. [It's also aesthetically
> > > pleasing, especially when %>% is replaced with |>, but that's subjective.]
> > >
> > > But the bigger issue happens when I want to re-run just *part* of the
> > > pipeline. I do this often when debugging: if the output of the pipeline
> > > seems wrong, I re-run the first few steps and check the output, then
> > > include a little more and re-run again, etc., until I locate my mistake.
> > > Working in an interactive notebook environment, this involves using the
> > > cursor to select just the part of the code I want to re-run.
> > >
> > > It's fast and easy to select *entire* lines of code, but unfortunately with
> > > the pipes placed at the end of the line I must instead select everything
> > > *except* the last three characters of the line (the last two characters for
> > > the new pipe). Then when I want to re-run the same partial pipeline with
> > > the next line of code included, I can't just press SHIFT+Down to select it
> > > as I otherwise would, but instead must move the cursor horizontally to a
> > > position three characters before the end of *that* line (which is generally
> > > different due to varying line lengths). And so forth each time I want to
> > > include an additional line.
> > >
> > > Moreover, with the staggered positions of the pipes at the end of each
> > > line, it's very easy to accidentally select the final pipe on a line, and
> > > then sit there for a moment wondering if the environment has stopped
> > > responding before realizing it's just waiting for further input (i.e., for
> > > the right-hand side). These small delays and disruptions add up over the
> > > course of a day.
> > >
> > > This desire to select and re-run the first part of a pipeline is also the
> > > reason why it doesn't suffice to achieve syntax like my "Example 2" by
> > > wrapping the entire pipeline in parentheses. That's of no use if I want to
> > > re-run a selection that doesn't include the final close-paren.
> > >
> > > === Possible Solutions ===
> > >
> > > I can think of two, but maybe there are others. The first would make
> > > "Example 2" into valid code, and the second would allow you to run a
> > > selection that included a trailing pipe.
> > >
> > > Solution 1: Add a special case to how R is parsed, so if the first
> > > (non-whitespace) token after an end-line is a pipe, that pipe gets moved to
> > > before the end-line.
> > >   - Argument for: This lets you write code like example 2, which
> > >     addresses the pain point around re-running part of a pipeline, and has
> > >     advantages for readability. Also, since starting a line with a pipe
> > >     operator is currently invalid, the change wouldn't break any working code.
> > >   - Argument against: It would make the behavior of %>% inconsistent with
> > >     that of other binary operators in R. (However, this objection might not
> > >     apply to the new pipe, |>, which I understand is being implemented as a
> > >     syntax transformation rather than a binary operator.)
> > >
> > > Solution 2: Ignore the pipe operator if it occurs as the final token of
> > > the code being executed.
> > >   - Argument for: This would mean the user could select and re-run the
> > >     first few lines of a longer pipeline (selecting *entire* lines), avoiding
> > >     the difficulties described above.
> > >   - Argument against: This means that %>% would be valid even if it
> > >     occurred without a right-hand side, which is inconsistent with other
> > >     operators in R. (But, as above, this objection might not apply to |>.)
> > >     Also, this solution still doesn't enable the syntax of "Example 2", with
> > >     its readability benefit.
> > >
> > > Thanks for reading this and considering it.
> > >
> > > - Tim Goodman
> > >
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel