Hi Michael,

R returns the result of the last evaluated expression by default:
```
add_2 <- function(x) {
  x + 2L
}
```

is the same as and preferred over
```
add_2_return <- function(x) {
  out <- x + 2L
  return(out)
}
```
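
A quick way to convince yourself of this, as a minimal sketch (the input `1:3` is just an illustrative value):
```r
add_2 <- function(x) {
  x + 2L
}

add_2_return <- function(x) {
  out <- x + 2L
  return(out)
}

## both forms return the same value for the same input
same <- identical(add_2(1:3), add_2_return(1:3))
same
## [1] TRUE
```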

In the idiomatic use of R, one uses explicit `return` when one wants to break the control flow, e.g.:
```
add_2_if_number <- function(x) {
  ## early return if x is not numeric
  if (!is.numeric(x)) {
    return(x)
  }
  ## process otherwise (usually more complicated steps)
  ## note: this part will not be reached for non-numeric x
  x + 2L
}
```
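
For illustration, calling such a function with a numeric and a non-numeric argument (the inputs are made up) shows the two paths:
```r
add_2_if_number <- function(x) {
  ## early return if x is not numeric
  if (!is.numeric(x)) {
    return(x)
  }
  ## reached only for numeric x
  x + 2L
}

num_res <- add_2_if_number(1L)   # numeric input: falls through to `x + 2L`
chr_res <- add_2_if_number("a")  # non-numeric input: returned unchanged
num_res
## [1] 3
chr_res
## [1] "a"
```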

So yes, you should drop the last "%>% `[`" altogether as `[.data.table` already returns the whole (modified) data.table when `:=` is used.
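
A minimal sketch of what I mean, assuming the data.table package is installed (`lag_A` is a hypothetical name, not something from your code):
```r
library(data.table)

lag_A <- function(A) {
  DT <- data.table(A)
  ## `:=` modifies DT by reference and `[.data.table` returns the modified
  ## DT invisibly; as the last evaluated expression, that is also what the
  ## function returns, so neither the pipe nor `return(DT)` is needed
  DT[, A := shift(A, fill = NA, type = "lag", n = 1)]
}

res <- lag_A(c(1, 2))
res[]  # the trailing `[]` makes sure the result is printed
```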

Side note: If you use R >= 4.1.0 and you do not need the special features of `%>%`, try the native `|>` operator first (see `?pipeOp`): 1) you do not depend on a user-contributed package, and 2) it works at the parser level.
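
To illustrate the second point, a small sketch (requires R >= 4.1.0; `x`, `f` and `y` are placeholder names that are only quoted, never evaluated):
```r
## the native pipe passes the left-hand side as the first argument
total <- c(1, 4, 9) |> sqrt() |> sum()
total
## [1] 6

## the rewrite happens at parse time: no pipe call remains in the expression
piped <- quote(x |> f(y))
piped
## f(x, y)
```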

Cheers,
Denes

On 1/2/23 18:59, Michael Lachanski wrote:
Dénes, thank you for the guidance - which is well-taken.

Your side note raises an interesting question: I find the piping %>% operator readable. Is there any downside to it? Or is the side note meant to tell me to drop the last: "%>% `[`"?

Thank you,


==
Michael Lachanski
PhD Student in Demography and Sociology
MA Candidate in Statistics
University of Pennsylvania
mikel...@sas.upenn.edu


On Sat, Dec 31, 2022 at 9:22 AM Dénes Tóth <toth.de...@kogentum.hu> wrote:

    Hi Michael,

    Note that you have to be very careful when using by-reference operations
    in data.table (see `?data.table::set`), especially in a functional
    programming approach. In your function, you avoid this problem by
    calling `data.table(A)`, which makes a copy of A even if it is already a
    data.table. However, for large data.tables, copying can be a very
    expensive operation (esp. in terms of RAM usage), which can be avoided
    entirely by using data.tables the data.table way (e.g., joining,
    grouping, and aggregating in the same step by performing these
    operations within `[`, see `?data.table`).

    So instead of blindly functionalizing all your code, try to be
    pragmatic. Functional programming is not about using pure functions in
    *every* part of your code base, because that is infeasible in 99.9% of
    real-world problems. Even Haskell has `IO` and `do`; the point is that
    the imperative and functional parts of the code are clearly separated,
    and the imperative components are kept as close to the top level as
    possible.

    So when using data.table, a good strategy is to use pure functions for
    within-data.table operations, e.g., `DT[, lapply(.SD, mean),
    .SDcols = is.numeric]`, and, when these operations alter `DT` by
    reference, to invoke the chains of these operations in "pure" wrappers,
    e.g., by calling `A <- copy(A)` at the top and then modifying `A` directly.

    Cheers,
    Denes

    Side note: You do not need to use `DT[ , A:= shift(A, fill = NA, type =
    "lag", n = 1)] %>% `[`(return(DT))`. `[.data.table` returns the result
    (the modified DT) invisibly. If you want to let auto-print work, you can
    just use `DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)][]`.

    Note that this also means you usually do not need to use magrittr's
    or base-R pipe when transforming data.tables. You can do this instead:
    ```
    DT[
        ## filter rows where 'x' column equals "a"
        x == "a"
    ][
        ## calculate the mean of `z` for each gender and assign it to `y`
        , y := mean(z), by = "gender"
    ][
        ## do whatever you want
        ...
    ]
    ```


    On 12/31/22 13:39, Rui Barradas wrote:
     > At 06:50 on 31/12/2022, Michael Lachanski wrote:
     >> Hello,
     >>
     >> I am trying to make a habit of "functionalizing" all of my code as
     >> recommended by Hadley Wickham. I have found it surprisingly difficult
     >> to do so because several intermediate features from data.table break
     >> or give unexpected results using purrr and its data.table adaptation,
     >> tidytable. Here is a minimal working example of what has stumped me
     >> most recently:
     >>
     >> ===
     >>
     >> library(data.table); library(tidytable)
     >>
     >> minimal_failing_function <- function(A){
     >>    DT <- data.table(A)
     >>    DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
     >>    return(DT)}
     >> # works
     >> minimal_failing_function(c(1,2))
     >> # fails
     >> tidytable::pmap_dfr(.l = list(c(1,2)),
     >>                      .f = minimal_failing_function)
     >>
     >>
     >> ===
     >> These should ideally give the same output, but do not. This also fails
     >> using purrr::pmap_dfr rather than tidytable. I am using R 4.2.2 and I
     >> am on Mac OS Ventura 13.1.
     >>
     >> Thank you for any help you can provide or general guidance.
     >>
     >>
     >> ==
     >> Michael Lachanski
     >> PhD Student in Demography and Sociology
     >> MA Candidate in Statistics
     >> University of Pennsylvania
     >> mikel...@sas.upenn.edu
     >>
     >>     [[alternative HTML version deleted]]
     >>
     >> ______________________________________________
     >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
     >> https://stat.ethz.ch/mailman/listinfo/r-help
     >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
     >> and provide commented, minimal, self-contained, reproducible code.
     > Hello,
     >
     > Use map_dfr instead of pmap_dfr.
     >
     >
     > library(data.table)
     > library(tidytable)
     >
     > minimal_failing_function <- function(A) {
     >    DT <- data.table(A)
     >    DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
     >    return(DT)
     > }
     >
     > # works
     > tidytable::map_dfr(.x = list(c(1,2)),
     >                     .f = minimal_failing_function)
     > #> # A tidytable: 2 × 1
     > #>       A
     > #>   <dbl>
     > #> 1    NA
     > #> 2     1
     >
     >
     > Hope this helps,
     >
     > Rui Barradas
     >


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
