feature request: new builtin `defer`, scope delayed eval

Cynthia Coan Thu, 06 Oct 2022 13:24:22 -0700

Hey all,

I've started working on this idea, and before getting too far I'd like
to get general feedback on the feature before going too far. I'd
specifically like to propose a new built-in called `defer` which acts
like `eval` however is not parsed/expanded/run until it's scope is
leaving. Hopefully "scope" is the correct word, I'm imagining it
running at the same time a local would go out of "scope" and be no
longer available (just before the locals are cleared, so locals can
still be used in expansion). The main purpose of defer is to help with
resource management, and more specifically cleanup.


Today cleaning up resources in scripts whether they be files,
virtual-machines/containers, or even global state can be challenging
for a variety of reasons. It can be very easy to leave extra
state/processes running that you may not mean to. Let's take a look
first at handling cleanup while "error mode", aka `set -e` is on
(we'll cover error mode being off later below, but we'll start with
error mode on. Not only because defer works better here, but also
because I think many scripts I write want error mode on as manually
checking every command for failure can be tedious.). Today there
exists four main ways of handling errors with error mode being on:

  1. Introduce another function that "wraps" the previous one, and is
capable of cleaning up resources. Then hoping no one calls the
internal one, maybe even by giving it a scary name like:
`__do_not_use_this_unless_you_want_to_do_cleanup_manually_which_you_better_internal_fn_name()`.
  2. Push responsibility onto the caller of the function, by having
users manually needing to call a cleanup function afterwards. Meaning
just calling: `my_function` is incorrect, and callers need to write:
`my_function || { cleanup_function; return 1; }`.
  3. Don't add complexity to the caller/wrap in a function, but push
complexity onto the author of the function itself by manually adding
`|| { cleanup; return 1; }` after every command in the function.
  4. Don't attempt to clean up the resource at all.

If #4 isn't a viable option, or it is but you'd just prefer not to do
it, you're left with three options that each either add significant
cognitive complexity, or the chance for misuse (or both!). This is
where defer comes in, solving the issue of "cleanup" without actually
introducing the chance for missing a cleanup through misuse. A very
over-simplified, contrived example is below:

  ```
  #!/usr/bin/env bash
  set -eo pipefail

  my_function() {
    local -r tmp_dir=$(mktemp -d)
    defer rm -r "${tmp_dir}"

    value=$(command-that-could-fail --save-state "${tmp_dir}/state")
    if [ "$value" = "success" ]; then
      could-fail-two --input "$(< ${tmp_dir}/state)"
      could-fail-three | pipe
      echo "commands succeeded"
    else
      echo "critical failure exiting entire process"
      exit 1
    fi
    return 0
  }
  ```

In this case no matter how this function exits where there's a problem
with a pipe, a command failing, exiting the entire process, or a
simple return out successfully; The resource is guaranteed to be
cleaned up, assuming rm doesn't fail -- if it did it would clobber the
return code to 1 in this case, even on a return of 0.

If your script is running with error mode off on purpose, the benefits
drastically fall down to just potential easier readability. Rather
then needing to create a cleanup function where validation that
cleanup is correct, you can co-locate cleanup with the creation of
each item. This could make it very easy to validate multi-step
cleanups. No longer do you have to open the cleanup function, and the
regular function side by side to validate correctness. Take for
example the error mode case I mentioned earlier in this paragraph:

  ```
  #!/usr/bin/env bash

  scoped_error_mode() {
    if ! echo -n "$SHELLOPTS" | grep 'errexit' >/dev/null 2>&1; then
      echo "error mode off, enabling for this function"
      set -e
      defer set +e
    fi
    if ! echo -n "$SHELLOPTS" | grep 'pipefail' >/dev/null 2>&1; then
      echo "pipefail off, enabling for this function"
      set -o pipefail
      defer set +o pipefail
    fi

    my_commands
    my_other_commands | piped-to
  }
  ```

Here not only can we scope normally global states to a single function
(allowing us to user error mode just where it might be useful, and not
everywhere), but as you can see the defer's are directly next to where
they are created which means we don't have to save to variables
whether or not we need to "turn things back off" again. This at least
for most people I think makes it significantly easier to read.

The help for the built-in I've been working on looks like:

```
defer: defer [-l] or defer [-d offset] or defer [arg ...]
    Execute arguments as a shell command when the current scope exists.

    Queue up a statement to be eval'd when a scope is left. Runs directly before
    locals in the same scope get cleared. Deferred statements are run in a last
    in first out order.

    Options:
      -d offset delete the defer entry at position OFFSET. Negative
                offsets count back from the end of the defer list

      -l        list all the active commands being deferred

    Exit Status:
    Returns success unless an invalid option is given or an error occurs.
```

Defer can be simulated roughly with functrace similar to how local can
be: 
https://gist.githubusercontent.com/Mythra/de8cdbfdb2b80496b9047b14dffefeb5/raw/91d6599ade2f575ea2d12c2b29e9a6cb829de744/defer-only.sh
. It doesn't have any of the listing/delete'ing functionality (but
there's no reason it couldn't). This was meant as a very quick rough
PoC so the thought could be played with by others. Since I imagine
adding a new feature akin to local will be quite the discussion.

Curious to hear your thoughts on this feature, and if there's a place
for it in Bash.

Thanks,
Cynthia

feature request: new builtin `defer`, scope delayed eval

Reply via email to