Re: leaks fd for internal functions but not external command

Sam Liddicott Wed, 24 Jul 2019 06:08:31 -0700

Thanks for that thoughtful response.

* I understand that the design decision is to have variable file
descriptors stay open after per-command redirection
* I understand that implementation constraints make it impossible to do
this uniformly (for external command redirection)
* I understand that it is difficult for the script author to detect which
case his code will fall into

I'm trying to make bash better and more usable.

The shell normally does a great job of hiding the difference between
internal and external commands, so even though it's very well documented,
most of the time the user doesn't need to be aware. This is great for the
user, and according to the principle of least surprise.

The syntactic sugar of having bash select a free fd (which is necessary
for good composability of operations in complex script pipelines) is a
great benefit, especially when mixing with older pipelines having fixed
numeric fd's.

You say that there are technical reasons why the syntactic sugar of also
keeping the fd open can't be implemented uniformly.

I wonder if this puts unnecessary cognitive burden on the user, leading to
reluctance to get the benefits, or to the introduction of latent bugs.

There is a case I explain below which can lead to a leaked fd being held on
to by subsequently invoked external processes. Of course it will
technically be the user's fault, but I'm looking at reducing the cognitive
burdens that make such a fault ultimately inevitable.

The cognitive burdens of leaving the fd open are:

1. It breaks the normal expectation that per-command redirects are limited
to the scope of the command.

A naked exec already works to hold open a variable fd in a wider scope if
that's what the scripter actually wants: exec {fd}>... ;
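As a minimal sketch of that naked-exec pattern (the scratch file and the
messages are made up for illustration, standing in for the "..." above):

```bash
#!/usr/bin/env bash
log=$(mktemp)            # made-up scratch file standing in for the "..." target

exec {fd}>"$log"         # bash picks a free descriptor and stores its number in $fd
echo "first"  >&"$fd"    # any later command in this shell can write to it
echo "second" >&"$fd"
exec {fd}>&-             # the explicit close that ends the wider scope

contents=$(cat "$log")   # read back what was written through the descriptor
```

The descriptor stays open for every command between the two exec lines,
and nothing is left behind after the explicit close.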

2. As syntactic sugar it moves, not removes, the boiler-plate burden

The naked exec (see above) that the syntactic sugar saves in the case where
the fd should remain open is offset by the naked exec now required to close
the fd in the traditional case, where the fd should not be left open
beyond the scope of the command.

3. The unmeetable cognitive burden is that in order to safely manage the
previous two items, the user needs to know if the command will be external
or internal or a function.

This makes it hard for the user to depend on this feature, because it is
not possible to be sure at script-authoring time whether a command is
external. It may have become a function (due to export -f, source, etc.,
which affect the execution environment).
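A small sketch of the difference, assuming cat resolves to an external
command, and using a made-up wrapper name (wrapped_cat) and scratch file:

```bash
#!/usr/bin/env bash
log=$(mktemp)                          # made-up scratch file

wrapped_cat() { command cat "$@"; }    # hypothetical function wrapping an external command

unset x
cat /dev/null {x}>"$log"               # external: the redirection happens in the child
external_x=${x-unset}                  # so the parent shell never sees $x

unset x
wrapped_cat /dev/null {x}>"$log"       # function: the redirection happens in this shell
function_x=${x-unset}                  # so $x is set and the fd is still open here

# Close the descriptor only if this shell actually kept one.
if [[ -n ${x-} ]]; then
  exec {x}>&-
fi
```

Testing whether the variable got set is one way to close only what the
shell actually kept, but it has to be repeated after every such command.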

4. The inevitable propagation of leaked fd's

The knowing user can remember to always use an identity wrapper function to
force treatment of external commands as internal functions in order to get
uniform behaviour, and also explicitly close the fd afterwards. (I hope
this doesn't break exec optimisations or signal propagation over a
different process tree topology, though I doubt it.)

But other users may not know to close an fd whose existence was never
apparent (due to invoking an external command) but which becomes an fd leak
when they combine it with other bash features (function wrapping of
external commands, or an export -f environment that does this unawares),
and those leaked fd's may then be inherited by other invoked external
processes which may hold on to them for some time.

This contrived example minimises the pipeline fd contortions in order to
show that when what was an external command then becomes an internal
command, it can as a consequence result in an fd leak to external processes
(bash+lsof+grep here) which may be long lived.

stty {x}>/tmp/log
bash -c 'lsof -p $$ | grep log ; :'
stty() {
  command stty $EXTRA_STTY "$@"
}
stty {x}>/tmp/log
bash -c 'lsof -p $$ | grep log ; :'

Leading to questions like: "Why does wrapping one command in a function
cause a different background process to hang on to a private handle not
even used there?"

The future:

I recognise what you say about past design decisions, but for the future,
as it is hard to safely get the benefit of leaving the handle open for
variable per-command redirects, even for users who know about it, I wonder
if the syntactic sugar might be redefined to reduce the cognitive burden
and widen the benefit of the most valued variable fd feature.

If the variable fd syntactic sugar were re-designed so that variable
handles were also limited to the scope of the command, the same as for
external commands, the same as for numeric handles, then:
* the behaviour would be uniform,
* the cognitive burden would be reduced
* and there would be no behaviour dependent on the runtime environment
(export -f to wrap external commands).
* and no risk of unexpected or hard to control fd leaks to subsequent
external (long lived) commands

This would allow users to have full and safe benefit of bash-selected fd's,
which I am sure is what is intended.

I have done my best to be clear in a reasonable manner, but you are the
man, it is your project, we stand or fall by your decisions, not mine.

Sam

On Wed, 24 Jul 2019 at 01:20, Chet Ramey <chet.ra...@case.edu> wrote:

> On 7/23/19 5:15 PM, Sam Liddicott wrote:
> > I'm very surprised that you continue to insist that it should be a
> > *design* decision that it should be hard for a script writer to be
> > able to tell if a handle will be left open or not.
>
> What? The design decision is that a file descriptor opened with {var} will
> remain open after the command completes.
>
> > What could be the rationale for such a design decision?
>
> To make the redirection operator a little more useful than simple syntactic
> sugar.
>
> > The vague justification you provide "there are plenty of things that
> > depend on whether or not a command is builtin, or whether it's run in
> > the parent shell" is true but more relevant to an implementation
> > constraint than a design decision.
>
> An implementation constraint? That doesn't make any sense.
>
> The bash documentation makes it pretty clear which commands are builtin and
> the circumstances under which commands are run in child processes and which
> are run by the shell itself.
>
> > I'm confident that most of these things you hint at are to *avoid* the
> > scripter needing to be aware of the difference between internal and
> > external commands.
>
> Bash doesn't make it particularly obscure about which commands are builtin,
> and, as I said, the man page documents all of them.
>
> The builtin commands all provide functionality that can't be duplicated
> outside the shell itself, even the builtins that duplicate external
> commands (e.g., printf -v). Someone who writes shell scripts should be
> aware of what's builtin and what's not.
>
> But that's not the problem here.
>
> >
> > A design decision may well be to leave a variable handle open, but
> > what *design* decision would add the proviso that it not be an
> > external command?
>
> This makes me believe that you have a fundamental misunderstanding about
> how the shell operates.
>
> The design decision is to leave the file descriptor open, as I said above.
> It's left open in all cases. The difference is that commands that are run
> from the file system perform redirections in the child process, and child
> processes cannot affect their parent's environment. That means, among other
> things, that a file descriptor that a child process opens does not affect
> the parent's descriptor set. That has nothing to do with the behavior of
> {var} per se; it's a consequence of the relationship between Unix
> processes.
>
> Chet
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>                  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
>
