Re: Unnecessary bash warning message

Robert Elz Mon, 08 May 2023 16:50:49 -0700

    Date:        Mon, 8 May 2023 11:13:19 -0400
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <4c66cc58-27f7-4001-c831-8a2a23264...@case.edu>


  | I see I might need to update the error message to specify it's process
  | substitution.

For this particular case anyway, you might want to change it from
saying "unterminated here-document".   That wording implies that a here-doc
was being fetched, and EOF was reached instead of the terminator (which is
definitely worth warning about, perhaps even being an error), but that isn't
what this error is.   The issue is that the here doc doesn't appear in the
environment at all.   "Missing" would be better than "unterminated" (it was
perfectly well terminated), or perhaps "misplaced", as the here doc text is
located anyway, just not where some people expect to find it.

  | The process substitution is in its own parsing environment,

The forthcoming POSIX draft kind of expects that - but I don't really
like it, and the existing standard, and all before it - and the original
Bourne shell - didn't treat here docs like that at all.   They were read
following the next newline (token, not one buried inside a string, or
whatever) in the source, and the current text just says "after the next
newline" (or words to that effect).   That is the next newline after the
<< operator (with appropriate allowances given to command lines with multiple
here doc operators, where the here docs follow, in order of the operators,
one after another, starting with the next newline).
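
A minimal sketch of that ordering rule, for anyone who wants to try it
(the function name two_docs and the delimiters A and B are made up for
the illustration; nothing here is specific to bash):

```shell
# Two here doc operators on one command line.  The bodies follow the
# next newline, one after another, in the order of the operators.
two_docs() {
    cat <<A; cat <<B
first body
A
second body
B
}

two_docs
```

The first cat gets the text up to A, the second the text up to B, so the
two bodies come out in the order the operators were written.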

Of course, in the Bourne shell there was only the `...` form of command
substitution, with its own truly bizarre set of parsing rules, which made
this more or less inevitable.    But somewhere along the line, someone
came to the delusion that command substitutions (which would include bash's
process substitutions) should simply be read as a "word" and then parsed
later.  It has since been conclusively demonstrated that that way lies
madness, and parsers cannot work like that: the command substitution text
needs to be parsed as it is read (whether or not the results of that parse
are saved, or just the text that created it, to be parsed again later), which
means that the here doc operator must necessarily be seen during that initial
parse - and the here doc text can (and IMO should) follow at the next newline,
wherever that happens to occur.   That text can be saved to be used again
when the command substitution text is parsed again later, if that is the way
the shell wants to do things.   Why anyone would want to parse the text,
throw the results away, and then parse the same text again later is beyond me.
The only possible advantage is to allow some syntax errors to be ignored
during the initial parse, and only be reported if the second one actually
happens (code flow might mean that the command substitution is never
executed).   That only works occasionally: many syntax errors cause
parsers to get into a state from which they find it difficult to recover,
certainly not well enough to be at all confident that the end of the command
substitution is correctly detected, which is the major aim of the first
parse.   In any case, it really just makes broken code appear to be
correct, until sometime later when it does need to run, and then fails.
IMO it is much better to report the error immediately (during the first
parse) even if the code would never be executed, as in:
        if false; then x=$(nonsense |||| code; done); fi
(as best I can tell, current bash does always report that kind of nonsense
as a syntax error, and that's good, even though the assignment to x is
never run, so the command substitution is never executed).
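
A rough way to check how a given shell behaves here, as a sketch only
(dead_code_check is a hypothetical helper name; it takes the shell to
test as an optional argument and assumes that shell accepts -c):

```shell
# Feed the example above to a shell and report whether the shell
# rejected it, even though the command substitution would never run.
# Note: a shell that cannot be found also shows up as "rejected".
dead_code_check() {
    if "${1:-sh}" -c 'if false; then x=$(nonsense |||| code; done); fi' 2>/dev/null
    then
        echo "accepted"
    else
        echo "rejected"
    fi
}

dead_code_check
```

Running it as "dead_code_check bash" (or with any other shell name) lets
the result be compared across shells; what each one prints depends on
the shell and its version.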

Always simply reading here-docs after the next newline (wherever it
appears) allows here docs to be written as the OP requested, and many
other variations - the rule that the here doc text follows the next
newline (token) is trivial to understand, even if it makes parsing
theory purists vomit.   Here doc text is not part of the shell grammar,
it is noticed and removed from the input stream during lexical analysis,
and just because of that, should be treated entirely separately from the
parse state.

eg:
        cat <<HD1 | echo $( sed -e "$(cat <<HD2
        text for the first here doc
        comes here
        HD1
        sed commands for the second here doc
        come here
        HD2
                )"; echo ::: END )      # <<- that's the end of the cmdsubs
or:
        cat <<HD1 | echo $( sed -e "$(cat <<HD2)"; echo ::: END )
        text for the first here doc
        comes here
        HD1
        sed commands for the second here doc
        come here
        HD2

which is nicer (but bash doesn't parse as intended at all).

Requiring
        cat <<HD1 |
        text for the first here doc
        comes here
        HD1
                echo $( sed -e "$(cat <<HD2
        sed commands for the second here doc
        come here
        HD2
                )"; echo ::: END        )

which the "purists" seem to require, is, IMO, just too difficult to
explain (besides being ugly).   That is, either that or

        cat <<HD1 | echo $( sed -e "$(cat <<HD2
        sed commands for the second here doc
        come here
        HD2
                )"; echo ::: END        )
        text for the first here doc
        comes here
        HD1

which is simply perverse, as the here doc text does not appear in
the same order as the here doc operators on the (lexical) command line.
(Note that the 3rd of those four ways to write this same thing is the
only one you can actually expect to work portably across multiple shells).

kre

ps: I'm aware that there are much simpler, and far more rational, ways
of writing that example.   It's just a simple test case to illustrate the
issues, one that really works, or would if meaningful sed commands, and
meaningful text for them to process, were put in the here docs.
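
For instance, one way the third (portable) form might be filled in with
something meaningful.   The function name portable_form, the input text
and the sed command are all invented for the illustration:

```shell
# Third form: each here doc body directly follows the line holding its
# own operator, so HD1 is read before the nested HD2 is even seen.
portable_form() {
    cat <<HD1 |
hello world
HD1
        echo $( sed -e "$(cat <<HD2
s/hello/goodbye/
HD2
        )"; echo ::: END )
}

portable_form
```

The sed inside the command substitution inherits the pipe as its standard
input, so it edits the HD1 text using the script taken from HD2, and echo
then prints the result followed by the ::: END marker.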
