Re: Unclosed quotes on heredoc mode

Robert Elz Thu, 09 Dec 2021 02:31:56 -0800

    Date:        Wed, 8 Dec 2021 09:56:50 -0500
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <e5a57513-5a50-dde6-fe37-3d4f488ce...@case.edu>


Let's take this in smaller steps, and try to sort out one issue
at a time.

First, I think you're under a mistaken impression, which is
revealed in the following paragraph.

  | The real question is whether you read a command substitution as a single
  | WORD, so that the lexer cannot return "the next newline token" until the
  | command substitution has been completed.

There is absolutely nothing, anywhere, about "returning" the newline
(token) (token in parens, as while we agree that's what it means, the
standard doesn't currently say that either).

All that is required is that the lexer encounter a newline (token).
As soon as one is seen, here doc reading commences - which is all a
lexical task.

      [ In some earlier messages, I might have said something about
        processing the here doc before returning the newline token, that
        was more a comment about how our system works - for us, whatever
        the lexer sees, it returns, regardless of what the grammar happens
        to be parsing at the time ... that has some issues, and makes other
        things much easier, so is something of a tradeoff - but I certainly
        never intended to imply that the newline token needed to be returned
        before a here doc can be read. That's just our implementation choice. ]

If anything required the newline (token) to be returned to the grammar
(in which case it would obviously have to be a newline token, not just
a newline character, and that whole question would be moot) that would
make here doc positioning a grammar issue, and it definitely is not.

Further, I know bash (and any other shell that works correctly, ignoring
how here docs are processed for this) must encounter the newline token
in its lexer while initially scanning the command substitution (to include
it in whatever word it forms part of).

Consider the two following (leading sequences of) command substitutions:

        $( echo I need to see the contents of the case $book in order )
and
        $( echo I need to see the contents of the
                case $book in order )

aside from formatting for this e-mail (added white space, which eventually
becomes irrelevant anyway) there is just a one character change between
the first and the second - a single space char was changed to a newline.

In the first of those, the final ')' shown terminates the command
substitution.   In the second, the ')' doesn't, the command substitution
continues with more not shown here (because it is irrelevant to the
point).
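
As a rough sketch of that difference: the value given to book, and the
tail of the second substitution (echo matched ;; esac), are invented
here purely for illustration, since the example above deliberately leaves
that part out.

```shell
# Assumed value, only so that the case pattern "order" has something to match.
book=order

# One line: "case", "$book", "in" and "order" are just arguments to echo,
# and the first ")" ends the command substitution.
one=$( echo I need to see the contents of the case $book in order )

# A newline before "case" puts it in command position, so it is a reserved
# word and "order )" is a case pattern.  Everything after that ")" is a
# hypothetical continuation, not part of the original example.
two=$( echo I need to see the contents of the
        case $book in order ) echo matched ;; esac )

printf '%s\n' "$one"
printf '%s\n' "$two"
```

In a shell that scans this correctly, the second substitution only ends at
the ')' after esac, which it cannot know without having noticed the
newline.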

In order properly to collect that command substitution, the lexer that is
collecting it, **MUST** see, recognise, and process, the newline token.

Then assuming that immediately before that command substitution
(in each case) we had something like

        cat <<'EOF' $( one of the above...

then in the second case, that newline token is the first one seen by the
lexer after the here doc redirection, is it not?

(In the first case we haven't reached a newline token yet, that can be
expected at some later point).

  | Command substitutions don't appear in the grammar at all, just like here-
  | documents. They're just words, and like other words, the characters they
  | contain don't affect other constructs.

Sure.

But the next misconception, or faulty assumption is revealed there.
You're assuming that, because of how it looks on the page, the here doc
in a case like

        cat <<EOF $(
text
here
EOF
                command sub commands here )

is part of "the characters they contain".   It isn't: here docs are
eliminated by the lexer, just like \<newline> is eliminated - for the
purpose of whatever construct was being built when encountered, they
simply do not exist at all.

That's how

        f\
        o\
        r

still gets to be the reserved word "for" (assuming it appears in the
appropriate place, and that that indentation is just to make the e-mail
easier to read).   Here docs have different rules, but the same effect.
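
A minimal sketch of the \<newline> case, for anyone who wants to try it
(the variable names are made up for the illustration, and the pieces start
in column one so that no stray white space ends up inside the word):

```shell
# Each backslash-newline pair is removed while the input is being read,
# so the three pieces below join into the single reserved word "for".
joined=
f\
o\
r word in a b c; do joined="$joined$word"; done

printf '%s\n' "$joined"
```

The loop body runs three times, which could not happen unless the shell had
recognised "for" after the line continuations were removed.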

In the case above, the characters in the command substitution are
the contents of this C-style quoted string:

        $'\n            command sub commands here '

(assuming all the white space in this e-mail was actually there in
the input, and isn't, in this case, just e-mail noise - adjust as
appropriate).

  | I suppose it's precedence parsing: the command substitution has higher
  | precedence than here-documents.

It isn't, because parsing, even pseudo-parsing, has nothing to do with
it at all.   It all happens in the lower level code which is reading the
input, and scanning it character by character.   All the upper level code
does is to enable here doc processing when a here redirection operator
has been encountered (queuing the here docs to be fetched in the order
they were encountered, in case there is more than one << before a newline
token appears).
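
The queuing can be seen with a small sketch like the following, where the
delimiters ONE and TWO and the variable names are arbitrary choices made
for the example:

```shell
# Two here doc operators appear before the first newline token, so two
# bodies are queued, and they are read in the order the operators appeared:
# the body ending at ONE first, then the body ending at TWO.
read -r first <<ONE; read -r second <<TWO
body for the first operator
ONE
body for the second operator
TWO

printf 'first:  %s\n' "$first"
printf 'second: %s\n' "$second"
```

Neither body is fetched until the newline after <<TWO has been seen, at
which point both are collected, one after the other.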

  | > So, if one does
  | > 
  | >   $( cmd <<END )

[...]

  | Now put the text between $( and ) into a file and run it as a shell script.
  | Is it valid?

Syntactically valid, certainly.   That's what the standard requires
(and all it requires).   That it would not execute as it is is not
material.

That is no different than

        $( cmd <&5 )

Put the text of that (from between the $( and )) in a file, and run it
as a shell script, and that won't work either, as in that script nothing
has opened fd 5.   In the earlier case no here doc has been supplied.

Both are syntactically correct, which is determined by applying the
rules of the grammar in production mode, and seeing if it is possible
for the grammar to produce the text in question.  In both cases, it
is, clearly.
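
The <&5 case can be approximated with a sketch like this one.   Note that
the -n option only asks an implementation's own parser whether it accepts
the text, which is not the same thing as applying the grammar in
production mode; cat stands in for "cmd", and the variable names are
invented for the example.

```shell
# The text from between "$(" and ")", with cat standing in for "cmd".
script='cat <&5'

# -n: read the commands, but do not execute them.
if sh -n -c "$script"; then syntax=valid; else syntax=invalid; fi

# Actually running it fails, as nothing has opened fd 5
# (5<&- makes sure the child does not inherit one by accident).
if sh -c "$script" 5<&- 2>/dev/null; then run=ok; else run=failed; fi

printf 'syntax=%s run=%s\n' "$syntax" "$run"
```

The first check passing while the second fails is the distinction being
drawn: whether the text can be accepted is one question, and whether it
executes usefully is another.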

This is another case where you appear to be reading words into the standard
which are simply not there.

  | You're sure of your implementation's correctness. We don't agree.

I believe I implement what the standard requires to be implemented,
assuming that its "newline" is meant to be "newline token".

There's no point continuing with the rest of your message, as everything
turns on these points.   If you can find anything in the standard which
says something different from what I believe it says, please point me at
it.

kre
