Date: Wed, 8 Dec 2021 09:56:50 -0500
From: Chet Ramey <chet.ra...@case.edu>
Message-ID: <e5a57513-5a50-dde6-fe37-3d4f488ce...@case.edu>

Let's take this in smaller steps, and try and sort out one issue at a
time.

First, I think you're under a mistaken impression, which is revealed in
the following paragraph.

  | The real question is whether you read a command substitution as a single
  | WORD, so that the lexer cannot return "the next newline token" until the
  | command substitution has been completed.

There is absolutely nothing, anywhere, about "returning" the newline
(token) (token in parens, as while we agree that's what it means, the
standard doesn't currently say that either). All that is required is
that the lexer encounter a newline (token). As soon as one is seen,
here doc reading commences - which is all a lexical task.

[ In some earlier messages, I might have said something about processing
the here doc before returning the newline token; that was more a comment
about how our system works - for us, whatever the lexer sees, it returns,
regardless of what the grammar happens to be parsing at the time ... that
has some issues, and makes other things much easier, so is something of
a tradeoff - but I certainly never intended to imply that the newline
token needed to be returned before a here doc can be read. That's just
our implementation choice. ]

If anything required the newline (token) to be returned to the grammar
(in which case it would obviously have to be a newline token, not just
a newline character, and that whole question would be moot) that would
make here doc positioning a grammar issue, and it definitely is not.

Further, I know bash (and any other shell that works correctly, ignoring
how here docs are processed for this) must encounter the newline token
in its lexer while initially scanning the command substitution (to include
it in whatever word it forms part of).

Consider the two following (leading sequences of) command substitutions:

	$( echo I need to see the contents of the case $book in order )

and

	$( echo I need to see the contents of the
	   case $book in order )

Aside from formatting for this e-mail (added white space, which eventually
becomes irrelevant anyway) there is just a one character change between
the first and the second - a single space char was changed to a newline.

In the first of those, the final ')' shown terminates the command
substitution. In the second, the ')' doesn't: "case" now starts a
command, so the ')' ends a case pattern, and the command substitution
continues with more not shown here (because it is irrelevant to the
point).

In order properly to collect that command substitution, the lexer that
is collecting it **MUST** see, recognise, and process the newline token.

Then assuming that immediately before that command substitution (in each
case) we had something like

	cat <<'EOF' $( one of the above...

then in the second case, that newline token is the first one seen by the
lexer after the here doc redirection, is it not? (In the first case we
haven't reached a newline token yet, that can be expected at some later
point).

  | Command substitutions don't appear in the grammar at all, just like
  | here-documents. They're just words, and like other words, the
  | characters they contain don't affect other constructs.

Sure. But the next misconception, or faulty assumption, is revealed there.
You're assuming, because of how it looks on the page, that the here doc
in a case like

	cat <<EOF $(
	text here
	EOF
		command sub commands here
	)

is part of "the characters they contain". It isn't: here docs are
eliminated by the lexer, just like \<newline> is eliminated - for the
purpose of whatever construct was being built when encountered, they
simply do not exist at all. That's how

	f\
	o\
	r

still gets to be the reserved word "for" (assuming it appears in the
appropriate place, and that that indentation is just to make the e-mail
easier to read).
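
The \<newline> point is easy to try. What follows is only a minimal sketch, assuming a POSIX-style sh; the variable names "result" and "word" are made up for the illustration and appear nowhere in the thread. The three fragments sit at the start of a command inside a command substitution, and the shell still sees the reserved word "for":

```sh
# Each backslash-newline pair is removed while the input is being read,
# so the three fragments below join into the reserved word "for".
# ("result" and "word" are hypothetical names used only in this sketch.)
result=$(
f\
o\
r word in a b; do
    printf '<%s>' "$word"
done
)

printf '%s\n' "$result"    # prints <a><b>
```

If the backslash-newline pairs survived as far as the grammar, the first word would not be "for" and the loop would be a syntax error rather than running.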

Here docs have different rules, but the same effect. In the case above,
the characters in the command substitution are the contents of this C-style
quoted string:

	$'\n command sub commands here '

(assuming all the white space in this e-mail was actually there in the
input, and isn't, in this case, just e-mail noise - adjust as appropriate).

  | I suppose it's precedence parsing: the command substitution has higher
  | precedence than here-documents.

It isn't, because parsing, even pseudo-parsing, has nothing to do with it
at all; it all happens in the lower level code which is reading the input,
and scanning it character by character. All the upper level code does
is to enable here doc processing when a here redirection operator has been
encountered (queuing the here docs to be fetched in the order they were
encountered, in case there is more than one << before a newline token
appears).

  | > So, if one does
  | >
  | >	$( cmd <<END )
  [...]
  | Now put the text between $( and ) into a file and run it as a shell script.
  | Is it valid?

Syntactically valid, certainly. That's what the standard requires
(and all it requires). That it would not execute as it is is not material.
That is no different than

	$( cmd <&5 )

Put the text of that (from between the "$(" and the ")") in a file, and
run it as a shell script, and that won't work either, as in that script
nothing has opened fd 5. In the earlier case no here doc has been supplied.
Both are syntactically correct, which is determined by applying the rules
of the grammar in production mode, and seeing if it is possible for the
grammar to produce the text in question. In both cases, it is, clearly.

This is another case where you appear to be reading words into the standard
which are simply not there.

  | You're sure of your implementation's correctness. We don't agree.

I believe I implement what the standard requires to be implemented,
assuming that its "newline" is meant to be "newline token".
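
The difference between "syntactically valid" and "will execute" in the <&5 case can be seen with a short sketch. It assumes a POSIX-style sh with the standard -n (read commands but do not execute them) option; the temporary file name and the variables "script", "syntax" and "run" are hypothetical, chosen only for this illustration. Fd 5 is closed explicitly for the second step so the result does not depend on what the caller happens to have open.

```sh
# Hypothetical scratch file holding the text from between "$(" and ")".
script=${TMPDIR:-/tmp}/fd5-demo.$$
printf '%s\n' 'cat <&5' > "$script"

# Step 1: syntax check only; -n reads the commands without running them.
if sh -n "$script" 2>/dev/null; then syntax=valid; else syntax=invalid; fi

# Step 2: actually run it with fd 5 closed, so the redirection must fail.
if sh "$script" 5<&- 2>/dev/null; then run=succeeded; else run=failed; fi

rm -f "$script"
printf 'syntax: %s, run: %s\n' "$syntax" "$run"
```

The script passes the syntax check and fails only when it is run, which is the distinction being drawn above. The sketch deliberately leaves out the "cmd <<END" case, since how a given shell reports a here doc that was never supplied is implementation behaviour and not something to assert here.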

There's no point continuing with the rest of your message, as everything
turns on these points. If you can find anything in the standard which
says something different than what I believe it says, please point me
at it.

kre