Date: Mon, 8 May 2023 11:13:19 -0400
From: Chet Ramey <chet.ra...@case.edu>
Message-ID: <4c66cc58-27f7-4001-c831-8a2a23264...@case.edu>
  | I see I might need to update the error message to specify it's process
  | substitution.

You might (for this particular case anyway) want to change it from
saying "unterminated here-document" (which would imply that a here-doc
was being fetched, and EOF was reached instead of the terminator - which
is definitely worth warning about, perhaps even being an error), as that
isn't what this error is. The issue is that the here doc doesn't appear
in the environment at all ("missing" would be better than "unterminated" -
it was perfectly well terminated - or perhaps "misplaced", as the here doc
text is located anyway, just not where some people expect to find it).

  | The process substitution is in its own parsing environment,

The forthcoming POSIX draft kind of expects that - but I don't really
like it, and the existing standard, and all before it - and the original
Bourne shell - didn't treat here docs like that at all. They were read
following the next newline (a newline token, not one buried inside a
string, or whatever) in the source, and the current text just says
"after the next newline" (or words to that effect). That is the next
newline after the << operator (with appropriate allowances given to
command lines with multiple here doc operators, where the here docs
follow, in order of the operators, one after another, starting with the
next newline).

Of course, in the Bourne shell there was only the `...` form of command
substitution, with its own truly bizarre set of parsing rules, which made
this more or less inevitable. But somewhere along the line, someone
came to the delusion that command substitutions (which would include
bash's process substitutions) should simply be read as a "word" and then
parsed later.
It has since been conclusively demonstrated that that way lies madness,
and parsers cannot work like that. The command substitution text needs
to be parsed as it is read (whether or not the results of that parse are
saved, or just the text that created it, to be parsed again later), which
means that the here doc operator must necessarily be seen during that
initial parse - and the here doc text can (and IMO should) follow at the
next newline, wherever that happens to occur.

That text can be saved to be used again when the command substitution
text is parsed again later, if that is the way the shell wants to do
things. Why anyone would want to parse the text, throw the results away,
and then parse the same text again later is beyond me, though. The only
possible advantage is to allow some syntax errors to be ignored during
the initial parse, and only be reported if the second one actually
happens (code flow might mean that the command substitution is never
executed). That only works occasionally - many syntax errors cause
parsers to get into a state from which they find it difficult to
recover, certainly not well enough to be at all confident that the end
of the command substitution is correctly detected (which is the major
aim of the first parse) - and in any case, it really just makes broken
code appear to be correct, until sometime later when it does need to
run, and then fails.

IMO it is much better to report the error immediately (during the first
parse) even if the code would never be executed, as in:

    if false; then x=$(nonsense |||| code; done); fi

(as best I can tell, current bash does always report that kind of
nonsense as a syntax error, and that's good, even though the assignment
to x is never run, so the command substitution is never executed).
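One way to see how a given shell treats that case is to ask it to parse the text without running it. The following is only a sketch: `parses_ok` is a hypothetical helper name, not anything from bash, and it assumes the shell accepts `-n` (read commands but do not execute them) together with `-c`. The verdict for the example depends on the shell and version, so no expected output is claimed here.

```shell
# parses_ok is a hypothetical helper: it succeeds only if the shell can
# parse the given text. -n asks the shell to read but not execute it.
# $BASH is set by bash to its own path; otherwise fall back to sh.
parses_ok() {
    "${BASH:-sh}" -n -c "$1" 2>/dev/null
}

# The example from the message: the assignment can never run, but the
# text inside $( ) is still nonsense.
snippet='if false; then x=$(nonsense |||| code; done); fi'

if parses_ok "$snippet"; then
    echo "accepted: the error would only show up if the code ever ran"
else
    echo "rejected during the initial parse"
fi
```

A shell that parses the command substitution as it is read should take the second branch.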
Always simply reading here-docs after the next newline (wherever it
appears) allows here docs to be written as the OP requested, and many
other variations - the rule that the here doc text follows the next
newline (token) is trivial to understand, even if it makes parsing
theory purists vomit. Here doc text is not part of the shell grammar,
it is noticed and removed from the input stream during lexical analysis,
and just because of that, should be treated entirely separately from the
parse state.

eg:

    cat <<HD1 | echo $( sed -e "$(cat <<HD2
    text for the first here doc comes here
    HD1
    sed commands for the second here doc come here
    HD2
    )"; echo ::: END )     # <<- that's the end of the cmdsubs

or:

    cat <<HD1 | echo $( sed -e "$(cat <<HD2)"; echo ::: END )
    text for the first here doc comes here
    HD1
    sed commands for the second here doc come here
    HD2

which is nicer (but bash doesn't parse as intended at all).

Requiring

    cat <<HD1 |
    text for the first here doc comes here
    HD1
    echo $( sed -e "$(cat <<HD2
    sed commands for the second here doc come here
    HD2
    )"; echo ::: END )

which the "purists" seem to require, is, IMO, just too difficult to
explain (besides being ugly). That is, either that or

    cat <<HD1 | echo $( sed -e "$(cat <<HD2
    sed commands for the second here doc come here
    HD2
    )"; echo ::: END )
    text for the first here doc comes here
    HD1

which is simply perverse, as the here doc text does not appear in the
same order as the here doc operators on the (lexical) command line.

(Note that the 3rd of those four ways to write this same thing is the
only one you can actually expect to work portably across multiple shells).

kre

ps: I'm aware that there are much simpler, and far more rational, ways
of writing that example, that's just a simple (really works) test case
to illustrate the issues, or works if meaningful sed commands, and
meaningful text for them to process, were put in the here docs.
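To make the ps concrete, here is a sketch of the third (portable) form with the placeholders filled in. The input text `hello world`, the sed command `s/hello/goodbye/` and the wrapper function `demo` are illustrative assumptions, not part of the original message. The terminators HD1 and HD2 must start in column one, since the operator is `<<` rather than `<<-`.

```shell
# demo is a hypothetical wrapper so the pipeline can be called by name.
demo() {
cat <<HD1 |
hello world
HD1
echo $( sed -e "$(cat <<HD2
s/hello/goodbye/
HD2
)"; echo ::: END )
}

# sed inherits the pipe as stdin, so it edits the text of the first
# here doc; the unquoted $( ) result is then word-split and echoed.
demo    # prints: goodbye world ::: END
```

The first here doc supplies the data on the pipe, and the second supplies the sed script through the inner command substitution.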