On Fri, 14 Jul 2023 at 09:47, Grisha Levit <grishale...@gmail.com> wrote:
> This patch implements the ksh93-style <<# redirection operator to enable
> indentatable heredocs.

On the whole I think this is great, and thank you for working up the patch, but I would like to offer some comments and suggestions:

Firstly, it's impossible to have the initial line of output indented. This might not seem important, but it would make certain kinds of code generation more awkward. Consider:

    cat <<# EndOfHead
        <html>
          <body>
            <table>
        EndOfHead
    generator_thingy |
    while IFS= read -r
    do
        cat <<# EndOfRow
                  <tr>
                    <td>
                      $REPLY
                    </td>
                  </tr>
            EndOfRow
    done
    cat <<# EndOfTail
            </table> <!-- this line won't be indented properly (and nor will the following lines) -->
          </body>
        </html>
        EndOfTail

(If anyone is about to suggest that HTML isn't space-sensitive, then imagine this outputs YAML or Python instead.)

One option that some other languages use is to find the terminator, and then use its indentation as the pattern to remove from the content lines. The problem of course is that it would take a double run over the content, but the benefit is that there'd only be one in-band signaling line instead of two.

Secondly, the battle for 8-space tabs has been well and truly lost at this point, so hard-coding that constant feels like it's likely to be a source of errors.

Thirdly, allowing lines to have less than the specified indentation seems likely to be a net loss: worse maintenance, and no visual improvement (except in the case where there is no fixed indentation and it's just "remove all").

In order to be tab-agnostic, I can see two reasonable options:

1. remove only an exact match for the sequence of whitespace characters that occurs in the indicator line
2. the same, but only accept tabs followed by spaces in the indicator line (say T tabs followed by S spaces).

(A side benefit would be that "ordinary" indented heredocs can use the same logic with T=INT_MAX and S=0.)
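To make option 1 concrete, here is a minimal sketch in plain sh. It is only an illustration of the "exact match" rule, not code from the patch: the function name strip_exact_indent is hypothetical, and treating any line that lacks the prefix (including an empty line) as an error is an assumption made here to reflect the "no under-indented lines" suggestion.

```sh
# Hypothetical helper, not taken from the patch: remove one exact
# whitespace prefix from every line read on stdin.
# Assumption: a line that does not start with the prefix is an error.
strip_exact_indent() {
    prefix=$1
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$prefix"*)
                # Quoting the prefix makes the match literal, so tabs and
                # spaces are compared character by character, with no
                # assumption about tab width.
                printf '%s\n' "${line#"$prefix"}" ;;
            *)
                printf 'missing expected indentation: %s\n' "$line" >&2
                return 1 ;;
        esac
    done
}
```

For example, with a prefix of one tab followed by two spaces:

```sh
tab=$(printf '\t')
printf '\t  <tr>\n\t    <td>\n' | strip_exact_indent "$tab  "
```

this prints "<tr>" with no indentation and "<td>" indented by two spaces, because only the exact prefix is removed and any further indentation is kept.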
To aid error reporting, I think the terminator token should be identified regardless of the combination of whitespace in its indentation, but if its leading whitespace was not tabs-then-spaces, or doesn't match the indicator line, then this should have the same consequences as "delimiter not found" only with a better error message.

I wonder if this should be called "<<--" rather than "<<#" if it's not (quite) compatible with what ksh does?

I will work up a modified version of the patch to implement this.
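As a rough illustration of the terminator check described above, the two failure cases could be told apart like this. This is a sketch in plain sh under stated assumptions, not the proposed implementation (the real change would live in bash's C source): the function names, the messages, and the return codes 2 and 3 are all hypothetical.

```sh
# Hypothetical helper: succeed only if $1 consists of zero or more tabs
# followed by zero or more spaces.
is_tabs_then_spaces() {
    tab=$(printf '\t')
    rest=$1
    # Drop leading tabs one at a time.
    while [ "${rest#"$tab"}" != "$rest" ]; do rest=${rest#"$tab"}; done
    # Then drop leading spaces one at a time.
    while [ "${rest#" "}" != "$rest" ]; do rest=${rest#" "}; done
    # Anything left over is a tab after a space, or a non-blank character.
    [ -z "$rest" ]
}

# Hypothetical helper: $1 is the indicator line's leading whitespace,
# $2 is the terminator's leading whitespace.
# Returns 0 if acceptable, 2 if malformed, 3 on mismatch (assumed codes).
check_terminator_indent() {
    if ! is_tabs_then_spaces "$2"; then
        echo 'terminator indentation is not tabs followed by spaces' >&2
        return 2
    fi
    if [ "$1" != "$2" ]; then
        echo 'terminator indentation does not match the indicator line' >&2
        return 3
    fi
    return 0
}
```

The point of the separate return codes is only that both cases can be treated like "delimiter not found" while still producing a message that says which rule was broken.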