History of bash's support for self-modifying shell scripts?

2018-09-10 Thread Josh Triplett
While digging into the details of how bash reads shell scripts, I found
some indications that bash goes out of its way to support self-modifying
shell scripts. As far as I can tell, after reading and executing each
command, bash will seek backward and re-read the script from the
byte after the end of that command, rather than executing out of
buffered data previously read from the file. (For the purposes of this
logic, compound commands get run as a single unit, and this logic kicks
in after running the full compound command.)

I haven't found any indications that POSIX or similar require this, and
other shells like dash don't have the same behavior. I also haven't
found any details about this in the bash changelog or version control
history.

I'd like to get some more information on the history of this mechanism,
if possible. What led to bash adding support for this? What version of
bash first incorporated this, and has it changed over time?

(I don't want to use this mechanism myself; I'm asking because I'm
working on a project that needs to care about various shells'
compatibility requirements, and I wanted to find out more about this
unusual corner case.)

Thanks,
Josh Triplett



Re: History of bash's support for self-modifying shell scripts?

2018-09-10 Thread Chet Ramey
On 9/10/18 1:25 AM, Josh Triplett wrote:
> While digging into the details of how bash reads shell scripts, I found
> some indications that bash goes out of its way to support self-modifying
> shell scripts. As far as I can tell, after reading and executing each
> command, bash will seek backward and re-read the script from the
> byte after the end of that command, rather than executing out of
> buffered data previously read from the file. (For the purposes of this
> logic, compound commands get run as a single unit, and this logic kicks
> in after running the full compound command.)

It happens in only a few cases: 1) when forking a child to run a command;
2) when a redirection specifies the same file descriptor as bash is using
to read a script; and 3) when bash is reading a script from stdin and the
read builtin is used to read from that file descriptor.

The first case is probably the one you're interested in. It's been there
ever since I wrote the buffered input code in 1992, and it's more about
making sure parent and child shells have a consistent view of the script
in case the child expects to read from it. It's about being careful, not
explicitly allowing self-modifying scripts.

Previous versions of the shell (through bash-1.12) used stdio, which has
behavior that varies across systems, especially across parent-child
boundaries and changing file descriptors due to redirection (which it can't
really handle at all).

POSIX says you have to do that anyway if the shell is reading from stdin:

"When the shell is using standard input and it invokes a command that also
uses standard input, the shell shall ensure that the standard input file
pointer points directly after the command it has read when the command
begins execution. It shall not read ahead in such a manner that any
characters intended to be read by the invoked command are consumed by the
shell (whether interpreted by the shell or not) or that characters that are
not read by the invoked command are not seen by the shell."

But it probably isn't needed in the general case. Why not take the code out
and see what happens with your testing?

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: History of bash's support for self-modifying shell scripts?

2018-09-10 Thread Josh Triplett
On Mon, Sep 10, 2018 at 04:50:29PM -0400, Chet Ramey wrote:
> On 9/10/18 1:25 AM, Josh Triplett wrote:
> > While digging into the details of how bash reads shell scripts, I found
> > some indications that bash goes out of its way to support self-modifying
> > shell scripts. As far as I can tell, after reading and executing each
> > command, bash will seek backward and re-read the script from the
> > byte after the end of that command, rather than executing out of
> > buffered data previously read from the file. (For the purposes of this
> > logic, compound commands get run as a single unit, and this logic kicks
> > in after running the full compound command.)
> 
> It happens in only a few cases: 1) when forking a child to run a command;
> 2) when a redirection specifies the same file descriptor as bash is using
> to read a script; and 3) when bash is reading a script from stdin and the
> read builtin is used to read from that file descriptor.
> 
> The first case is probably the one you're interested in. It's been there
> ever since I wrote the buffered input code in 1992, and it's more about
> making sure parent and child shells have a consistent view of the script
> in case the child expects to read from it. It's about being careful, not
> explicitly allowing self-modifying scripts.

Interesting. I don't *think* the behavior I observed corresponds to one
of those cases; I observed it by just having a shell script that
carefully used `dd conv=notrunc of=$0 ...` to write code into the
current script after the current command.
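
A minimal sketch of that kind of test, assuming bash is on PATH. It is a
simpler variant than the dd approach: the script appends a command to its
own file instead of overwriting bytes in place, so no offsets need to be
computed. The temporary file and the echoed strings are made up for
illustration.

```shell
# Write a tiny throwaway script that appends a command to itself.
script=$(mktemp)
cat > "$script" <<'EOF'
echo first
echo 'echo appended' >> "$0"
EOF

# Run it with bash and capture what it prints.
out=$(bash "$script")
rm -f "$script"
printf '%s\n' "$out"
```

With bash this prints "first" followed by "appended": the line added by the
second command is picked up when bash goes back to the file for more input.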

> Previous versions of the shell (through bash-1.12) used stdio, which has
> behavior that varies across systems, especially across parent-child
> boundaries and changing file descriptors due to redirection (which it can't
> really handle at all).
> 
> POSIX says you have to do that anyway if the shell is reading from stdin:
> 
> "When the shell is using standard input and it invokes a command that also
> uses standard input, the shell shall ensure that the standard input file
> pointer points directly after the command it has read when the command
> begins execution. It shall not read ahead in such a manner that any
> characters intended to be read by the invoked command are consumed by the
> shell (whether interpreted by the shell or not) or that characters that are
> not read by the invoked command are not seen by the shell."

I did find that, but that only applies to stdin, not to shell scripts.

I'd certainly love to *only* do this for stdin.

> But it probably isn't needed in the general case. Why not take the code out
> and see what happens with your testing?

When dealing with something with the history and backward compatibility
of bash, I'm hesitant to take that approach with *anything* without
first checking with the experts who made it that way in the first place.
:)

Thanks for the history and details, I appreciate it!

- Josh Triplett



Re: History of bash's support for self-modifying shell scripts?

2018-09-10 Thread Ángel
On 2018-09-09 at 22:25 -0700, Josh Triplett wrote:
> (I don't want to use this mechanism myself; I'm asking because I'm
> working on a project that needs to care about various shells'
> compatibility requirements, and I wanted to find out more about this
> unusual corner case.)

The Thompson shell (up to Sixth Edition UNIX) supported a goto command
that was implemented as an external command(!) that moved the
filepointer to the label location (marked by the : command).

The PWB shell (Mashey shell) initially had if/switch/while, but they
used a goto to move the filepointer.

This tells us that self-modifying scripts would be possible on these
shells, and probably they also worked on the original Bourne shell.

With this level of trickery, I wouldn't be surprised if some
self-modifying shell scripts were in use at the time (or were thought to
be) and bash was coded this way in order not to break them
(I can't really come up with a good use case for this feature, though).

You may find these resources interesting:
https://etsh.io/
https://www.in-ulm.de/~mascheck/


Regards




Re: History of bash's support for self-modifying shell scripts?

2018-09-10 Thread Robert Elz
    Date:        Tue, 11 Sep 2018 01:11:41 +0200
    From:        Ángel
    Message-ID:  <1536621101.1095.13.ca...@16bits.net>

  | The Thompson shell (up to Sixth Edition UNIX) supported a goto command
  | that was implemented as an external command(!) that moved the
  | filepointer to the label location (marked by the : command).

What is probably more important here is that in that shell, the script was 
standard input (always, whether it was a file as in "sh file" or just the
original stdin (often a tty) from "sh") and all foreground commands
inherited that standard input when invoked, unless redirected.
(Yes, just running "cat" without args was a way to send the rest of
the script to stdout, and then have the shell exit because it reached EOF.)

Any command could reposition the shell's stdin, which made it possible
to do all kinds of "interesting things".

On the other hand, it was impossible to pipe the output of some other
command into a script:
	ls ^ sh file
would not work (note the use of the v6 "pipe" character...), as the sh
running the script simply closed its stdin and replaced it with the file.
Redirecting stdin ("sh file < whatever") was also ineffective.

These days there is no syntax to refer to the script itself - it is impossible
to explicitly pass the script to a command to access - the only time any
of this is still relevant is when the shell's input is from stdin (sh -s,
whether the -s is explicit or implied from the (lack of) other args).
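
The stdin case is easy to reproduce with a current shell. A minimal sketch,
assuming bash is installed: the script is fed on standard input through a
pipe, and cat, run without arguments, inherits that same standard input.

```shell
# Feed a three-line script to bash on stdin; the second line runs cat,
# which reads from the same stdin the shell is reading its script from.
out=$(printf '%s\n' 'echo one' 'cat' 'echo two' | bash)
printf '%s\n' "$out"
```

bash prints "one", then cat copies the remaining line, "echo two", to stdout
as plain text instead of the shell executing it; the shell then sees EOF and
exits.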

kre