Re: "here strings" and tmpfiles

2019-04-09 Thread Robert Elz
Date: Mon, 8 Apr 2019 23:36:39 -0700
From: pepa65
Message-ID:  

  | When in the past I proposed this syntax:
  |  cmd >>>var
  | the idea was to commit the output of a command into memory (in the form
  | of a variable), without requiring a pipe or file.

In general that cannot work: cmd and the shell are in separate
processes.   Even if some form of shared memory were used, cmd would
somehow have to be taught how to use it - but only when that
particular form of output is being used, which (since redirects
are handled by the shell) cmd normally knows nothing about.

Of course if cmd is built into the shell, then it would be easy,
but inventing new syntax which only works in very special cases is
not a good idea.

The idea is basically just to do

var=$( cmd )

right?   But without a fork.   That's something that can be done today,
no new syntax needed (bash might even do it sometimes, I don't know, the
FreeBSD shell does.)

When cmd is not built in, then the shell simply forks, after making
a pipe, and it works as you'd expect.   But when cmd is built in, and
executing it will do no (lasting) damage to the shell execution environment,
then there's no need to fork.   Since neither printf nor echo affects the
execution environment at all, they're perfect candidates for that kind of
optimisation (and since this is a frequent idiom, it can have real benefits.)
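A small bash illustration of why only commands that leave no lasting trace are safe to run without the fork: bash does fork for $( ), so an assignment made inside the substitution is thrown away with the child, and a shell that skipped the fork would have to preserve that behaviour. The variable names are illustrative only.

```bash
#!/usr/bin/env bash
# In bash, $( ) runs in a subshell, so changes made inside it are discarded.
x=outer
y=$(x=inner; echo "$x")   # the assignment only affects the subshell
echo "x=$x y=$y"          # prints: x=outer y=inner
```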

  | What is the technique you are referring to?

Exactly the above: if cmd is built in, its output goes into memory,
much as it would (inside the shell, where all this is happening) if
its output were read from a pipe for a non-builtin, but with no pipe
(or other I/O) involved.   Then that data is simply assigned.

The same technique works for stdin, in a case like cmd1 | cmd2
where both are builtin - cmd1 writes into a memory buffer, and cmd2
reads from that same buffer (this needs care, as the shell has to
handle any scheduling that's required: running cmd1 until it
ends or the buffer fills, then cmd2 until it has consumed everything
available, then back to cmd1 again...)   Whether this is worth the
effort is questionable.   The same can be done for here docs (or
strings) being read by built in commands, which was the actual case
I had in mind in my previous message.

Of course, there are often also easier techniques - a lot of the
examples being tossed around have easier (if perhaps more verbose)
ways to be written.   If you want to assign some known data (such
that you could put it in a here doc/string - which includes values
of variables, etc, of course) then rather than

read a b c <<< 'word1 word2 word3'

which is admittedly very compact, and looks cute, you can just do

a=word1 b=word2 c=word3

and the same when you're filling in an array, you just
need to explicitly add the subscripts.
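Side by side, the here-string form and the plain assignments Robert suggests look like this in bash; the array line shows the explicit subscripts he mentions. A sketch only, reusing the names from his example.

```bash
#!/usr/bin/env bash
# Compact form: word-splits a here string (depending on the bash version,
# the string is passed through a temp file or a pipe).
read -r a b c <<< 'word1 word2 word3'
echo "$a $b $c"                      # word1 word2 word3

# Plain assignments give the same result without any redirection at all.
a=word1 b=word2 c=word3
echo "$a $b $c"                      # word1 word2 word3

# For an array, spell out the subscripts.
arr=([0]=word1 [1]=word2 [2]=word3)
echo "${arr[1]} (${#arr[@]} elements)"   # word2 (3 elements)
```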

I suspect that some of this is because bash's "readarray" is
slightly different from "read" or a simple assignment (this is
a guess based entirely upon bits and pieces I have picked up
from this list).   That is an example of why adding new special
case "stuff" is not a great idea in general: if it works just the
same as the existing stuff, then it isn't needed (except perhaps as
a frill for simplicity); if it doesn't, then it tends to interact
badly with everything else.


The point is that this kind of thing can be done just by optimising
the current syntax, and a script that relies on it will work anywhere
(just perhaps not as fast).   Inventing new stuff to try to make things
work better is rarely a good idea; it just turns the whole system into
a gigantic mess of ad-hoc special cases.

Of course, if there's a problem with the way that $( ) is defined
to work (like the trailing \n stripping, or whatever) that can be
addressed, either by some new syntax "this is just like that, except
that in the new one " or by some shell options that modify the
way that things work, which can be set by a script that knows what
it is doing (perhaps set inside the cmd substitution itself, so it
only affects that one, or outside to affect all of them.)   Of course,
any of that loses portability.
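For concreteness, this is the trailing-newline stripping mentioned above, together with the usual sentinel workaround that already works today without any new syntax or options. A bash sketch; the sentinel character x is arbitrary.

```bash
#!/usr/bin/env bash
# $( ) removes every trailing newline from the captured output.
stripped=$(printf 'a\n\n\n')
echo "${#stripped}"        # 1

# Common workaround: append a sentinel character, then remove it.
kept=$(printf 'a\n\n\n'; printf x)
kept=${kept%x}
echo "${#kept}"            # 4
```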

  | But when data gets passed between commands, it would be great if memory
  | could be used for that, for various good reasons. :-)

That's what a pipe is.   In general there needs to be some mechanism
that is general enough that any random command works with it, and that
means having the kernel involved to manage access to the data safely.

A file in a memory backed filesystem (a tmpfs or whatever) isn't that
much different.

If you have a very specialised set of commands that want to communicate
with each other, then you can write them to use shared memory, and have
them communicate that way - but there is little chance that all the
standard commands (or even any of them) are suddenly going to be modified
to make that work for general use.   And there is certainly no way to
make that happen by some ma

Re: "here strings" and tmpfiles

2019-04-09 Thread Greg Wooledge
On Tue, Apr 09, 2019 at 02:32:38PM +0700, Robert Elz wrote:
> The idea is basically just to do
> 
>   var=$( cmd )
> 
> right?   But without a fork.   That's something that can be done today,
> no new syntax needed (bash might even do it sometimes, I don't know, the
> FreeBSD shell does.)

wooledg:~$ strace -o log bash -c 'x=$(echo hi)'
...
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5166f16a10) = 19218
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=0x562d61250410, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f5166f50940}, {sa_handler=0x562d61250410, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f5166f50940}, 8) = 0
close(4) = 0
read(3, "hi\n", 128) = 3
read(3, "", 128) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=19218, si_uid=563, si_status=0, si_utime=0, si_stime=0} ---
...

Bash always forks for $() as far as I'm aware, which is why bash 3.1
introduced printf -v var.  That's the only way to get printf-formatted
output into a bash variable without using a temp file or a fork.
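Without strace, the fork can also be seen from inside bash through $BASHPID, which (unlike $$) reports the PID of the current process, including subshells; printf -v then fills a variable with no child process at all. A small sketch; the variable names are illustrative.

```bash
#!/usr/bin/env bash
# $BASHPID changes in subshells, so comparing it shows whether $( ) forked.
outer=$BASHPID
inner=$(echo "$BASHPID")   # expanded inside the command-substitution child
[[ $outer != "$inner" ]] && echo "command substitution forked"

# printf -v writes the formatted result straight into a variable: no fork, no temp file.
printf -v var '%05d|%s' 42 hi
echo "$var"                # 00042|hi
```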



Re: "here strings" and tmpfiles

2019-04-09 Thread Chet Ramey
On 4/9/19 8:36 AM, Greg Wooledge wrote:

> Bash always forks for $() as far as I'm aware, which is why bash 3.1
> introduced printf -v var.  

It's not, but it was a nice side effect.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: "here strings" and tmpfiles

2019-04-09 Thread Jason A. Donenfeld
Since originally raising this issue with dkg (leading to this email
thread), I've only followed along from a bit of a distance. But it does
look like there's been some good progress: there's now a commit that
fills the pipe up to the OS's maximum pipe size, and then falls back to
the old (buggy, vulnerable, scary) behavior. Seems like there are several
problems with this approach:

  - Determining the maximum pipe size at build time doesn't make sense
for systems where such a thing is actually determined (and adjustable)
at runtime.

  - The security of this language construct now depends on the OS and
on runtime configuration. That means it's not that reliable, and so
we're basically back to square one, advising: "don't use herestrings".

  - If user-supplied input is used in a herestring, the user now controls
whether the secure path or the insecure path is used.

A real solution for this issue involves getting rid of the temporary file
altogether. Since we're talking about a bash string, it's already in
memory. Why not just fork() if the write() would block? A simple way would be
to always fork(). A fancier way would be to set O_NONBLOCK, see if write()
returns EAGAIN, and only fork() if the write would block. Either way seems
basically fine, with the critical part being that the temporary file is
totally gone from the equation.
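At the script level, the "always fork()" variant can be approximated today by feeding the string through a pipe written by a forked subshell instead of using a here string. This is a sketch assuming Linux (where /dev/fd/0 of a pipe reads back as pipe:[N]); it is not the change being proposed for bash itself, and the variable names are illustrative.

```bash
#!/usr/bin/env bash
secret='not for a temp file'

# A pipeline: bash forks a subshell that runs the printf builtin as the writer.
# The reader reports what its stdin is, then drains the pipe.
kind=$(printf '%s\n' "$secret" | { readlink /dev/fd/0; cat > /dev/null; })
echo "$kind"                         # e.g. pipe:[123456]

# Same idea with process substitution, keeping `read` in the current shell.
IFS= read -r line < <(printf '%s\n' "$secret")
echo "$line"                         # not for a temp file
```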

Thoughts on this?

Thanks,
Jason



Re: "here strings" and tmpfiles

2019-04-09 Thread konsolebox
On Wed, Mar 20, 2019 at 9:05 AM Robert Elz  wrote:
> Note: I am not suggesting bash should change - using files for here docs
> is the way they were originally implemented (in the Bourne sh) (though it
> had bugs, which could leave the files lying around in some cases).
>
> However, using files for here docs makes here docs unusable in a shell
> running in single user mode with no writable filesystems (whatever is
> mounted is read only, until after file system checks are finished).

Here docs and here strings are rarely used in pre-rw boot scripts, and
in my opinion should be avoided, but if it's necessary, an initramfs
should be used.  Some users can also mount /tmp as tmpfs earlier if
they know what they are doing.

Perhaps bash can also look at /dev/shm. It's a common tmpfs, but I
haven't checked if it's standard and what utility mounts it.  I don't
really use it.
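Bash already consults the TMPDIR variable when choosing where to create its temporary files, so a /dev/shm (or any other tmpfs) location can be selected per script without changing bash. A sketch assuming Linux; if /dev/shm is not a writable directory, bash falls back to its default location. The document is made larger than Linux's default 1 MiB pipe-max-size so that bash versions which pass small here strings through a pipe still use a file here; the exact temp file name varies by version.

```bash
#!/usr/bin/env bash
printf -v big '%*s' 2000000 ''     # 2,000,000 spaces, larger than any default pipe buffer
# TMPDIR is set only inside the command-substitution subshell.
where=$(TMPDIR=/dev/shm; readlink /dev/fd/0 <<< "$big")
echo "$where"                      # e.g. /dev/shm/sh-thd.XXXXXX (deleted)
```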

Again, to be clear: I'm against here docs and here strings being used or viewed as seekable files.

-- konsolebox



Re: "here strings" and tmpfiles

2019-04-09 Thread konsolebox
On Wed, Mar 20, 2019 at 8:19 PM Greg Wooledge  wrote:
>
> Just like that one time L. Walsh tried to write a bash boot script that
> used <() to populate an array, and it failed because she was running
> it too early in the boot sequence, and /dev/fd/ wasn't available yet.

@Chet, Isn't bash supposed to use named pipes alternatively, and
dynamically?  Or does it just decide what to use based on the current
system?

-- konsolebox



Re: "here strings" and tmpfiles

2019-04-09 Thread Greg Wooledge
On Tue, Apr 09, 2019 at 10:10:44PM +0800, konsolebox wrote:
> @Chet, Isn't bash supposed to use named pipes alternatively, and
> dynamically?  Or does it just decide what to use based on the current
> system?

The second thing.  On platform X, bash uses named pipes.  On platform Y,
bash uses /dev/fd/.  It's decided at compile time.
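A quick way to see which mechanism a particular bash build uses is to print the path that process substitution expands to: a /dev/fd/N path on /dev/fd builds, a named pipe path under the temporary directory otherwise. The /dev/fd/63 shown is what one would typically expect on Linux; treat it as an example.

```bash
#!/usr/bin/env bash
path=$(echo <(:))
echo "$path"          # e.g. /dev/fd/63 on Linux
case $path in
  /dev/fd/*) echo "this bash uses /dev/fd" ;;
  *)         echo "this bash uses named pipes" ;;
esac
```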



Re: "here strings" and tmpfiles

2019-04-09 Thread konsolebox
On Mon, Apr 8, 2019 at 10:39 PM Greg Wooledge  wrote:
> That's incorrect in this context.  We're talking about boot scripts here,
> not interactive user shells.  In boot scripts, on every operating system
> I've ever used, the shell being used is either POSIX sh or Bourne sh.
>
> Everyone who writes boot scripts knows this.  Except, apparently, you.

Not everyone, at least not those who aren't distro slaves.
https://github.com/OpenRC/openrc/commit/d64c9d205083ca82823f9f5ff178a5581f6c8b2a

A group of "popular" or historical distros don't define how a Linux
system should be built.

-- 
konsolebox



Re: "here strings" and tmpfiles

2019-04-09 Thread Chet Ramey
On 4/9/19 10:10 AM, konsolebox wrote:
> On Wed, Mar 20, 2019 at 8:19 PM Greg Wooledge  wrote:
>>
>> Just like that one time L. Walsh tried to write a bash boot script that
>> used <() to populate an array, and it failed because she was running
>> it too early in the boot sequence, and /dev/fd/ wasn't available yet.
> 
> @Chet, Isn't bash supposed to use named pipes alternatively, and
> dynamically?  

No. It's a build-time decision, and /dev/fd is preferred.




Re: "here strings" and tmpfiles

2019-04-09 Thread konsolebox
On Tue, Apr 9, 2019 at 10:28 PM Chet Ramey  wrote:
>
> On 4/9/19 10:10 AM, konsolebox wrote:
> > On Wed, Mar 20, 2019 at 8:19 PM Greg Wooledge  wrote:
> >>
> >> Just like that one time L. Walsh tried to write a bash boot script that
> >> used <() to populate an array, and it failed because she was running
> >> it too early in the boot sequence, and /dev/fd/ wasn't available yet.
> >
> > @Chet, Isn't bash supposed to use named pipes alternatively, and
> > dynamically?
>
> No. It's a build-time decision, and /dev/fd is preferred.

Why not make it load-time at least?  Not that I really care, since I
know when I can use process substitution in my scripts and when not.

-- 
konsolebox



Re: "here strings" and tmpfiles

2019-04-09 Thread Chet Ramey
On 4/9/19 11:25 AM, konsolebox wrote:
> On Tue, Apr 9, 2019 at 10:28 PM Chet Ramey  wrote:
>>
>> On 4/9/19 10:10 AM, konsolebox wrote:
>>> On Wed, Mar 20, 2019 at 8:19 PM Greg Wooledge  wrote:

>>>> Just like that one time L. Walsh tried to write a bash boot script that
>>>> used <() to populate an array, and it failed because she was running
>>>> it too early in the boot sequence, and /dev/fd/ wasn't available yet.
>>>
>>> @Chet, Isn't bash supposed to use named pipes alternatively, and
>>> dynamically?
>>
>> No. It's a build-time decision, and /dev/fd is preferred.
> 
> Why not make it load-time at least?  

Maybe someday, but it's extremely low priority.




Re: "here strings" and tmpfiles

2019-04-09 Thread konsolebox
On Tue, Apr 9, 2019 at 11:28 PM Chet Ramey  wrote:
>
> On 4/9/19 11:25 AM, konsolebox wrote:
> > On Tue, Apr 9, 2019 at 10:28 PM Chet Ramey  wrote:
> >>
> >> On 4/9/19 10:10 AM, konsolebox wrote:
> >>> On Wed, Mar 20, 2019 at 8:19 PM Greg Wooledge  wrote:
> >>>>
> >>>> Just like that one time L. Walsh tried to write a bash boot script that
> >>>> used <() to populate an array, and it failed because she was running
> >>>> it too early in the boot sequence, and /dev/fd/ wasn't available yet.
> >>>
> >>> @Chet, Isn't bash supposed to use named pipes alternatively, and
> >>> dynamically?
> >>
> >> No. It's a build-time decision, and /dev/fd is preferred.
> >
> > Why not make it load-time at least?
>
> Maybe someday, but it's extremely low priority.

Yeah, and perhaps lazy initialization would be even better: if the
check happens on first use, it doesn't matter if /dev/fd only gets
fixed later, e.g. through initialization of udev.

-- 
konsolebox



Re: "here strings" and tmpfiles

2019-04-09 Thread Eli Schwartz
On 4/9/19 10:25 AM, konsolebox wrote:
> On Mon, Apr 8, 2019 at 10:39 PM Greg Wooledge  wrote:
>> That's incorrect in this context.  We're talking about boot scripts here,
>> not interactive user shells.  In boot scripts, on every operating system
>> I've ever used, the shell being used is either POSIX sh or Bourne sh.
>>
>> Everyone who writes boot scripts knows this.  Except, apparently, you.
> 
> Not everyone, at least not those who aren't distro slaves.
> https://github.com/OpenRC/openrc/commit/d64c9d205083ca82823f9f5ff178a5581f6c8b2a
> 
> A group of "popular" or historical distros don't define how a Linux
> system should be built.

Arch Linux has used bash as the default system /bin/sh for as long as I
know of, including since before the switch from sysvinit to systemd.
(Although I'm by no means the only person to replace it with a symlink
to dash.)

That being said, it seems like a rather odd place to configure and use a
heavyweight shell merely to allow third parties to include
downstream-specific bashisms. I think there is a great deal of wisdom in
the fact that the referenced issue (
https://github.com/OpenRC/openrc/issues/288 ) is not accepted (it is
still under discussion).

The commit itself has nothing to do with bash, and is just as useful for
changing openrc to use, for example, a statically compiled POSIX sh
shell that is less likely to break, while /bin/sh is a less
system-critical component -- or even a symlink to the heavyweight bash
that you don't want slowing down your boot process.

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User





Re: "here strings" and tmpfiles

2019-04-09 Thread L A Walsh



On 4/8/2019 9:19 PM, Robert Elz wrote:
>  
>   | Optionally, I would accept that
>   | an implementation would support forward seeking as some equivalent
>   | to having read the bytes.
>
> I suppose one could make pipes do that, but no implementation I have
> ever seen does, so I don't think you should hold your breath waiting for that 
> one to happen.
>   
I've never seen it either; I was only saying that I could see
it being supported, since one can skip input.  However, it's
counter-intuitive that any mechanism seeking backwards would
make sense.
>   | > 2. Have limited capacity. Writers will sleep when the pipe becomes full.
>   | >   
>   | So does a read-only disk, except writer doesn't flag the error to
>   | the reader in the same way a broken pipe would.
>
> Broken pipe wasn't Chet's point, rather with pipes it is possible to
> deadlock - an obvious example where a shell needs to be careful is
> in something like
>
>   X=$( cat << FOO )
>   

I am aware of that.  However, if a pipe implementation
*stops* on reaching a full condition in some 'tmp-storage-space'
and waits for space to become available, a similar dynamic would
apply.  That's all.

Example:  Suppose output from a program
was buffered to disk files 64k in size.  The reader
process would get input from those buffers on disk and
free the files as they are read.  If the writer ran out of
space, then sleeping and retrying the operation would make
sense, as it would be expected that the reader would be
freeing blocks on disk as it read them.  It's not always
a safe assumption, but what else can it do?

[explanation of data piping elided -- seems to be similar
to using a tmp-space in a manner similar to my example].
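Robert's X=$( cat << FOO ) example can be exercised with a document much larger than any pipe buffer. If the shell wrote the whole document into a pipe itself and only then started cat, the write would block as soon as the pipe filled, because the only reader has not started yet; bash uses a temporary file for large documents, so this should complete. A sketch; 2,000,000 bytes is simply chosen to exceed Linux's default 1 MiB pipe-max-size.

```bash
#!/usr/bin/env bash
printf -v big '%*s' 2000000 ''    # 2,000,000 spaces
X=$(cat <<FOO
$big
FOO
)
# The here-doc's final newline is stripped by $( ), leaving only the spaces.
echo "${#X}"                       # 2000000
```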


> In general here docs (and here strings) are overused ...
>   
---
Often the choice is based on intent and a matter of
script formatting.

> ...
>
>   | since writing to a read-only tmp or reading from a non
>   | existent file should be regarded as writing to a pipe with no
>   | listeners (because no one will ever be able to read from that
>   | 'tmp' file since it doesn't exist).
>
> Sorry, that makes no sense.   The file cases have no valid fd
> (opening a non-existant file fails, opening a file for writing
> on a read only filesys fails).   A better analogy would be when
> writing to a file fails when the filesystem becomes full, or the
> user's quota is exceeded.
>   
Precisely, you are correct.  I was referring to an attempt at
mapping errors from using a file for tmp-space onto the types of errors
one would normally get from a real pipe.

That said, I could also imagine trying to open output to a
process at a different security level on a
mandatory-access-controlled OS, where the writer doesn't
have permission to write or send information to the
'reader'.  If that happened, I would expect it to have
error semantics equivalent to trying to open
a write-FD on a RO file system.  This would especially be true
if the device's RO-state wasn't known until attempting
to write to it (like unwritable media in a CD-writer device).

>   | Using a file doesn't sequence -- the writer can still continue
>   | execution pass the point of bash possibly flagging an internal
>   | error for a non-existent tmp file (writable media) and the
>   | reader won't get that the "pipe" (file) had no successful writer,
>   | but instead get an EOF indication and continue, not knowing that
>   | a fatal error had just occurred.
>
> I doubt that is what happens.
>   

That is what appeared to happen in the post mentioned by Chet.
The boot process got a "/dev/fd/99 not found" error and continued on
as though there had been no input.
>   | However, that would
>   | be code in the pipe implementation or an IO library on top
>   | of some StdIO implementation using such.
>
> Pipes are implemented in the kernel - userland does nothing different
> at all (except the way they are created.)
>   

They usually are.  That doesn't prevent a stdlib implementation
from putting a wrapper around some "non-compliant" kernel call
to present a different 'view' to the users of that lib.

>   | W/pipes, there is the race condition of the reader not being able
>   | to read in the condition where the writer has already gone away.
>
> Huh?   That's nonsense.   It is perfectly normal for a reader
> to read long after the writer has finished and exited.   Try this
>
>   printf %s\\n hello | { sleep 5; cat; }
>   
===
It may be normal in some cases, but:

https://superuser.com/questions/554855/how-can-i-fix-a-broken-pipe-error

I've encountered this error when I've used pipes.  You may
not be seeing it due to buffer sizes (the default pipe buffer
on Linux is 64 KiB, and it can be raised up to pipe-max-size,
1 MiB by default).
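For reference, a reader outliving the writer is normal, as Robert's example shows; the "Broken pipe" error (SIGPIPE, or EPIPE when SIGPIPE is ignored) hits the other side, when a process writes to a pipe whose readers have all gone. A small bash sketch of both cases; the exact status of yes depends on whether SIGPIPE is ignored in the calling environment (141 = 128 + SIGPIPE when it is not).

```bash
#!/usr/bin/env bash
# Reader outlives the writer: the data waits in the pipe, no error.
late=$(printf '%s\n' hello | { sleep 1; cat; })
echo "$late"                      # hello

# Writer outlives the reader: head exits after one line, and yes then
# gets SIGPIPE (or EPIPE if SIGPIPE is ignored) on a later write.
yes | head -n 1 > /dev/null
status=("${PIPESTATUS[@]}")
echo "${status[*]}"               # typically: 141 0
```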
>   | "Various purposes"...  Ok, so how do I give that file name
>   | to 'cp' in the next line and copy it somewhere?
>
> You mean
>
>   cp <(process) /tmp/foo
>
> It is, it has to be to work.
>   
---
*red face*  I'd never tried to copy somet