Date: Mon, 08 Apr 2019 17:04:41 -0700
From: L A Walsh <b...@tlinx.org>
Message-ID: <5cabe199.9030...@tlinx.org>
| On 4/8/2019 7:10 AM, Chet Ramey wrote:
| > Pipes are objectively not the same as files. They
| >
| > 1. Do not have file semantics. For instance, they are not seekable.
| >
| In the case of an object that is only meant to be read from,
| I would argue, "that's fine".

For stdin (or stdout/stderr), processes in general should not assume
that seek will ever work, as terminals aren't seekable, nor are pipes,
so if some command is run as:

	cmd
or
	whatever | cmd

it cannot expect to seek stdin, it won't work, and it cannot really
tell the difference (well, it can, if it insists, but shouldn't)
between those and

	cmd << EOF

(or <<< if you insist) or

	cmd < filename

so even if those happen to be seekable (the second is, the first might
be) cmd should never rely upon that.

| Optionally, I would accept that
| an implementation would support forward seeking as some equivalent
| to having read the bytes.

I suppose one could make pipes do that, but no implementation I have
ever seen does, so I don't think you should hold your breath waiting
for that one to happen.

| > 2. Have limited capacity. Writers will sleep when the pipe becomes full.
| >
| So does a read-only disk, except writer doesn't flag the error to
| the reader in the same way a broken pipe would.

Broken pipe wasn't Chet's point, rather with pipes it is possible to
deadlock - an obvious example where a shell needs to be careful is in
something like

	X=$( cat << FOO )

(where the here doc text is also there in whatever place the shell in
question requires) - and particularly if the shell happens to have cat
built in, and is able to implement simple command substitutions
without forking.  There the shell (if it uses a pipe for the here doc)
would be writing to the pipe, and immediately reading it again (as a
built-in cat with no options or args simply connects its stdin to
stdout).

The point is that at some point the pipe buffer fills, and the writing
process stalls (at some point it must - there is no other way - the
only question is how much data gets buffered) until the reading
process consumes some, and makes space for more.  But where the reader
and writer are the same process, that never happens.

In general here docs (and here strings) are overused - it is always
possible to simply write to a pipe instead

	printf %s\\n 'data' | cmd ...
instead of
	cmd ... <<'EOF'
	data
	EOF

(or using "data" if the here doc is <<EOF without quotes).  Or, if the
command really wants a file

	printf %s\\n 'data' >/tmp/file ; cmd ... </tmp/file

Ignoring that and reverting to errors, aside from writes to dead pipes
generating SIGPIPE (which can be ignored if the process doesn't want
to deal with that) there is no real difference between a write to a
dead pipe and a write to a file opened read only (the errno value
differs, but that's it).  Or for that matter to a file opened
read-write or write only on a read only file system (the open will
fail, then the write will generate EBADF).  To the reader there is no
real difference either way, it reads, and gets EOF at some point.

| The fact that the pipe does execution sequencing is often
| a bonus,

Sometimes.  It does allow generation of here doc text to proceed in
parallel with its consumption, when that is the topic (which it was
here) rather than having to generate the full here doc file first, and
only then start the reader (which can matter when the here doc
contains a command substitution, which generates a LOT of output).
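
If you want to watch that parallelism happen, something like the
following should do it in any POSIX-ish shell (the sleep is only there
to make the timing obvious, nothing else here is special):

	{ date; sleep 3; date; } | while read -r line; do
		# show when each line was actually consumed
		printf 'read at %s: %s\n' "$(date)" "$line"
	done

The first line is read (and reported) about three seconds before the
second line has even been written - the reader does not wait for the
writer to finish.
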
| since writing to a read-only tmp or reading from a non
| existent file should be regarded as writing to a pipe with no
| listeners (because no one will ever be able to read from that
| 'tmp' file since it doesn't exist).

Sorry, that makes no sense.  The file cases have no valid fd (opening
a non-existent file fails, opening a file for writing on a read only
filesys fails).  A better analogy would be when writing to a file
fails when the filesystem becomes full, or the user's quota is
exceeded.

| Using a file doesn't sequence -- the writer can still continue
| execution past the point of bash possibly flagging an internal
| error for a non-existent tmp file (writable media) and the
| reader won't get that the "pipe" (file) had no successful writer,
| but instead get an EOF indication and continue, not knowing that
| a fatal error had just occurred.

I doubt that is what happens.

| I can't say that's wrong, though I would _like_ for the pipe to
| try expanding its buffer via memory allocation, which no pipe
| implementation, that I'm aware of, does.

Some do, actually - in fact, I think all do, they start off with no
memory allocated, and grab more as data is written.  But they all have
a limit on how much they will buffer for one pipe, otherwise one
stupid process could clog the system for everyone (having no available
memory/swap is a much worse situation than a filesystem simply being
full.)

| However, that would
| be code in the pipe implementation or an IO library on top
| of some StdIO implementation using such.

Pipes are implemented in the kernel - userland does nothing different
at all (except the way they are created.)

| W/pipes, there is the race condition of the reader not being able
| to read in the condition where the writer has already gone away.

Huh?  That's nonsense.  It is perfectly normal for a reader to read
long after the writer has finished and exited.  Try this

	printf %s\\n hello | { sleep 5; cat; }

The printf writes its data and exits (quite quickly).  Sleep doesn't
touch stdin (the pipe) at all.  The data is still there for cat to
read, long after printf has exited.  If this is suspicious because
printf is a builtin, then use any other command you like to generate
the data - just make sure it only generates a little, so it all fits
in the pipe buffer and doesn't cause the writing process to stall and
wait for cat to start reading.  Something like "uptime" (most probably
not built in to anything!) or grep when only one line will be found
are suitable tests.

| To avoid that I've had the parent send some message (signal,
| semaphore, etc) to the child to indicate the parent has finished
| reading what the child has written.  If the child's last write
| included an "EOF", then the parent's msg to the child causes
| the child to close the pipe and exit.

Sounds absurdly complicated, and probably indicates some other bug
exists.

| "Various purposes"... Ok, so how do I give that file name
| to 'cp' in the next line and copy it somewhere?

You mean

	cp <(process) /tmp/foo

?   Not sure why anyone would want to do that rather than the much
simpler

	process >/tmp/foo

though.

| It's not really a filename is it?

It is, it has to be to work.

| It's a file descriptor -- a handle --

It is a filename that connects to an underlying file descriptor.
Or I assume that is how bash does it.
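
For instance,

	echo <(uptime)

prints the name the process substitution turns into - something like
/dev/fd/63 on systems where bash can use /dev/fd, or (I believe) the
name of a named pipe it creates in a temporary directory elsewhere -
and anything that open()s that name reads uptime's output.
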
| A Name-object doesn't have the data in it, but can be passed
| around, "dataless", with its data stored elsewhere.

I have no idea what that means.

| An open call can connect a program with the data stored for a given name.
| Whereas what "< <()" creates is a file descriptor to be READ from.

You are still missing Chet's point.  There is no "< <()" operator.
That is two bash syntax elements being combined.  "<" (redirect stdin)
and "<()" (create a name to refer to the output from the command).

| When I use '< <()', I've never wanted a filename. I've wanted:

That's fine.  You're free to only use a subset of the functionality if
you desire.

| The fact is, if you write to a file, instead of an OS pipe, both
| the OS pipesize and the file are "implementation dependent".
| There is always some group of people who want /tmp to be of
| type tmpfs (or memfs). That's simply creating a pipe as large as
| memory.

Pipes have no size limit.  That's one of the advantages.  You can
write petabytes (zettabytes) through one if you have the time to wait
for that to actually happen.  What is limited is how much the kernel
will store before stalling the sending process, until the reader
consumes data, leaving more space.  That process can go on forever.

| Going to disk will create a pipe as large as the
| free space on partition '/tmp'.

I assume "pipe" there is some confused way of saying "here doc".

| On *my* system, tmp is on a partition of size 7.8G (w/4.7G free)
| Running 'df' on tmpfs gives me '79G'.

So, you have lots of ram / swap space, and no desire to limit how much
of that your tmpfs consumes.  I doubt that's a good idea, but if it
meets your needs, fine.

| If bash uses /tmp, it can have a pipe of size 4.7G.  If
| it uses memory, it would have pipe of 79G.

That's gibberish.

| If it uses
| an OS pipe...that's OS dependent, no?  If the OS transparently
| used memory to add dynamic space to a pipe, it would
| also get 79G, or at least, some value like
| /proc/sys/fs/pipe-max-size.

You clearly have no idea what a pipe is, or what that parameter
represents.

Think of a garden hose, with a tap at one end (the writer) and a spray
nozzle with a trigger at the other.  The hose starts out (let us
assume) empty (really, filled with air, but never mind.)  You depress
the trigger on the spray nozzle, and nothing happens.  You can stand
there for ever, still nothing happens - unless someone turns the tap
on.  If you turn the tap on without opening the spray nozzle, the hose
fills with water (I assume the air just gets compressed, or escapes
somehow).  When the hose is full, no more water enters from the tap -
until you depress the lever on the sprayer and let the water out.
While both are open, you can stand there for ever, and water will just
keep flowing.  Either end can stop temporarily and start again, as
many times as you like.  That is what a pipe is like.

Eventually the hose is disconnected from the tap, and after that there
will be no more water (that's EOF).

The only real difference from your average garden hose scenario, is
that pipes have a safety mechanism, if the sprayer nozzle fails, or is
removed (or the hose breaks) the "tap" automatically shuts off.  All
that is possible in the watering-the-garden situation, but not usually
economically sensible.  Other similar setups have safeguards like that
though.

kre
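
ps: if you want to convince yourself that what is limited is the
buffering, not the amount of data that can pass through, something
like

	dd if=/dev/zero bs=1024 count=100000 2>/dev/null | wc -c

pushes about 100MB through a pipe whose in-kernel buffer is tiny by
comparison (64KiB is a common Linux default, and there
/proc/sys/fs/pipe-max-size only limits how big an unprivileged
fcntl(F_SETPIPE_SZ) can make one pipe's buffer).  The exact numbers
are implementation dependent, but the writer just stalls and resumes
as wc drains the pipe, and everything gets through.
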