Date: Mon, 08 Apr 2019 17:04:41 -0700
From: L A Walsh <b...@tlinx.org>
Message-ID: <5cabe199.9030...@tlinx.org>
| On 4/8/2019 7:10 AM, Chet Ramey wrote:
| > Pipes are objectively not the same as files. They
| >
| > 1. Do not have file semantics. For instance, they are not seekable.
| >
| In the case of an object that is only meant to be read from,
| I would argue, "that's fine".

For stdin (or stdout/stderr), processes in general should not assume
that seek will ever work, as terminals aren't seekable, nor are pipes,
so if some command is run as:

	cmd
or
	whatever | cmd

it cannot expect to seek stdin, it won't work, and it cannot really
tell the difference (well, it can, if it insists, but shouldn't)
between those and

	cmd << EOF

(or <<< if you insist) or

	cmd < filename

so even if those happen to be seekable (the second is, the first might
be) cmd should never rely upon that.

| Optionally, I would accept that
| an implementation would support forward seeking as some equivalent
| to having read the bytes.

I suppose one could make pipes do that, but no implementation I have
ever seen does, so I don't think you should hold your breath waiting
for that one to happen.

| > 2. Have limited capacity. Writers will sleep when the pipe becomes full.
| >
| So does a read-only disk, except writer doesn't flag the error to
| the reader in the same way a broken pipe would.

Broken pipe wasn't Chet's point, rather with pipes it is possible to
deadlock - an obvious example where a shell needs to be careful is in
something like

	X=$( cat << FOO )

(where the here doc text is also there in whatever place the shell in
question requires) - and particularly if the shell happens to have cat
built in, and is able to implement simple command substitutions
without forking.  There the shell (if it uses a pipe for the here doc)
would be writing to the pipe, and immediately reading it again (as a
built-in cat with no options or args simply connects its stdin to
stdout).

The point is that at some point the pipe buffer fills, and the writing
process stalls (at some point it must - there is no other way - the
only question is how much data gets buffered) until the reading
process consumes some, and makes space for more.  But where the reader
and writer are the same process, that never happens.

In general here docs (and here strings) are overused - it is always
possible to simply write to a pipe instead

	printf %s\\n 'data' | cmd ...
instead of
	cmd ... <<'EOF'
	data
	EOF

(or using "data" if the here doc is <<EOF without quotes).  Or, if the
command really wants a file

	printf %s\\n 'data' >/tmp/file ; cmd ... </tmp/file

Ignoring that and reverting to errors, aside from writes to dead pipes
generating SIGPIPE (which can be ignored if the process doesn't want
to deal with that) there is no real difference between a write to a
dead pipe and a write to a file opened read only (the errno value
differs, but that's it).  Or for that matter to a file opened
read-write or write only on a read only file system (the open will
fail, then the write will generate EBADF).  To the reader there is no
real difference either way, it reads, and gets EOF at some point.

| The fact that the pipe does execution sequencing is often
| a bonus,

Sometimes.  It does allow generation of here doc text to proceed in
parallel with its consumption, when that is the topic (which it was
here) rather than having to generate the full here doc file first, and
only then start the reader (which can matter when the here doc
contains a command substitution, which generates a LOT of output).
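
If you want to watch that parallelism happen, something like the
following should do it in any POSIX-ish shell (the sleep is only there
to make the timing obvious, nothing else here is special):

	{ date; sleep 3; date; } | while read -r line; do
		# show when each line was actually consumed
		printf 'read at %s: %s\n' "$(date)" "$line"
	done

The first line is read (and reported) about three seconds before the
second line has even been written - the reader does not wait for the
writer to finish.
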
| since writing to a read-only tmp or reading from a non
| existent file should be regarded as writing to a pipe with no
| listeners (because no one will ever be able to read from that
| 'tmp' file since it doesn't exist).

Sorry, that makes no sense.  The file cases have no valid fd (opening
a non-existent file fails, opening a file for writing on a read only
filesys fails).  A better analogy would be when writing to a file
fails when the filesystem becomes full, or the user's quota is
exceeded.

| Using a file doesn't sequence -- the writer can still continue
| execution past the point of bash possibly flagging an internal
| error for a non-existent tmp file (writable media) and the
| reader won't get that the "pipe" (file) had no successful writer,
| but instead get an EOF indication and continue, not knowing that
| a fatal error had just occurred.

I doubt that is what happens.

| I can't say that's wrong, though I would _like_ for the pipe to
| try expanding its buffer via memory allocation, which no pipe
| implementation, that I'm aware of, does.

Some do, actually - in fact, I think all do, they start off with no
memory allocated, and grab more as data is written.  But they all have
a limit on how much they will buffer for one pipe, otherwise one
stupid process could clog the system for everyone (having no available
memory/swap is a much worse situation than a filesystem simply being
full.)

| However, that would
| be code in the pipe implementation or an IO library on top
| of some StdIO implementation using such.

Pipes are implemented in the kernel - userland does nothing different
at all (except the way they are created.)

| W/pipes, there is the race condition of the reader not being able
| to read in the condition where the writer has already gone away.

Huh?  That's nonsense.  It is perfectly normal for a reader to read
long after the writer has finished and exited.  Try this

	printf %s\\n hello | { sleep 5; cat; }

The printf writes its data and exits (quite quickly).  Sleep doesn't
touch stdin (the pipe) at all.  The data is still there for cat to
read, long after printf has exited.  If this is suspicious because
printf is a builtin, then use any other command you like to generate
the data - just make sure it only generates a little, so it all fits
in the pipe buffer and doesn't cause the writing process to stall and
wait for cat to start reading.  Something like "uptime" (most probably
not built in to anything!) or grep when only one line will be found
are suitable tests.

| To avoid that I've had the parent send some message (signal,
| semaphore, etc) to the child to indicate the parent has finished
| reading what the child has written.  If the child's last write
| included an "EOF", then the parent's msg to the child causes
| the child to close the pipe and exit.

Sounds absurdly complicated, and probably indicates some other bug
exists.

| "Various purposes"... Ok, so how do I give that file name
| to 'cp' in the next line and copy it somewhere?

You mean

	cp <(process) /tmp/foo

?   Not sure why anyone would want to do that rather than the much
simpler

	process >/tmp/foo

though.

| It's not really a filename is it?

It is, it has to be to work.

| It's a file descriptor -- a handle --

It is a filename that connects to an underlying file descriptor.
Or I assume that is how bash does it.
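
For instance,

	echo <(uptime)

prints the name the process substitution turns into - something like
/dev/fd/63 on systems where bash can use /dev/fd, or (I believe) the
name of a named pipe it creates in a temporary directory elsewhere -
and anything that open()s that name reads uptime's output.
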
| A Name-object doesn't have the data in it, but can be passed
| around, "dataless", with its data stored elsewhere.

I have no idea what that means.

| An open call can connect a program with the data stored for a given name.
| Whereas what "< <()" creates is a file descriptor to be READ from.

You are still missing Chet's point.  There is no "< <()" operator.
That is two bash syntax elements being combined.  "<" (redirect stdin)
and "<()" (create a name to refer to the output from the command).

| When I use '< <()', I've never wanted a filename. I've wanted:

That's fine.  You're free to only use a subset of the functionality if
you desire.

| The fact is, if you write to a file, instead of an OS pipe, both
| the OS pipesize and the file are "implementation dependent".
| There is always some group of people who want /tmp to be of
| type tmpfs (or memfs). That's simply creating a pipe as large as
| memory.

Pipes have no size limit.  That's one of the advantages.  You can
write petabytes (zettabytes) through one if you have the time to wait
for that to actually happen.  What is limited is how much the kernel
will store before stalling the sending process, until the reader
consumes data, leaving more space.  That process can go on forever.

| Going to disk will create a pipe as large as the
| free space on partition '/tmp'.

I assume "pipe" there is some confused way of saying "here doc".

| On *my* system, tmp is on a partition of size 7.8G (w/4.7G free)
| Running 'df' on tmpfs gives me '79G'.

So, you have lots of ram / swap space, and no desire to limit how much
of that your tmpfs consumes.  I doubt that's a good idea, but if it
meets your needs, fine.

| If bash uses /tmp, it can have a pipe of size 4.7G.  If
| it uses memory, it would have pipe of 79G.

That's gibberish.

| If it uses
| an OS pipe...that's OS dependent, no?  If the OS transparently
| used memory to add dynamic space to a pipe, it would
| also get 79G, or at least, some value like
| /proc/sys/fs/pipe-max-size.

You clearly have no idea what a pipe is, or what that parameter
represents.

Think of a garden hose, with a tap at one end (the writer) and a spray
nozzle with a trigger at the other.  The hose starts out (let us
assume) empty (really, filled with air, but never mind.)  You depress
the trigger on the spray nozzle, and nothing happens.  You can stand
there for ever, still nothing happens - unless someone turns the tap
on.  If you turn the tap on without opening the spray nozzle, the hose
fills with water (I assume the air just gets compressed, or escapes
somehow).  When the hose is full, no more water enters from the tap -
until you depress the lever on the sprayer and let the water out.
While both are open, you can stand there for ever, and water will just
keep flowing.  Either end can stop temporarily and start again, as
many times as you like.  That is what a pipe is like.

Eventually the hose is disconnected from the tap, and after that there
will be no more water (that's EOF).

The only real difference from your average garden hose scenario, is
that pipes have a safety mechanism, if the sprayer nozzle fails, or is
removed (or the hose breaks) the "tap" automatically shuts off.  All
that is possible in the watering-the-garden situation, but not usually
economically sensible.  Other similar setups have safeguards like that
though.

kre
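
ps: if you want to convince yourself that what is limited is the
buffering, not the amount of data that can pass through, something
like

	dd if=/dev/zero bs=1024 count=100000 2>/dev/null | wc -c

pushes about 100MB through a pipe whose in-kernel buffer is tiny by
comparison (64KiB is a common Linux default, and there
/proc/sys/fs/pipe-max-size only limits how big an unprivileged
fcntl(F_SETPIPE_SZ) can make one pipe's buffer).  The exact numbers
are implementation dependent, but the writer just stalls and resumes
as wc drains the pipe, and everything gets through.
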