Re: Examples of concurrent coproc usage?

2024-04-16 Thread Andreas Schwab
On Apr 16 2024, Carl Edquist wrote:

> Well, you _can_ shovel binary data too: (*)
>
>   while IFS= read -rd '' X; do printf '%s\0' "$X"; done
>
> and use that pattern to make a shell-only version of tee(1) (and I suppose
> paste(1)).  Binary data doesn't work if you're reading newline-terminated
> records, because you cannot store the NUL character in a shell
> variable. But you can delimit your records on NULs, and use printf to
> reproduce them.

Though that will likely add a spurious null at EOF.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



Re: Examples of concurrent coproc usage?

2024-04-16 Thread Zachary Santer
On Tue, Apr 16, 2024 at 3:56 AM Andreas Schwab wrote:
>
> On Apr 16 2024, Carl Edquist wrote:
>
> > Well, you _can_ shovel binary data too: (*)
> >
> >   while IFS= read -rd '' X; do printf '%s\0' "$X"; done
> >
> > and use that pattern to make a shell-only version of tee(1) (and I suppose
> > paste(1)).  Binary data doesn't work if you're reading newline-terminated
> > records, because you cannot store the NUL character in a shell
> > variable. But you can delimit your records on NULs, and use printf to
> > reproduce them.
>
> Though that will likely add a spurious null at EOF.

Just wouldn't copy over whatever might have followed the final null
byte, if we're not talking about null-terminated data.

printf_format='%s\x00'
while
  IFS='' read -r -d '' X ||
    {
      [[ -n ${X} ]] &&
        {
          printf_format='%s'
          true
        }
    }
do
  printf -- "${printf_format}" "${X}"
done
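
(For comparison, the bare while/printf loop being quoted silently drops
whatever follows the last NUL -- quick to see with od(1), which is here
just to display the bytes:)

  printf 'foo\0bar' |
    { while IFS= read -rd '' X; do printf '%s\0' "$X"; done; } |
    od -c
  # 0000000   f   o   o  \0
  # 0000004

The loop above switches to a plain '%s' for the final read, so the
trailing "bar" comes through and no extra NUL gets appended.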

Might've gotten lucky with all those .so files ending in a null byte
for whatever reason.

There's no way to force this to give you the equivalent of sized
buffers. 'read -N' obviously has the same problem of trying to store
the null character in a variable. So, if you're trying to run this on
a huge text file, you're going to end up trying to shove that entire
file into a variable.



Re: Examples of concurrent coproc usage?

2024-04-16 Thread Carl Edquist

On Tue, 16 Apr 2024, Andreas Schwab wrote:

>> But you can delimit your records on NULs, and use printf to reproduce
>> them.
>
> Though that will likely add a spurious null at EOF.

On Tue, 16 Apr 2024, Zachary Santer wrote:

> Just wouldn't copy over whatever might have followed the final null
> byte, if we're not talking about null-terminated data.

You guys are right.  Sorry for glossing over that detail.

Yes, if the file does not end in a NUL byte, the last dangling record still
needs to be printed.  You can handle it either way with, for example:


while IFS= read -rd '' X; do printf '%s\0' "$X"; X=; done
[[ $X ]] && printf '%s' "$X"
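
The same pattern extends to the shell-only tee(1) idea -- roughly
something like the following sketch (the function name, the single
output-file argument, and the example commands are made up here):

  shtee () {
      # copy stdin to stdout and to the file named in $1, byte for byte
      local X
      {
          while IFS= read -rd '' X; do
              printf '%s\0' "$X"        # record plus its NUL delimiter
              printf '%s\0' "$X" >&3
              X=
          done
          if [[ -n $X ]]; then          # dangling record, no trailing NUL
              printf '%s' "$X"
              printf '%s' "$X" >&3
          fi
      } 3>"$1"
  }

  # e.g.:  some_cmd | shtee copy.out | other_cmd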


> Might've gotten lucky with all those .so files ending in a null byte for
> whatever reason.


Yes that is exactly what happened :)

Luckily, on linux anyway, .so files and ELF binaries always seem to end in 
a null byte.



> There's no way to force this to give you the equivalent of sized
> buffers.  'read -N' obviously has the same problem of trying to store
> the null character in a variable. So, if you're trying to run this on a
> huge text file, you're going to end up trying to shove that entire file
> into a variable.


Right, that is another reason why it's really not a great solution.

Although you can limit the buffer size with, say, 'read -n 4096', and with 
a bit more handling[1] still get a perfect copy.  But that's not my point.


My point is, it's one thing to use it in an emergency, but I don't 
consider it a real usable replacement for cat/tee/paste in general use.


Shoveling data around should really be done by an appropriate external 
program.  So in my multi-coproc example, the shell is really crippled if 
the close-on-exec flags prevent external programs from accessing manual 
copies of other coproc fds.
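
Something like the following sketch is the kind of thing meant (the
tr/rev filters and the log file name are only stand-ins, and with a
released bash the second coproc draws a "still exists" warning -- which
is why the fds are copied first):

  coproc UPPER { stdbuf -oL tr a-z A-Z; }
  exec {ur}<&"${UPPER[0]}" {uw}>&"${UPPER[1]}"

  coproc REV { stdbuf -oL rev; }
  exec {rr}<&"${REV[0]}" {rw}>&"${REV[1]}"

  # an external tee(1) shovels UPPER's output into REV (and a log file),
  # reading and writing through the manual copies of the coproc fds --
  # which only works if those copies stay usable across the exec of tee
  tee -a pipeline.log <&"$ur" >&"$rw" &

  echo hello >&"$uw"
  IFS= read -r line <&"$rr"
  printf '%s\n' "$line"    # OLLEH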



Carl



[1] eg:

emergency_maxbuf_cat_monster () (
    maxbuf=${1:-4096}
    fmts=('%s' '%s\0')
    while IFS= read -rd '' -n "$maxbuf" X; do
        printf "${fmts[${#X} < maxbuf]}" "$X"
        X=
    done
    [[ ! $X ]] || printf '%s' "$X"
)
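
(A quick way to check that it really produces a byte-for-byte copy --
/bin/ls and the /tmp path are just convenient examples:)

  emergency_maxbuf_cat_monster 4096 < /bin/ls > /tmp/ls.copy
  cmp /bin/ls /tmp/ls.copy && echo identical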




Re: Examples of concurrent coproc usage?

2024-04-16 Thread Chet Ramey

On 4/12/24 12:49 PM, Carl Edquist wrote:


> Where with a coproc
>
>      coproc X { potentially short lived command with output; }
>      exec {xr}<&${X[0]} {xw}>&${X[1]}
>
> there is technically the possibility that the coproc can finish and be
> reaped before the exec command gets a chance to run and duplicate the fds.
>
> But, I also get what you said, that your design intent with coprocs was for
> them to be longer-lived, so immediate termination was not a concern.


The bigger concern was how to synchronize between the processes, but that's
something that the script writer has to do on their own.

>>> Personally I like the idea of 'closing' a coproc explicitly, but if it's
>>> a bother to add options to the coproc keyword, then I would say just let
>>> the user be responsible for closing the fds.  Once the coproc has
>>> terminated _and_ the coproc's fds are closed, then the coproc can be
>>> deallocated.
>>
>> This is not backwards compatible. coprocs may be a little-used feature,
>> but you're adding a burden on the shell programmer that wasn't there
>> previously.
>
> Ok, so, I'm trying to imagine a case where this would cause any problems or
> extra work for such an existing user.  Maybe you can provide an example
> from your own uses?  (Where it would cause trouble or require adding code
> if the coproc deallocation were deferred until the fds are closed explicitly.)


My concern was always coproc fds leaking into other processes, especially
pipelines. If someone has a coproc now and is `messy' about cleaning it up,
I feel like there's the possibility of deadlock. But I don't know how
extensively they're used, or all the use cases, so I'm not sure how likely
it is. I've learned there are users who do things with shell features I
never imagined. (People wanting to use coprocs without the shell as the
arbiter, for instance. :-) )
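
For instance, something along these lines (a sketch): the original coproc
fds are close-on-exec, but a manual copy is not, so the children of a
later background pipeline inherit it and keep the coproc's stdin open:

  coproc X { cat; }
  exec {xw}>&"${X[1]}" {xr}<&"${X[0]}"

  sleep 300 | cat &             # both children inherit the $xw copy

  exec {xw}>&-                  # close the shell's manual copy...
  eval "exec ${X[1]}>&-"        # ...and the original write end
  # yet the coproc's cat still sees no EOF until the sleep pipeline
  # exits, so anything blocked reading from $xr just sits there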

> My first thought is that in the general case, the user doesn't really need
> to worry much about closing the fds for a terminated coproc anyway, as they
> will all be closed implicitly when the shell exits (either an interactive
> session or a script).


Yes.



> [This is a common model for using coprocs, by the way, where an auxiliary
> coprocess is left open for the lifetime of the shell session and never
> explicitly closed.  When the shell session exits, the fds are closed
> implicitly by the OS, and the coprocess sees EOF and exits on its own.]


That's one common model, yes. Another is that the shell process explicitly
sends a close or shutdown command to the coproc, so termination is
expected.
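
(The second model, sketched with the classic bc coprocess; 'quit' is
bc's own exit command, and stdbuf from coreutils is added here only to
be safe about output buffering:)

  coproc BC { stdbuf -oL bc -l; }
  echo '2^10' >&"${BC[1]}"
  IFS= read -r answer <&"${BC[0]}"
  echo "$answer"                # 1024
  echo 'quit' >&"${BC[1]}"      # expected, orderly termination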

> If a user expects the coproc variable to go away automatically, that user
> won't be accessing a still-open fd from that variable for anything.


I'm more concerned about a pipe with unread data that would potentially
cause problems. I suppose we just need more testing.


> As for the forgotten-about half-closed pipe fds to the reaped coproc, I
> don't see how they could lead to deadlock, nor do I see how a shell
> programmer expecting the existing behavior would even attempt to access
> them at all, apart from programming error.


Probably not.



> The only potential issue I can imagine is if a script (or a user at an
> interactive prompt) would start _so_ many of these longer-lived coprocs
> (more than 500??), one at a time in succession, in a single shell session,
> that all the available fds would be exhausted.  (That is, if the shell is
> not closing them automatically upon coproc termination.)  Is that the
> backwards compatibility concern?


That's more of a "my arm hurts when I do this" situation. If a script
opened 500 fds using exec redirection, resource exhaustion would be their
own responsibility.


> Meanwhile, the bash man page does not specify the shell's behavior for when
> a coproc terminates, so you might say there's room for interpretation and
> the new deferring behavior would not break any promises.


I could always enable it in the devel branch and see what happens with the
folks who use that. It would be three years after any release when distros
would put it into production anyway.



> And as it strikes me anyway, the real "burden" on the programmer with the
> existing behavior is having to make a copy of the coproc fds every time
>
>      coproc X { cmd; }
>      exec {xr}<&${X[0]} {xw}>&${X[1]}
>
> and use the copies instead of the originals in order to reliably read the
> final output from the coproc.


Maybe, though it's easy enough to wrap that in a shell function.
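
For instance, something along these lines (a sketch; the function and
variable names are invented, and 'cmd' is a stand-in) -- it just hides
the duplication step:

  copy_coproc_fds () {
      # usage: copy_coproc_fds NAME read_fd_var write_fd_var
      local -n _co=$1
      local _r _w
      exec {_r}<&"${_co[0]}" {_w}>&"${_co[1]}"
      printf -v "$2" '%s' "$_r"
      printf -v "$3" '%s' "$_w"
  }

  coproc X { cmd; }
  copy_coproc_fds X xr xw
  # ...read the final output from $xr even after X has been reaped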


>>> First, just to be clear, the fds to/from the coproc pipes are not
>>> invalid when the coproc terminates (you can still read from them); they
>>> are only invalid after they are closed.
>>
>> That's only sort of true; writing to a pipe for which there is no reader
>> generates SIGPIPE, which is a fatal signal.
>
> Eh, when I talk about an fd being "invalid" here I mean "fd is not a valid
> file descriptor" (to use the language for EBADF from the man page for
> various system calls like read(2), write(2), close(2)).  That's why I say
> the fds only become invalid after they