Re: Examples of concurrent coproc usage?

2024-04-20 Thread Carl Edquist

On Wed, 17 Apr 2024, Chet Ramey wrote:


On 4/16/24 2:46 AM, Carl Edquist wrote:

But the shell is pretty slow when you ask it to shovel data around like 
this.  The 'read' builtin, for instance, cautiously does read(2) calls 
of a single byte at a time.


It has to do it that way to find the delimiter on a non-seekable file 
descriptor, since it has to leave everything it didn't consume available 
on stdin.
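A small illustration of that constraint, as a sketch only (the sample
input and variable names are made up): when 'read' and a later command
share the same pipe, whatever 'read' does not consume still has to be
there for the next reader.

```bash
#!/usr/bin/env bash
# 'read' takes exactly one line from the pipe; 'cat' then sees the rest.
# Had 'read' slurped a whole block, 'cat' would find nothing left to copy.
out=$(printf 'first\nsecond\nthird\n' | {
    IFS= read -r head
    printf 'read got: %s\n' "$head"
    printf 'left for cat:\n'
    cat
})
printf '%s\n' "$out"
```

On a pipe there is no way to seek back, so stopping exactly at the
delimiter is what keeps "second" and "third" available to 'cat'.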


Understood, and I wouldn't have it any other way.  It's entirely 
appropriate for reading relatively small amounts of data into shell 
variables.  I'm just saying for copying or filtering a substantial amount 
of data, it's 1000x better to use a suitable external command instead.


It makes me cringe a bit and sigh when I see people put something like

while read X; do echo "$X"; done

in a script, because they somehow imagine it to be more efficient than 
simply running "cat".
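Speed aside, that loop is not even a faithful copy.  A quick sketch to
show it (the sample input is made up for illustration): without
'IFS=' and '-r', 'read' trims leading whitespace and eats backslashes,
while "cat" passes the bytes through untouched.

```bash
#!/usr/bin/env bash
# Two lines: one with leading spaces, one containing a backslash.
input=$'  indented line\nback\\slash\n'

# The loop from above: 'read' strips the leading blanks and the backslash.
loop_out=$(printf '%s' "$input" | while read X; do echo "$X"; done)

# cat copies the data unchanged (and in large blocks, not byte by byte).
cat_out=$(printf '%s' "$input" | cat)

printf 'loop:\n%s\n\ncat:\n%s\n' "$loop_out" "$cat_out"
```

Writing 'while IFS= read -r X' avoids the mangling, but it is still the
slow way to move a lot of data.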


Carl




Re: Examples of concurrent coproc usage?

2024-04-20 Thread Carl Edquist

On Wed, 17 Apr 2024, Chet Ramey wrote:


On 4/15/24 1:01 PM, Carl Edquist wrote:


 Yet another point brought to light by the bcalc example relates to the
 coproc pid variable.  The reset() function first closes the coproc
 pipe fds, then sleeps for a second to give the BC coproc some time to
 finish.

 An alternative might be to 'wait' for the coproc to finish (likely
 faster than sleeping for a second).


If the coproc has some problem and doesn't exit immediately, `wait' 
without options will hang. That's why I opted for the 
sleep/kill-as-insurance combo.


Yes, that much was clear from the script itself.

I didn't mean any of that as a critique of the bcalc script.  I just meant 
it brought to light the point that, under the current 
deallocate-on-terminate behavior, the coproc pid variable is another thing 
that needs to be copied before it can be used reliably (with the 'kill' or 
'wait' builtins).
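For what it's worth, a minimal sketch of that copy-first pattern.  The
coproc name ECHO, the use of "cat" as a stand-in for a real coprocess,
and the variable names are all just for illustration (none of this is
taken from bcalc):

```bash
#!/usr/bin/env bash
# Hypothetical coproc: 'cat' simply echoes back whatever it is sent.
coproc ECHO { cat; }

# Copy the pid and fds right away; bash may unset ECHO_PID and ECHO[]
# once it notices that the coproc has terminated.
echo_pid=$ECHO_PID
to_fd=${ECHO[1]}
from_fd=${ECHO[0]}

printf 'hello\n' >&"$to_fd"
IFS= read -r reply <&"$from_fd"

# Closing the write end gives the coproc EOF on its stdin, so it exits.
exec {to_fd}>&-

# Wait using the saved copy, which stays valid whatever bash does with
# the ECHO_PID variable in the meantime.
wait "$echo_pid"
status=$?
printf 'reply=%s status=%s\n' "$reply" "$status"
```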


Though I do suspect that the most common case with coprocs is that closing 
the shell's read and write fds to the coproc is enough to cause the coproc 
to finish promptly - as neither read attempts on its stdin nor write 
attempts on its stdout can block anymore.


I think this is _definitely_ true for the BC coproc in the bcalc example. 
But it's kind of a distraction to get hung up on that detail, because in 
the general case there may very well be other scenarios where it would be 
appropriate to, um, _nudge_ the coproc a bit with the kill command.



(And before you ask why I didn't use `wait -n', I wrote bcalc in 30 
minutes after someone asked me a question about doing floating point 
math with awk in a shell script, and it worked.)


It's fine!  It's just an example, after all  :)


Carl



Re: Examples of concurrent coproc usage?

2024-04-20 Thread Carl Edquist

On Wed, 17 Apr 2024, Chet Ramey wrote:

Yes, I agree that coprocs should survive being suspended. The most 
recent devel branch push has code to prevent the coproc being reaped if 
it's stopped and not terminated.


Oh, nice!  :)


Carl



Re: Examples of concurrent coproc usage?

2024-04-20 Thread Carl Edquist

On Thu, 18 Apr 2024, Martin D Kealey wrote:


On Wed, 17 Apr 2024, Chet Ramey wrote:

It has to do it that way to find the delimiter on a non-seekable file 
descriptor, since it has to leave everything it didn't consume 
available on stdin.


Has anyone tried asking any of the kernel teams (Linux, BSD, or other) 
to add a new system call such as readln() or readd()?


You mean, specifically in order to implement a slightly-more-efficient 
'read' builtin in the shell?



I envisage this working like stty cooked mode works on a tty, except it 
would also work on files, pipes, and sockets: you'd get back *at most* 
as many bytes as you ask for, but you may get fewer if a delimiter is 
found. The delimiter is consumed (and returned in the buffer), but 
everything following a delimiter is left available for a subsequent 
read.


One downside is you'd end up with a system call for each token, which is 
only a little bit better than the 'read' builtin read(2)'ing 1 byte at a 
time.  If your program / shell script is going to be processing a long 
stream of tokens, it's just going to be more efficient to read(2) a block 
at a time and do the tokenizing in userspace.  And with any luck you can 
find an appropriate command line utility to do that for you, rather than 
relying on the shell's 'read' builtin.
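As a sketch of that trade-off (the sample data and variable names are
invented for illustration), here is the same small job done both ways:
summing the second field of each line with a 'read' loop, and with a
single awk process that reads in blocks and tokenizes in userspace.

```bash
#!/usr/bin/env bash
data=$(printf '%s\n' 'a 1' 'b 2' 'c 3')

# Shell-only: one 'read' call (and its syscalls) per line.
sum_loop=0
while read -r _ n; do
    sum_loop=$(( sum_loop + n ))
done <<< "$data"

# One external command doing its own buffered reads and field splitting.
sum_awk=$(awk '{ s += $2 } END { print s }' <<< "$data")

printf 'loop=%s awk=%s\n' "$sum_loop" "$sum_awk"
```

For three lines the difference is nothing; for a long stream the awk
version is the one that scales.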


(Or for your own programs, use getline(3)/getdelim(3), as Chet mentioned.)

Carl



Re: Examples of concurrent coproc usage?

2024-04-20 Thread Carl Edquist

On Tue, 16 Apr 2024, Chet Ramey wrote:

The bigger concern was how to synchronize between the processes, but 
that's something that the script writer has to do on their own.


Right.  It can be tricky and depends entirely on what the user's up to.


My concern was always coproc fds leaking into other processes, 
especially pipelines. If someone has a coproc now and is `messy' about 
cleaning it up, I feel like there's the possibility of deadlock.


I get where you're coming from with the concern.  I would welcome being 
shown otherwise, but as far as I can tell, deadlock is a ghost of a 
concern once the coproc is dead.



Maybe it helps to step through it ...


- First, where does deadlock start?  (In the context of pipes)

I think the answer is: When there is a read or write attempted on a pipe 
that blocks (indefinitely).



- What causes a read or a write on a pipe to block?

A pipe read blocks when a corresponding write-end is open, 
but there is no data available to read.


A pipe write blocks when a corresponding read-end is open, 
but the pipe is full.



- Are the coproc's corresponding ends of the shell's pipe fds open?

Well, not if the coproc is really dead.


- Will a read or write ever be attempted?

If the shell's stray coproc fds are left open, sure they will leak into 
pipelines too - but since they're forgotten, in theory no command will 
actually attempt to use them.



- What if a command attempts to use these stray fds anyway, by mistake?

If the coproc is really dead, then its side of the pipe fds will have been 
closed.  Thus read/write attempts on the fds on the shell's side (either 
from the shell itself, or from commands / pipelines that the fds leaked 
into) WILL NOT BLOCK, and thus will not result in deadlock.


(A read attempt will hit EOF, a write attempt will get SIGPIPE/EPIPE.)


HOPEFULLY that is enough to put any reasonable fears of deadlock to bed - 
at least in terms of the shell's leaked fds leading to deadlock.
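Here is a sketch that tries to act out the steps above.  The coproc
name VICTIM and the variables are hypothetical, and the 'exec' dups are
there only to simulate stray fds that survive after bash has cleaned up
its own copies:

```bash
#!/usr/bin/env bash
coproc VICTIM { exec cat; }
victim_pid=$VICTIM_PID

# Duplicate the shell's ends of the pipes, so that we still hold "stray"
# fds after bash closes its own copies when the coproc dies.
exec {stray_r}<&"${VICTIM[0]}" {stray_w}>&"${VICTIM[1]}"

# Make the coproc really dead.
kill "$victim_pid"
wait "$victim_pid" 2>/dev/null || true

# Ignore SIGPIPE so the failed write shows up as an error status
# instead of killing this shell.
trap '' PIPE

# The read sees EOF instead of blocking; the 5 second timeout is only a
# safety net for the sketch.
read_status=0
IFS= read -r -t 5 -u "$stray_r" line || read_status=$?

# The write fails (EPIPE) instead of blocking.
write_status=0
printf 'x\n' >&"$stray_w" 2>/dev/null || write_status=$?

printf 'read status: %s, write status: %s\n' "$read_status" "$write_status"
```

Both attempts fail promptly, which is the point: neither one hangs.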



- But what if the _coproc_ leaked its pipe fds before it died?

At this point I think perhaps we get into what you called a "my arm hurts 
when I do this" situation.  It kind of breaks the whole coproc model: if 
the stdin/stdout of a coproc are still open by one of the coproc's 
children, then I might say the coproc is not really dead.
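To make the "not really dead" case concrete, a small sketch.  It uses a
process substitution rather than a real coproc, purely to keep it
short; the fd inheritance is the same idea, and the timings are
arbitrary:

```bash
#!/usr/bin/env bash
# The process substitution's main process exits at once, but its
# background child ('sleep') inherits the write end of the pipe and
# keeps it open for two more seconds.
exec {rfd}< <(sleep 2 &)

# The parent is gone, yet the read cannot see EOF while the child still
# holds the pipe, so it runs into the timeout instead.
status=0
IFS= read -r -t 0.5 -u "$rfd" line || status=$?

# For 'read -t', an exit status above 128 means the timeout expired.
if (( status > 128 )); then
    echo "no EOF yet: a child still holds the write end"
fi

exec {rfd}<&-
```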


But anyway I want to be a good sport, for completeness.


An existing use case that would lead to trouble would perhaps have to look 
something like this:


The shell sends a quit command to a coproc, without closing the shell's 
coproc fds.


The coproc has a child, then exits.  The coproc (parent) is dead.  The 
coproc's child has inherited the coproc's pipe fds.  The script author 
_expects_ that the coproc parent will exit, and expects that this will 
trigger the old behavior, that the shell will automatically close its fds 
to the coproc parent.  Thus the author _expects_ that the coproc exiting 
will, indirectly but automatically, cause any blocked reads/writes on 
stdin/stdout in the coproc's child to stop blocking.  Thus the author 
_expects_ the coproc's child to promptly complete, even though its output 
_will not be consumable_ (because the author _expects_ that its stdout 
will be attached to a broken pipe).


But [here's where the potential problem starts] with the new deferring 
behavior, the shell's coproc fds are not automatically closed, and thus 
the coproc's _child_ does not stop blocking, and thus the author's 
short-lived expectations for this coproc's useless child are dashed to the 
ground, while that child is left standing idle until the cows come home. 
(That is, until the shell exits.)


It really seems like a contrived and senseless scenario, doesn't it? 
(Even to me!)


[And an even more far-fetched scenario: a coproc transmits copies of its 
pipe fds to another process over a unix socket ancillary message 
(SCM_RIGHTS), instead of to a child by inheritance.  The rest of the story 
is the same, and equally senseless.]



But I don't know how extensively they're used, or all the use cases, so 
I'm not sure how likely it is. I've learned there are users who do 
things with shell features I never imagined. (People wanting to use 
coprocs without the shell as the arbiter, for instance. :-) )


Hehe...

Well, yeah, once you gift-wrap yourself a friendly, reliable interface and 
have the freedom to play with it to your heart's content - you find some 
fun things to do with coprocesses.  (Much like regular shell pipelines.)


I get your meaning though - without knowing all the potential uses, it's 
hard to say with absolute certainty that no user will be negatively 
affected by a new improvement or bug fix.



[This is a common model for using coprocs, by the way, where an 
auxiliary coprocess is left open for the lifetime of the shell session 
and never explicitly closed.  When the shell session exits, the fds are 
closed implicitly by the OS, and the coprocess sees EOF and exits on 
its own.]


That's