Re: [PATCH] use unlocked stdio functions

2024-02-14 Thread Vito Caputo
On Wed, Feb 14, 2024 at 10:59:57AM -0500, Chet Ramey wrote:
> On 2/5/24 10:47 PM, Grisha Levit wrote:
> > Bash makes many calls to stdio functions that may have unlocked_stdio(3)
> > equivalents. Since the locking functionality provided by the regular
> > versions is only useful in multi-threaded applications, it probably makes
> > sense for Bash to use the *_unlocked versions where available.
> 
> Thanks for the patch; this looks like a great idea.
> 

I thought this was only necessary for C programs built with pthreads
linked in / -D_REENTRANT.  Is that no longer the case?  Or has bash
started making use of pthreads?

When I first learned pthreads ages ago there was a substantial
performance hit to the classical stdio-using programs when you built
them w/pthreads.  It was an important detail to be aware of at the time
because so many programs of the era had been written assuming things
like getc/ungetc and other character-granular stdio functions were fast
functions if not macros.  But you didn't incur this hit if you didn't
make use of pthreads, which seemed like a conscious choice of the
pthreads creators to not impact all such existing software just because
a platform added pthreads support.

So unless my understanding is wrong/stale or bash has started using
pthreads, I don't think this should be necessary.  But things do seem to
have evolved here; for instance we no longer explicitly add -D_REENTRANT
with gcc, instead using -pthread now.  Would appreciate any input on the
current state of things in this area...

Thanks,
Vito Caputo



O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Vito Caputo
Hello list,

Is there any chance we could get a | modifier for enabling O_DIRECT on the
created pipe?  "Packet" style pipes have some interesting and potentially
useful properties, it would be nice if bash made them more accessible.

pipe(2) O_DIRECT excerpt:

O_DIRECT (since Linux 3.4)
   Create a pipe that performs I/O in "packet" mode.  Each write(2)
   to the pipe is dealt with as a separate packet, and read(2)s
   from the pipe will read one packet at a time.  Note the
   following points:

   *  Writes of greater than PIPE_BUF bytes (see pipe(7)) will be
      split into multiple packets.  The constant PIPE_BUF is
      defined in <limits.h>.

   *  If a read(2) specifies a buffer size that is smaller than the
      next packet, then the requested number of bytes are read, and
      the excess bytes in the packet are discarded.  Specifying a
      buffer size of PIPE_BUF will be sufficient to read the
      largest possible packets (see the previous point).

   *  Zero-length packets are not supported.  (A read(2) that
      specifies a buffer size of zero is a no-op, and returns 0.)

   Older kernels that do not support this flag will indicate this
   via an EINVAL error.
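
For anyone who wants to poke at the behavior in the excerpt, here is a
minimal Python sketch, assuming Linux and a Python build that exposes
os.pipe2 and os.O_DIRECT.  The helper names make_packet_pipe and
short_read_demo are made up for illustration.  It exercises the second
bullet (a short read discards the rest of the packet) and treats EINVAL
as "packet pipes not supported here".

```python
import errno
import os
import select


def make_packet_pipe():
    """Return (read_fd, write_fd) for a packet-mode pipe, or None if unsupported."""
    flag = getattr(os, "O_DIRECT", None)
    if flag is None or not hasattr(os, "pipe2"):
        return None
    try:
        return os.pipe2(flag)
    except OSError as e:
        if e.errno == errno.EINVAL:  # kernel or sandbox without packet pipes
            return None
        raise


def short_read_demo():
    """Write two packets, read the first with a too-small buffer."""
    fds = make_packet_pipe()
    if fds is None:
        return None
    r, w = fds
    try:
        os.write(w, b"abcdefgh")  # packet 1
        os.write(w, b"second")    # packet 2
        first = os.read(r, 4)                 # smaller than packet 1
        second = os.read(r, select.PIPE_BUF)  # large enough for any one packet
        return first, second
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(short_read_demo())
```

If the man page text holds, the first read should return only the first
four bytes of packet 1 and the second read should return packet 2, with
the tail of packet 1 gone.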

Regards,
Vito Caputo



Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Vito Caputo
On Wed, Sep 23, 2020 at 09:12:40AM -0400, Chet Ramey wrote:
> On 9/22/20 11:23 PM, Vito Caputo wrote:
> > Hello list,
> > 
> > Is there any chance we could get a | modifier for enabling O_DIRECT on the
> > created pipe?  "Packet" style pipes have some interesting and potentially
> > useful properties, it would be nice if bash made them more accessible.
> 
> Is there a general need, especially since they're Linux-specific?
>

I'm not sure, but as far as GNU/Linux distros go, bash is kind of the
canonical shell, and this functionality is kind of inaccessible
without the shell wiring it up.

If I'm not mistaken, this pipe flavor exists as the default behavior in
Plan 9, so it's not entirely unique to Linux conceptually.

> What kind of modifier would you suggest?
> 

Maybe triple pipe could be packetized pipe?  It visually expresses
being sliced up somewhat; `foo ||| bar`

> Does anyone want to take a shot at implementing this idea?
> 

It's possible I could find time to take a stab at it, if nobody else
wants to.

Cheers,
Vito Caputo



Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Vito Caputo
On Wed, Sep 23, 2020 at 11:53:10PM -0400, Lawrence Velázquez wrote:
> > On Sep 23, 2020, at 11:41 PM, Vito Caputo wrote:
> > 
> > On Wed, Sep 23, 2020 at 09:12:40AM -0400, Chet Ramey wrote:
> >> On 9/22/20 11:23 PM, Vito Caputo wrote:
> >>> Hello list,
> >>> 
> >>> Is there any chance we could get a | modifier for enabling O_DIRECT on the
> >>> created pipe?  "Packet" style pipes have some interesting and potentially
> >>> useful properties, it would be nice if bash made them more accessible.
> >> 
> >> Is there a general need, especially since they're Linux-specific?
> >> 
> > 
> > I'm not sure, but as far as GNU/Linux distros go, bash is kind of the
> > canonical shell, and this functionality is kind of inaccessible
> > without the shell wiring it up.
> 
> What functionality? I (and I'm sure some others) am not familiar
> with packet-style pipes and their benefits. You haven't actually
> described *how* exposing them would be useful, and why that would
> justify introducing new syntax that only matters/works on Linux.
> 

Packetized pipes establish well-defined boundaries between writes
reproduced at the read side.  If the write sizes are kept within
PIPE_BUF bounds, then you can be certain what's read is an atomic
record including nothing from a subsequent or previous write, with no
possibility for partial records.
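
A rough Python sketch of the difference, assuming Linux with os.pipe2
and os.O_DIRECT available; read_chunks is a hypothetical helper, not
anything bash or coreutils provides.  It writes the same three records
to a default pipe and to a packet pipe, then records what each read(2)
returns.

```python
import errno
import os
import select


def read_chunks(flags, writes):
    """Write each item of `writes` to a fresh pipe, then drain it.

    Returns the chunks that successive read(2) calls returned, or None
    if the kernel rejects `flags` with EINVAL.
    """
    try:
        r, w = os.pipe2(flags)
    except OSError as e:
        if e.errno == errno.EINVAL:
            return None
        raise
    try:
        for data in writes:
            os.write(w, data)
        os.close(w)  # let the reader see EOF after the last chunk
        w = -1
        chunks = []
        while True:
            chunk = os.read(r, select.PIPE_BUF)
            if not chunk:
                break
            chunks.append(chunk)
        return chunks
    finally:
        os.close(r)
        if w != -1:
            os.close(w)


records = [b"alpha\n", b"beta\n", b"gamma\n"]
stream = read_chunks(0, records)             # default byte-stream pipe
packets = read_chunks(os.O_DIRECT, records)  # packet-mode pipe

if __name__ == "__main__":
    print("stream: ", stream)
    print("packets:", packets)
```

With the default pipe the reads are free to coalesce the writes, so the
reader has to find the newlines itself.  With the packet pipe each read
should hand back exactly one of the original writes.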

It's useful if you're doing something like, say, aggregating data from
multiple piped sources into a single bytestream.  With the default
pipe behavior, you'd have the output interleaved at random boundaries.
With packetized pipes, if your sources write, say, newline-delimited
text records, kept under PIPE_BUF length, the aggregated output would
always interleave between the lines, never in the middle of them.

If we added this to the shell, I suppose the next thing to explore
would be how to get all the existing core shell utilities to detect a
packetized pipe on stdout and switch to a line-buffered mode instead
of block-buffered, assuming they're using stdio.  That should turn
their lines into packets on the pipe, and it all becomes generally
relevant across the existing shell utils landscape.  This heuristic
echoes the terminal detection stdio already performs on stdout to
choose line-buffering (see setvbuf(3)).
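
To make the detection idea concrete, here is a small Python sketch,
assuming Linux.  The function names is_packet_pipe and choose_buffering
are hypothetical.  It also assumes that fcntl(F_GETFL) reports O_DIRECT
on the write end of a packet pipe, which is the end a producer sees as
its stdout; I have not relied on the read end reporting the flag.

```python
import fcntl
import os
import stat


def is_packet_pipe(fd):
    """True if fd is a pipe whose file flags include O_DIRECT."""
    if not stat.S_ISFIFO(os.fstat(fd).st_mode):
        return False  # O_DIRECT means something else on regular files
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_DIRECT)


def choose_buffering(fd):
    """Mimic the stdio tty heuristic, extended with the packet-pipe case."""
    if os.isatty(fd) or is_packet_pipe(fd):
        return "line"
    return "block"


if __name__ == "__main__":
    print(choose_buffering(1))
```

A utility following this scheme would flush at each newline when its
stdout is a terminal or a packet pipe, and keep block buffering
otherwise.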

Thanks,
Vito Caputo



Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Vito Caputo
On Thu, Sep 24, 2020 at 12:48:14PM +0700, Robert Elz wrote:
> Date:        Wed, 23 Sep 2020 21:47:10 -0700
> From:        Vito Caputo
> Message-ID:  <20200924044710.xpltp22bpxoxi...@shells.gnugeneration.com>
> 
> 
>   | It's useful if you're doing something like say, aggregating data from
>   | multiple piped sources into a single bytestream.  With the default
>   | pipe behavior, you'd have the output interleaved at random boundaries.
> 
> If that's happening, then either the pipe implementation is badly broken,
> or the applications using it aren't doing what you'd like them to do.
> 
> Writes (<= the pipe buffer size) have always (since ancient unix, probably
> since pipes were first created) been atomic - nothing will randomly split
> the data.
> 
> What the new option is offering (as best I can tell from the discussion
> here, I am not a linux user) is passing those boundaries through the pipe
> to the reader - that hasn't been a pipe feature, but it is exactly what a
> unix domain datagram socket provides (these days pipes are sometimes
> implemented using unix domain connection oriented sockets ... I'm guessing
> that the option simply changes the transport protocol used with an
> implementation that works that way).
> 

Apparently I was incomplete in describing my conjured example.

The aggregator in this case is a process connected to multiple pipes,
not a pipe with multiple writer processes.

What you describe is correct WRT multiple writers to a shared pipe.

In my example, the aggregator can trivially read the separate records
at the write boundaries from each of the connected packetized pipes.
The reads return at the write boundaries.  Without packetized pipes
you'd need to parse the contents to search for record boundaries.

Imagine it's like an inverted `tee` for input instead of output.
Without packetized pipes, this hypothetical program couldn't
interleave the collected inputs at record boundaries without parsing
the contents.  Presumably this is *why* we don't already have an input
version of `tee`.  I'd like to work towards changing that.
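
A minimal sketch of such an aggregator in Python, assuming its inputs
are the read ends of packet pipes that something else (for example a
shell with the proposed syntax) has set up.  The name aggregate is
hypothetical.  It never inspects the data: each read(2) is treated as
one record and forwarded whole.

```python
import os
import select
import selectors


def aggregate(read_fds, out_fd):
    """Copy records from several pipes to out_fd, one read(2) per record."""
    sel = selectors.DefaultSelector()
    for fd in read_fds:
        sel.register(fd, selectors.EVENT_READ)
    open_count = len(read_fds)
    while open_count:
        for key, _ in sel.select():
            # PIPE_BUF is enough to hold the largest possible packet.
            record = os.read(key.fd, select.PIPE_BUF)
            if record:
                os.write(out_fd, record)
            else:  # EOF on this input
                sel.unregister(key.fd)
                open_count -= 1
    sel.close()
```

On ordinary pipes the same loop still runs, but a read may return half
a record or several at once, so the output can be split mid-record.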


>   | With packetized pipes, if your sources write say, newline-delimited
>   | text records, kept under PIPE_BUF length, the aggregated output would
>   | always interleave between the lines, never in the middle of them.
> 
> That happens with regular pipes.
> 

See above, the aggregator is a process, not a shared pipe.


>   | If we added this to the shell, I suppose the next thing to explore
>   | would be how to get all the existing core shell utilities to detect a
>   | packetized pipe on stdout and switch to a line-buffered mode instead
>   | of block-buffered, assuming they're using stdio.
> 
> I suspect that is really all you need - a mechanism to request line
> buffered output rather than blocksize buffered.   You don't need to
> go fiddling with pipes for that, and abusing the pipe interface as a
> way to pass a "line buffer this output please" request to the application
> seems like the wrong way to achieve that to me.
> 

This is probably true, though if a packetized pipe were introspectable
we could request the behavior via the ||| construction, while
simultaneously enabling record boundaries regardless of how the
contents are delimited.  If consumers knew about packetized pipes,
they could treat the separately returned reads as records
independently of what's inside.

> This isn't a criticism of the datagram packet pipe idea - there are
> applications for that (pipe is easier to use than manually setting up
> a pair of unix domain datagram sockets) but that is for specialised
> applications, where for whatever reason the receiver needs to read just
> one packet at a time (usually because of a desire to have multiple
> reading applications, each taking the next request, and then processing
> it ... if there is just one receiving process all that is needed is
> to stick a record length before each packet sent to a normal pipe, and
> let the receiver process the records from the aggregations it receives).
> 
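
For comparison, the length-prefix approach described above can be
sketched like this in Python over a normal pipe.  The 4-byte big-endian
header and the names send_record, read_exact and recv_records are
assumptions for illustration, not an existing protocol.

```python
import os
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length (assumed framing)


def send_record(fd, payload):
    """Write one length-prefixed record."""
    os.write(fd, HEADER.pack(len(payload)) + payload)


def read_exact(fd, n):
    """Read exactly n bytes, or return None if EOF arrives first."""
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def recv_records(fd):
    """Yield payloads until EOF."""
    while True:
        header = read_exact(fd, HEADER.size)
        if header is None:
            return
        (length,) = HEADER.unpack(header)
        payload = read_exact(fd, length)
        if payload is None:
            return
        yield payload
```

This works on any pipe and supports zero-length records, but both ends
have to agree on the framing, which is the cooperation packet pipes
would avoid.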

Thanks for the thoughtful response,
Vito Caputo