date:20180923

Re: bash sockets: printf \x0a does TCP fragmentation

2018-09-23 Thread Robert Elz

Date: Sat, 22 Sep 2018 23:51:08 -0600
From: Bob Proulx
Message-ID: <20180922231240358868...@bob.proulx.com>

  | Using the same buffer size
  | for input and output is usually most efficient.

Yes, but as the objective seemed to be to make big packets, that is probably
not as important.

  |   $ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
  |   one
  |   two
  |   ...
  |   read(0, "one\ntwo\n", 512)  = 8

What is relevant there is that you're getting both lines from the printf in
one read.  If that had happened, there would be no need for any rebuffering.
The point of the original complaint was that that was not happening, and
the reads were being broken at the \n ... here it might easily make a
difference whether the output is a pipe or a socket (I have no idea.)

  | But even if ibs is much too small it still behaves okay with a small
  | input buffer size and a large output buffer size.

Yes, with separate buffers, that's how dd works (has always worked).
That is why using it that way could solve the problem.

  | It seems to me that using a large buffer size for both read and write
  | would be the most efficient.

Yes.

  | It can then use the same buffer that data was read into for the output
  | buffer directly.

No, it can't, that's what bs= does - you're right, that is most efficient,
but there is no rebuffering, whatever is read, is written, and in that case
even more efficient is not to interpose dd at all.  The whole point was
to get the rebuffering.

Try tests more like

{ printf %s\\n aaa; sleep 1; printf %s\\n bbb ; } | dd 

so there will be clearly 2 different writes, and small reads for dd
(however big the input buffer is) - with obs= (something big enough)
there will be just 1 write, with bs= (anything big enough for the whole
output) there will still be two writes.
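
The difference between obs= and bs= can also be seen without strace, from
dd's own record counts.  A minimal sketch, assuming a dd that reports its
"records in" / "records out" statistics on stderr (GNU dd does); the helper
names two_writes and records_out are made up for illustration:

```shell
# Two separate writes with a pause between them, so that dd's reads
# cannot be merged by pipe buffering.
two_writes() { printf '%s\n' aaa; sleep 1; printf '%s\n' bbb; }

# Run dd with the given operands, discard the copied data, and print
# only the "records out" line of the statistics dd writes to stderr.
records_out() { two_writes | LC_ALL=C dd "$@" 2>&1 >/dev/null | grep 'records out'; }

records_out obs=1M   # rebuffered: 0+1 records out
records_out bs=1M    # each read written as-is: 0+2 records out
```

With obs= the two short reads are collected into one output block; with bs=
each short read is passed straight through as its own write.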

kre

ps: this is not really the correct place to discuss dd.

Re: bash sockets: printf \x0a does TCP fragmentation

2018-09-23 Thread Chet Ramey

On 9/22/18 6:49 AM, dirk+b...@testssl.sh wrote:
> 
> 
> On 9/22/18 12:38 PM, Ilkka Virta wrote:
>> On 22.9. 02:34, Chet Ramey wrote:
>>> Newline? It's probably that stdout is line-buffered and the newline causes
>>> a flush, which results in a write(2).
>>
>> Mostly out of curiosity, what kind of buffering logic does Bash (or the
>> builtin printf in particular) use? It doesn't seem to be the usual stdio
>> logic where you get line-buffering if printing to a terminal and block
>> buffering otherwise. I get a distinct write per line even if the stdout
>> of Bash itself is redirected to say /dev/null or a pipe:
>>
>>  $ strace -etrace=write bash -c 'printf "foo\nbar\n"' > /dev/null
>>  write(1, "foo\n", 4)    = 4
>>  write(1, "bar\n", 4)    = 4
>>  +++ exited with 0 +++
> 
> Oh. But thanks anyway!
> 
> coreutils in fact does it in one shot as you indicated.

Then the change you need suggests itself:

env printf ...

or

(exec printf ...)

since the bash exec builtin doesn't execute builtin commands.
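
A minimal sketch of the two variants, assuming an external printf utility
is on PATH.  Both produce the same bytes as the builtin; what differs is
which program does the writing, and hence the buffering.  The helper name
ext_printf is made up for illustration.

```shell
# Run the external printf utility rather than the shell builtin.
ext_printf() { env printf "$@"; }

ext_printf '%s\n' foo bar           # via env
(exec printf '%s\n' foo bar)        # via exec in a subshell

# To compare the write(2) calls on a system with strace (not run here):
#   strace -f -e trace=write bash -c 'env printf "foo\nbar\n"' > /dev/null
```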

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/

Re: bash sockets: printf \x0a does TCP fragmentation

2018-09-23 Thread Chet Ramey

On 9/22/18 4:22 PM, Bob Proulx wrote:

> Note that I *did* provide you with a way to do what you wanted to do. :-)
> 
> It was also noted in another message that the external standalone
> printf command line utility did buffer as you desired.  That seems
> another very good solution too.  Simply use "command printf ..." to
> force using the external version.

This won't work the way you want. The `command' builtin only inhibits
execution of shell functions. It still executes builtins.  You want to
either get the full pathname of a printf utility using `type -ap printf'
and use that, or use the env or exec variants I recommended in my last
message.
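
This can be checked from within bash itself.  A short sketch, assuming bash
and an external printf on PATH: the -v option exists only in the builtin, so
the fact that it works under `command' shows the builtin was run.  The
variable names are illustrative.

```shell
# 'command' skips shell functions but still finds builtins: -v (assign
# the output to a variable) is a feature of the bash builtin only.
command printf -v demo '%s' hi
echo "demo=$demo"

# Locate the external utility and call it by full pathname instead.
type -t printf                    # how bash resolves the bare name
printf_path=$(type -P printf)     # first external printf found on PATH
"$printf_path" '%s\n' one two
```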

> 
> Anyway...  Since printf is a text oriented utility it makes sense to
> me that it would operate in line buffered output mode.

It's that bash sets stdout and stderr to be line-buffered, not anything
printf-specific.


Re: bash sockets: printf \x0a does TCP fragmentation

2018-09-23 Thread Bob Proulx

Robert Elz wrote:
> Bob Proulx wrote:
>   | Using the same buffer size
>   | for input and output is usually most efficient.
> 
> Yes, but as the objective seemed to be to make big packets, that is probably
> not as important.

The original complaint concerned a data blob being flushed at every
newline (0x0a) character: because of line buffering, each newline
triggers a write(2) of the buffer up to that point.  As I am sure you
already know, that will cause the network stack in the kernel to emit
whatever has been buffered so far, which was apparently a small'ish
amount of data.  So instead of some number of full MTU sized packets
there were many more smaller ones.  It shouldn't have been about big
packets, nor fragmentation, but about streaming efficiency and
performance.  The behavior was correct, but with more buffer flushes
than desired it was apparently less efficient than they wanted, hence
the complaint.  They wanted the data blob buffered as much as possible
so as to use the fewest TCP network packets.

My choice of a large one meg buffer size was to be larger than any
network MTU size.  My intention was that the network stack would then
split the data blob up into MTU sizes for transmission.  The largest
MTU size that I routinely see is 64k.  I expect that to increase
further in the future, at which point 1 meg might not be big enough.
And I avoid mentioning jumbo frames.
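
To put rough numbers on that, here is a back-of-the-envelope sketch.  All
figures are assumptions chosen for illustration (a 64 KiB blob, 80-byte
lines, and a 1460-byte segment payload, i.e. a 1500-byte Ethernet MTU less
40 bytes of IPv4 and TCP headers).  The kernel may coalesce small writes,
so the per-line figure is an upper bound on packets, not a measurement.

```shell
# Assumed figures, for illustration only.
blob=65536      # bytes in the data blob
line=80         # bytes per line, newline included
mss=1460        # 1500-byte MTU minus 40 bytes of IPv4 and TCP headers

# Ceiling division: number of chunks of size $2 needed to hold $1 bytes.
chunks() { echo $(( ($1 + $2 - 1) / $2 )); }

echo "writes when flushing per line: $(chunks "$blob" "$line")"
echo "full-sized segments needed:    $(chunks "$blob" "$mss")"
```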

>   |   $ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
>   |   one
>   |   two
>   |   ...
>   |   read(0, "one\ntwo\n", 512)  = 8
> 
> What is relevant there is that you're getting both lines from the printf in
> one read.  If that had happened, there would be no need for any rebuffering.
> The point of the original complaint was that that was not happening, and
> the reads were being broken at the \n ... here it might easily make a
> difference whether the output is a pipe or a socket (I have no idea.)

I dug into this further and see that we were both right. :-)

I was getting misdirected by the Linux kernel's pipeline buffering.
The pipeline buffering was causing me to think that it did not matter.
But digging deeper I see that it was a race condition timing issue and
could go either way.  That's obviously a mistake on my part.

You are right that depending upon timing this must be handled properly
or it might fail.  I am wrong that it would always work regardless of
timing.  However it was working in my test case which is why I had not
noticed.  Thank you for pushing me to see the problem here.

>   | It can then use the same buffer that data was read into for the output
>   | buffer directly.
> 
> No, it can't, that's what bs= does - you're right, that is most efficient,
> but there is no rebuffering, whatever is read, is written, and in that case
> even more efficient is not to interpose dd at all.  The whole point was
> to get the rebuffering.
> 
> Try tests more like
> 
>   { printf %s\\n aaa; sleep 1; printf %s\\n bbb ; } | dd 
> 
> so there will be clearly 2 different writes, and small reads for dd
> (however big the input buffer is) - with obs= (something big enough)
> there will be just 1 write, with bs= (anything big enough for the whole
> output) there will still be two writes.

  $ { command printf "one\n"; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\ntwo\n", 1048576)  = 8
  write(1, "one\ntwo\n", 8)       = 8
  read(0, "", 1048576)            = 0
  +++ exited with 0 +++

Above, the data is definitely written by two separate printf commands,
but due to Linux kernel buffering in the pipeline it is read in one read.
The data is written into the pipeline so quickly, before the next
stage of the pipeline could read it out, that by the time the read
eventually happened it was able to read the multiple writes as one
data block.  This is what I had been seeing, but you are right that it
is a timing related success and could also be a timing related
failure.

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\n", 1048576)       = 4
  write(1, "one\n", 4)            = 4
  read(0, "two\n", 1048576)       = 4
  write(1, "two\n", 4)            = 4
  read(0, "", 1048576)            = 0
  +++ exited with 0 +++

The above illustrates the point you were trying to make.  Thank you
for persevering in educating me as to the issue. :-)
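
The same timing dependence can be reproduced without strace by looking at
dd's "records in" count.  A sketch, assuming dd reports its statistics on
stderr in the usual format; the helper name records_in is made up here.

```shell
# Print the "records in" line from dd's statistics; the data is discarded.
records_in() { LC_ALL=C dd bs=1M 2>&1 >/dev/null | grep 'records in'; }

# Writer pauses between writes, reader is waiting: two short reads.
{ printf 'one\n'; sleep 1; printf 'two\n'; } | records_in

# Writer does not pause, reader starts late: the pipe has collected
# both writes by then, so a single read returns all 8 bytes.
{ printf 'one\n'; printf 'two\n'; } | { sleep 1; records_in; }
```

The first pipeline should report 0+2 records in, the second 0+1, even
though the writer issues two writes in both cases.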

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | { sleep 2; strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out ;}
  one
  two
  ...
  read(0, "one\ntwo\n", 1048576)  = 8
  write(1, "on

Re: bash sockets: printf \x0a does TCP fragmentation

2018-09-23 Thread Bob Proulx

Chet Ramey wrote:
> Bob Proulx wrote:
> > It was also noted in another message that the external standalone
> > printf command line utility did buffer as you desired.  That seems
> > another very good solution too.  Simply use "command printf ..." to
> > force using the external version.
> 
> This won't work the way you want. The `command' builtin only inhibits
> execution of shell functions. It still executes builtins.  You want to
> either get the full pathname of a printf utility using `type -ap printf'
> and use that, or use the env or exec variants I recommended in my last
> message.

Oh drat!  Now I have had to learn *TWO* things today.  :-)

> > Anyway...  Since printf is a text oriented utility it makes sense to
> > me that it would operate in line buffered output mode.
> 
> It's that bash sets stdout and stderr to be line-buffered, not anything
> printf-specific.

I still think 'printf' feels more like a plain text utility and not
something one reaches for when working with binary data blobs.

Bob

Re: bash sockets: printf \x0a does TCP fragmentation

2018-09-23 Thread Bob Proulx

Chet Ramey wrote:
> It's that bash sets stdout and stderr to be line-buffered, not anything
> printf-specific.

Shouldn't bash set stdout buffering based upon the output descriptor
being a tty or not, the same as other libc stdio behavior?

Bob

Re: assoc_expand_once issues

2018-09-23 Thread Chet Ramey

On 9/20/18 6:37 PM, Grisha Levit wrote:
> I was testing out this new feature and I can't figure out how to handle
> certain characters in subscripts when the option is on.  Without the
> option,
> it is possible to escape the `$' in the subscript to achieve the desired
> result but obviously that does not work if the option *is* on -- is there a
> different workaround to use or is this a bug with assoc_expand_once?

Well, I don't think it's a bug, per se. There are three cases here:

1. Expansion performed both during word expansion of the assignment
   statement and during evaluation of the subscript.

2. Expansion performed during word expansion of the assignment statement.

3. Expansion performed during evaluation of the subscript.

Case 1 is what bash has always done. Case 2 is what assoc_expand_once
provides. Case 3 is what you want, and what you're simulating by quoting
the expansion in case 1.

I think there might be a way to take advantage of the information provided
by case 2 to do what you want with these (usually not allowed) subscripts.

Chet

Re: bash sockets: printf \x0a does TCP fragmentation

2018-09-23 Thread Chet Ramey

On 9/23/18 2:46 PM, Bob Proulx wrote:
> Chet Ramey wrote:
>> It's that bash sets stdout and stderr to be line-buffered, not anything
>> printf-specific.
> 
> Shouldn't bash set stdout buffering based upon the output descriptor
> being a tty or not, the same as other libc stdio behavior?

It's been so long (25+ years) I forget why we did it.

