Re: bash sockets: printf \x0a does TCP fragmentation
    Date:        Sat, 22 Sep 2018 23:51:08 -0600
    From:        Bob Proulx
    Message-ID:  <20180922231240358868...@bob.proulx.com>

  | Using the same buffer size
  | for input and output is usually most efficient.

Yes, but as the objective seemed to be to make big packets, that is probably not as important.

  | $ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
  | one
  | two
  | ...
  | read(0, "one\ntwo\n", 512) = 8

What is relevant there is that you're getting both lines from the printf in one read. If that had happened, there would be no need for any rebuffering. The point of the original complaint was that that was not happening, and the reads were being broken at the \n ... here it might easily make a difference whether the output is a pipe or a socket (I have no idea.)

  | But even if ibs is much too small it still behaves okay with a small
  | input buffer size and a large output buffer size.

Yes, with separate buffers, that's how dd works (has always worked). That is why using it that way could solve the problem.

  | It seems to me that using a large buffer size for both read and write
  | would be the most efficient.

Yes.

  | It can then use the same buffer that data was read into for the output
  | buffer directly.

No, it can't, that's what bs= does - you're right, that is most efficient, but there is no rebuffering: whatever is read is written, and in that case even more efficient is not to interpose dd at all. The whole point was to get the rebuffering.

Try tests more like

    { printf %s\\n aaa; sleep 1; printf %s\\n bbb ; } | dd

so there will be clearly 2 different writes, and small reads for dd (however big the input buffer is) - with obs= (something big enough) there will be just 1 write, with bs= (anything big enough for the whole output) there will still be two writes.

kre

ps: this is not really the correct place to discuss dd.
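The test suggested above can be tried without strace, since dd's own statistics count the blocks it reads and writes. The following is only a sketch under assumptions: it assumes GNU dd (for the `1M` suffix and the "N+M records out" wording), and the helper name `produce` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare dd's rebuffering (obs=) with plain pass-through (bs=).
# Assumes GNU dd; the record counts come from dd's statistics on stderr.
export LC_ALL=C

# Hypothetical helper: two separate writes, one second apart,
# so that dd sees two short reads.
produce() { printf '%s\n' aaa; sleep 1; printf '%s\n' bbb; }

# obs= sets only the output block size: short reads are collected and
# written out together (here as one final partial block at EOF).
obs_stats=$(produce | dd obs=1M 2>&1 >/dev/null | grep 'records out')

# bs= sets both block sizes and turns rebuffering off: every read is
# written straight away, so two reads give two writes.
bs_stats=$(produce | dd bs=1M 2>&1 >/dev/null | grep 'records out')

echo "obs=1M: $obs_stats"
echo "bs=1M:  $bs_stats"
```

With GNU dd this should report one partial output record for `obs=1M` and two for `bs=1M`, which matches the description above; strace remains the way to see the actual write(2) calls.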
Re: bash sockets: printf \x0a does TCP fragmentation
On 9/22/18 6:49 AM, dirk+b...@testssl.sh wrote:
> On 9/22/18 12:38 PM, Ilkka Virta wrote:
>> On 22.9. 02:34, Chet Ramey wrote:
>>> Newline? It's probably that stdout is line-buffered and the newline causes
>>> a flush, which results in a write(2).
>>
>> Mostly out of curiosity, what kind of buffering logic does Bash (or the
>> builtin printf in particular) use? It doesn't seem to be the usual stdio
>> logic where you get line-buffering if printing to a terminal and block
>> buffering otherwise. I get a distinct write per line even if the stdout of
>> Bash itself is redirected to say /dev/null or a pipe:
>>
>> $ strace -etrace=write bash -c 'printf "foo\nbar\n"' > /dev/null
>> write(1, "foo\n", 4) = 4
>> write(1, "bar\n", 4) = 4
>> +++ exited with 0 +++
>
> Oh. But thanks anyway!
>
> coreutils in fact does it in one shot as you indicated.

Then the change you need suggests itself:

    env printf ...

or

    (exec printf ...)

since the bash exec builtin doesn't execute builtin commands.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
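As a rough illustration of the two forms above, the sketch below runs the builtin and the external utility side by side. It assumes an external printf is installed somewhere in PATH (coreutils on most Linux systems); the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: three ways to print the same two lines.
# Assumes an external printf utility exists in PATH.

# The bash builtin: runs inside the shell itself.
builtin_out=$(printf 'foo\nbar\n')

# env looks the command up in PATH and knows nothing about builtins,
# so this runs the external utility.
env_out=$(env printf 'foo\nbar\n')

# exec replaces the (sub)shell with an external program and never
# runs a builtin; the parentheses keep the main shell alive.
exec_out=$( (exec printf 'foo\nbar\n') )

printf '%s\n' "$builtin_out" "$env_out" "$exec_out"

# To compare the write(2) calls, something like the trace shown in the
# thread can be used (-f follows the child that env starts):
#   strace -f -e trace=write bash -c 'env printf "foo\nbar\n"' > /dev/null
```

The output is byte-for-byte the same in all three cases; only the number of write(2) calls behind it differs, which is what matters for a socket.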
Re: bash sockets: printf \x0a does TCP fragmentation
On 9/22/18 4:22 PM, Bob Proulx wrote:
> Note that I *did* provide you with a way to do what you wanted to do. :-)
>
> It was also noted in another message that the external standalone
> printf command line utility did buffer as you desired. That seems
> another very good solution too. Simply use "command printf ..." to
> force using the external version.

This won't work the way you want. The `command' builtin only inhibits execution of shell functions. It still executes builtins. You want to either get the full pathname of a printf utility using `type -ap printf' and use that, or use the env or exec variants I recommended in my last message.

> Anyway... Since printf is a text oriented utility it makes sense to
> me that it would operate in line buffered output mode.

It's that bash sets stdout and stderr to be line-buffered, not anything printf-specific.

--
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
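The difference between `command' and a full pathname can be sketched as follows. This is only an illustration: it uses `type -P', which forces a PATH search and prints a single path, in place of `type -ap', which lists every match; the throwaway function and the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: `command` skips shell functions but still runs builtins.
# Assumes an external printf utility exists in PATH.

# A throwaway function that shadows printf, for illustration only.
printf() { echo "shell function called"; }

via_function=$(printf '%s' ok)          # runs the function above
via_command=$(command printf '%s' ok)   # skips the function, runs the builtin

unset -f printf

# type -P forces a PATH search even though printf is also a builtin.
printf_path=$(type -P printf)
via_path=$("$printf_path" '%s' ok)      # runs the external utility

echo "function: $via_function"
echo "command:  $via_command"
echo "external: $printf_path -> $via_path"
```

Note that `command printf' and the external utility print the same text here, so the output alone cannot tell them apart; `type -t printf' reports what a bare `printf' would resolve to.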
Re: bash sockets: printf \x0a does TCP fragmentation
Robert Elz wrote:
> Bob Proulx wrote:
> | Using the same buffer size
> | for input and output is usually most efficient.
>
> Yes, but as the objective seemed to be to make big packets, that is probably
> not as important.

The original complaint concerned the data blob being flushed at every newline (0x0a) character because of line buffering, with a write(2) of the buffer up to that point. As I am sure you already know, each such write causes the network stack in the kernel to emit whatever has been buffered so far, which was apparently a smallish amount of data. So instead of some number of full MTU sized packets there were many more smaller ones.

It shouldn't have been about big packets, nor fragmentation, but about streaming efficiency and performance. The behavior was correct, but with more buffer flushes than desired it was apparently less efficient than they wanted, and that is what they were complaining about. They wanted the data blob buffered as much as possible so as to use the fewest number of TCP network packets.

I chose a large one meg buffer size so that it would be larger than any network MTU size. My intention was that the network stack would then split the data blob up into MTU sizes for transmission. The largest MTU size that I routinely see is 64k. I expect that to increase further in the future, when 1 meg might not be big enough. And I avoid mentioning jumbo frames.

> | $ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
> | one
> | two
> | ...
> | read(0, "one\ntwo\n", 512) = 8
>
> What is relevant there is that you're getting both lines from the printf in
> one read. If that had happened, there would be no need for any rebuffering.
> The point of the original complaint was that that was not happening, and
> the reads were being broken at the \n ... here it might easily make a
> difference whether the output is a pipe or a socket (I have no idea.)
I dug into this further and see that we were both right. :-)

I was getting misdirected by the Linux kernel's pipeline buffering. The pipeline buffering was causing me to think that it did not matter. But digging deeper I see that it was a race condition timing issue and could go either way. That's obviously a mistake on my part. You are right that depending upon timing this must be handled properly or it might fail. I am wrong that it would always work regardless of timing. However it was working in my test case, which is why I had not noticed. Thank you for pushing me to see the problem here.

> | It can then use the same buffer that data was read into for the output
> | buffer directly.
>
> No, it can't, that's what bs= does - you're right, that is most efficient,
> but there is no rebuffering, whatever is read, is written, and in that case
> even more efficient is not to interpose dd at all. The whole point was
> to get the rebuffering.
>
> Try tests more like
>
>     { printf %s\\n aaa; sleep 1; printf %s\\n bbb ; } | dd
>
> so there will be clearly 2 different writes, and small reads for dd
> (however big the input buffer is) - with obs= (something big enough)
> there will be just 1 write, with bs= (anything big enough for the whole
> output) there will still be two writes.

  $ { command printf "one\n"; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\ntwo\n", 1048576) = 8
  write(1, "one\ntwo\n", 8) = 8
  read(0, "", 1048576) = 0
  +++ exited with 0 +++

Above the data is definitely written in two different processes but due to Linux kernel buffering in the pipeline it is read in one read. The data is written into the pipeline so quickly, before the next stage of the pipeline could read it out, that by the time the read eventually happened it was able to read the multiple writes as one data block.
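The timing dependence described above can be made visible without strace by deliberately delaying the reader and looking at dd's input record count. This is a sketch under assumptions: it assumes GNU dd and a pipe large enough to hold both writes (8 bytes in total), and the helper name `produce` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the same two writes seen as one read or two, depending on
# when the reader starts. Assumes GNU dd statistics on stderr.
export LC_ALL=C

# Hypothetical helper: two writes into the pipe, one second apart.
produce() { printf 'one\n'; sleep 1; printf 'two\n'; }

# Reader starts at once: it drains "one\n" before "two\n" is written,
# so dd performs two short reads.
prompt_reads=$(produce | dd bs=1M 2>&1 >/dev/null | grep 'records in')

# Reader starts late: both writes are already sitting in the pipe
# buffer, so a single read returns all 8 bytes.
late_reads=$(produce | { sleep 2; dd bs=1M 2>&1 >/dev/null; } | grep 'records in')

echo "prompt reader: $prompt_reads"
echo "late reader:   $late_reads"
```

With the delays above the prompt reader should report two partial input records and the late reader one. Without the sleeps the outcome depends on scheduling, which is the race being described.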
This is what I had been seeing but you are right that it is a timing related success and could also be a timing related failure.

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\n", 1048576) = 4
  write(1, "one\n", 4) = 4
  read(0, "two\n", 1048576) = 4
  write(1, "two\n", 4) = 4
  read(0, "", 1048576) = 0
  +++ exited with 0 +++

The above illustrates the point you were trying to make. Thank you for persevering in educating me as to the issue. :-)

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | { sleep 2; strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out ;}
  one
  two
  ...
  read(0, "one\ntwo\n", 1048576) = 8
  write(1, "on
Re: bash sockets: printf \x0a does TCP fragmentation
Chet Ramey wrote:
> Bob Proulx wrote:
> > It was also noted in another message that the external standalone
> > printf command line utility did buffer as you desired. That seems
> > another very good solution too. Simply use "command printf ..." to
> > force using the external version.
>
> This won't work the way you want. The `command' builtin only inhibits
> execution of shell functions. It still executes builtins. You want to
> either get the full pathname of a printf utility using `type -ap printf'
> and use that, or use the env or exec variants I recommended in my last
> message.

Oh drat! Now I have had to learn *TWO* things today. :-)

> > Anyway... Since printf is a text oriented utility it makes sense to
> > me that it would operate in line buffered output mode.
>
> It's that bash sets stdout and stderr to be line-buffered, not anything
> printf-specific.

I still think 'printf' feels more like a plain text utility and not something one reaches for when working with binary data blobs.

Bob
Re: bash sockets: printf \x0a does TCP fragmentation
Chet Ramey wrote:
> It's that bash sets stdout and stderr to be line-buffered, not anything
> printf-specific.

Shouldn't bash set stdout buffering based upon whether the output descriptor is a tty or not, the same as other libc stdio behavior?

Bob
Re: assoc_expand_once issues
On 9/20/18 6:37 PM, Grisha Levit wrote:
> I was testing out this new feature and I can't figure out how to handle
> certain characters in subscripts when the option is on. Without the option,
> it is possible to escape the `$' in the subscript to achieve the desired
> result but obviously that does not work if the option *is* on -- is there a
> different workaround to use or is this a bug with assoc_expand_once?

Well, I don't think it's a bug, per se. There are three cases here:

1. Expansion performed both during word expansion of the assignment
   statement and during evaluation of the subscript.

2. Expansion performed during word expansion of the assignment statement.

3. Expansion performed during evaluation of the subscript.

Case 1 is what bash has always done. Case 2 is what assoc_expand_once provides. Case 3 is what you want, and what you're simulating by quoting the expansion in case 1.

I think there might be a way to take advantage of the information provided by case 2 to do what you want with these (usually not allowed) subscripts.

Chet
--
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: bash sockets: printf \x0a does TCP fragmentation
On 9/23/18 2:46 PM, Bob Proulx wrote:
> Chet Ramey wrote:
>> It's that bash sets stdout and stderr to be line-buffered, not anything
>> printf-specific.
>
> Shouldn't bash set stdout buffering based upon the output descriptor
> being a tty or not the same as other libc stdio behavior?

It's been so long (25+ years) I forget why we did it.

--
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/