Re: bash 5.1 heredoc pipes problematic, shopt needed

Alexey via Bug reports for the GNU Bourne Again SHell Thu, 28 Apr 2022 09:20:55 -0700

On 2022-04-26 01:05, Alexey via Bug reports for the GNU Bourne Again SHell wrote:

On 2022-04-26 00:54, Chet Ramey wrote:

On 4/25/22 4:33 PM, Alexey wrote:

My key point is that we have two choices for the future:
  - make reads from a pipe faster, or


You mean the read builtin, right? I already explained those semantics.

  - provide an option to force here-strings to use temp files.


Yes, the absolute worst case scenario has a performance penalty. The
question is how that affects things that run in real life scenarios. I
think making this a shell compatibility mode option is the best place
to start.

I don't see any other options for fast-enough performance.


Since you don't define `fast-enough', it's not really a question that
can be answered.


Sure, I'll try to provide you with a more real-life scenario later, rather
than just an empty for loop.

But a performance regression compared to bash 4.4 (which always
uses temp files for here-strings) is a sad evolution.

P.S. I disagree that I should choose other scripting languages (not bash
or other shells) for performance-critical tasks when we are talking
about system interactions. Bash is well suited to most admin
tasks.


Hello.

I promised you more examples, and here they are:
A very common case is building a list of files for further processing:

  declare -a FILES

  #1
  FILES=(); time readarray -t FILES <<<"$(find "$d" -xdev -maxdepth 5 -type f)"

  #2
  # <<< acts as a tmp file (because the result is bigger than 64K)
  FILES=(); time while read -r f; do FILES+=("$f"); done <<<"$(find / -xdev -maxdepth 5 -type f)"

  #3
  FILES=(); time while read -r f; do FILES+=("$f"); done < <(find / -xdev -maxdepth 5 -type f)


From these examples we can see that:

- example #1 is approximately 2 times faster than example #2, and 4 times
  faster than example #3;
- to be fair, example #1 should be followed by at least an empty loop:
    for f in "${FILES[@]}"; do :; done
  After that modification, example #2 becomes comparable to example #1.
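For anyone who wants to repeat the comparison without scanning "/", here is a minimal sketch that runs the same three forms against a throwaway directory. The directory layout, the file count, and the variables n1, n2, n3 are assumptions made for this illustration only. A tree this small stays well under 64K, so it only confirms that the three forms produce the same list; point find at a larger tree to see timing differences.

```shell
#!/usr/bin/env bash
# Assumption: a small temporary tree stands in for the real directory.
d=$(mktemp -d)
mkdir -p "$d/sub"
touch "$d"/file{1..200} "$d"/sub/file{1..200}   # 400 regular files in total

# 1: read the whole here-string at once with readarray
FILES=()
time readarray -t FILES <<<"$(find "$d" -xdev -maxdepth 5 -type f)"
n1=${#FILES[@]}

# 2: line-by-line read from a here-string
FILES=()
time while read -r f; do FILES+=("$f"); done <<<"$(find "$d" -xdev -maxdepth 5 -type f)"
n2=${#FILES[@]}

# 3: line-by-line read from a pipe (process substitution)
FILES=()
time while read -r f; do FILES+=("$f"); done < <(find "$d" -xdev -maxdepth 5 -type f)
n3=${#FILES[@]}

echo "counts: $n1 $n2 $n3"
rm -rf "$d"
```

Note that all three forms split on newlines, so file names containing a newline would be broken into several entries.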

Also, there is a problem: we can't use `mapfile -t <<<"$()"' as an
equivalent of `mapfile -t < <()', because a here-string appends a
newline, so MAPFILE will have one empty element instead of no elements
when the subshell result is empty. So this is one more situation where
we have to use a pipe instead of a tmp file.
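The difference is easy to reproduce. In this small sketch `true' stands in for any command that prints nothing, and the two count variables are names chosen for the illustration:

```shell
#!/usr/bin/env bash
# Here-string: the command substitution is empty, but <<< still appends a newline,
# so mapfile sees one empty line.
mapfile -t <<<"$(true)"
here_string_count=${#MAPFILE[@]}

# Process substitution: nothing is written to the pipe, so mapfile sees no lines.
mapfile -t < <(true)
proc_subst_count=${#MAPFILE[@]}

echo "here-string: $here_string_count, process substitution: $proc_subst_count"
# prints "here-string: 1, process substitution: 0"
```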


If we dig in with `strace', we can see that:

- example #1 does sequential read() calls of 4096 bytes (the most
  efficient way);
- example #2 does a read() of 4096 bytes and then an lseek() back if bash
  found the `delimiter' in the read buffer;
- example #3 does read() for only 1 byte at a time (the worst way to do
  it).

Yes, I know that we can't lseek() on a pipe, and this is the main
reason for the 1-byte read(). bash could do a 4096-byte read() into an
internal buffer tied to the file descriptor and emulate lseek()
within that buffer.
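One possible way to look at these calls yourself is sketched below. It assumes strace is installed and allowed to trace processes. The helper name trace_stdin_reads and the seq-generated input (chosen only to exceed 64K) are assumptions of this sketch, not part of the original measurements. What you see depends on the bash version and on how it implements here-strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the first few read()/lseek() calls on fd 0
# made while bash runs the given snippet.
trace_stdin_reads() {
    local snippet=$1 log
    log=$(mktemp) || return 1
    # -f follows child processes, -e trace= limits the syscalls, -o writes to a log file.
    strace -f -e trace=read,lseek -o "$log" bash -c "$snippet" >/dev/null 2>&1 || true
    # Keep only calls whose first argument is file descriptor 0.
    grep -E '(read|lseek)\(0,' "$log" | head -n 5
    rm -f "$log"
}

if command -v strace >/dev/null 2>&1; then
    echo "here-string:"
    trace_stdin_reads 'while read -r f; do :; done <<<"$(seq 1 20000)"'
    echo "process substitution:"
    trace_stdin_reads 'while read -r f; do :; done < <(seq 1 20000)'
else
    echo "strace not found; skipping" >&2
fi
```

The third argument of each read() line in the output is the requested size, which is the number to compare between the two cases.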


CONCLUSION:
 - we shouldn't change tmp files to pipes if it slows down code execution;
 - BUT moving away from tmp files to pipes is a good step IF bash
   creates an internal read buffer to level out the problem with 1-byte
   read() from a pipe. Bash could do a 4096-byte read() into an internal
   buffer tied to the file descriptor and emulate lseek() within that
   buffer;
 - we can add an option to the read/readarray builtins (for example -b)
   to force the read buffer described above. This option would allow the
   script writer to decide how read() should be done, based on their
   knowledge of how the pipe will be used afterwards (e.g. calling a
   subshell or exec'ing a new program entirely).



Regards,
Alexey.
