On 2022-04-26 01:05, Alexey via Bug reports for the GNU Bourne Again
SHell wrote:
On 2022-04-26 00:54, Chet Ramey wrote:
On 4/25/22 4:33 PM, Alexey wrote:
My key point is that we have two choices for the future:
- make read from pipe faster, or
You mean the read builtin, right? I already explained those semantics.
- provide an option to force here-strings to use temp files.
Yes, the absolute worst case scenario has a performance penalty. The
question is how that affects things that run in real life scenarios. I
think making this a shell compatibility mode option is the best place
to start.
I don't see any other options for fast-enough performance.
Since you don't define `fast-enough', it's not really a question that
can be answered.
Sure, I'll try to provide a more real-life scenario later, rather
than just an empty for loop.
But a performance degradation compared to bash 4.4 (which always
used temp files for here-strings) is a sad evolution.
P.S. I disagree that I should choose other scripting languages (not
bash or other shells) for performance-critical tasks when we are
talking about system interactions. Bash is well suited to most admin
tasks.
Hello.
I promised you more examples, and here they are:
A very common case is building a list of files for further processing:
declare -a FILES
#1
FILES=(); time readarray -t FILES <<<"$(find "$d" -xdev -maxdepth 5 -type f)"
#2
# <<< is backed by a tmp file here (the result is larger than 64K)
FILES=(); time while read -r f; do FILES+=("$f"); done <<<"$(find / -xdev -maxdepth 5 -type f)"
#3
FILES=(); time while read -r f; do FILES+=("$f"); done < <(find / -xdev -maxdepth 5 -type f)
From these examples we can see that:
- example #1 is approximately 2 times faster than example #2, and 4
times faster than example #3;
- to be fair, the first example should be followed by at least an
empty loop: for f in "${FILES[@]}"; do :; done
With that modification, example #2 becomes comparable to example #1.
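
For anyone who wants to repeat the comparison without scanning /, here
is a minimal sketch of the three variants run against a small
throwaway tree. The temp directory, the 200 files and the array names
files1..files3 are my assumptions for illustration only; the timings
will of course depend on the bash version and on the size of the tree.

```bash
#!/usr/bin/env bash
# Hypothetical test tree (assumption): 200 empty files in a temp directory.
d=$(mktemp -d)
touch "$d"/file{1..200}

# Variant 1: readarray from a here-string, followed by the empty loop
# so that it does the same amount of iteration as the other variants.
files1=()
time {
    readarray -t files1 <<<"$(find "$d" -xdev -maxdepth 5 -type f)"
    for f in "${files1[@]}"; do :; done
}

# Variant 2: while/read loop fed by a here-string.
files2=()
time while read -r f; do files2+=("$f"); done <<<"$(find "$d" -xdev -maxdepth 5 -type f)"

# Variant 3: while/read loop fed by process substitution (always a pipe).
files3=()
time while read -r f; do files3+=("$f"); done < <(find "$d" -xdev -maxdepth 5 -type f)

# All three variants should have collected the same number of names.
echo "${#files1[@]} ${#files2[@]} ${#files3[@]}"
rm -rf "$d"
```

The `time' output goes to stderr, so the last line on stdout only
confirms that the three variants collected the same list.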
There is also the problem that we can't use `mapfile -t <<<"$()"' as
an equivalent of `mapfile -t < <()', because a here-string appends a
newline, so MAPFILE will have one empty element instead of no elements
when the command substitution produces no output.
So this is one more situation where we have to use a pipe instead of a
tmp file.
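
The difference is easy to reproduce. In this small sketch `true'
stands in for any command that prints nothing, and the array names are
just placeholders I picked:

```bash
#!/usr/bin/env bash
# `true` produces no output, so the command substitution expands to "".
# The here-string still appends a newline, which mapfile reads as one
# empty line.
mapfile -t via_herestring <<<"$(true)"

# Process substitution passes the empty output through unchanged.
mapfile -t via_procsub < <(true)

echo "here-string:          ${#via_herestring[@]} element(s)"
echo "process substitution: ${#via_procsub[@]} element(s)"
```

The here-string variant reports 1 element (an empty string), the
process substitution variant reports 0.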
If we dig in with `strace', we can see that:
- example #1 does sequential read() calls of 4096 bytes (the most
efficient way);
- example #2 does a read() of 4096 bytes and then an lseek() back if
bash found the delimiter in the read buffer;
- example #3 does read() calls of only 1 byte at a time (the worst
way).
Yes, I know that we can't lseek() on a pipe, and that this is the main
reason for the 1-byte read().
But bash could do a 4096-byte read() into an internal buffer tied to
the file descriptor and emulate lseek() within that buffer.
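
To illustrate the constraint behind the 1-byte read(): the read
builtin must not consume anything past the delimiter, because another
process sharing the same pipe may want the rest. A minimal sketch (the
three sample lines are made up):

```bash
#!/usr/bin/env bash
# read takes exactly one line from the pipe; cat, which shares the same
# stdin, must still see everything after the first newline.
out=$(printf 'first\nsecond\nthird\n' | { read -r line; echo "read got: $line"; cat; })
printf '%s\n' "$out"
```

This prints "read got: first" followed by "second" and "third". A
per-descriptor buffer inside bash would keep this working for later
builtins in the same shell, but not for an external program like cat
here, which is why the choice probably has to be left to the script
writer.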
CONCLUSION:
- we shouldn't switch from tmp files to pipes if it slows down code
execution;
- BUT moving away from tmp files to pipes is a good idea IF bash
creates an internal read buffer to offset the problem of 1-byte
read() calls on a pipe.
Bash could do a 4096-byte read() into an internal buffer tied to the
file descriptor and emulate lseek() within that buffer;
- we could add an option to the read/readarray builtins (for
example -b) to force the read buffering described above.
This option would let the script writer decide how read() should be
done, based on their knowledge of how the pipe will be used afterwards
(e.g. calling a subshell or exec'ing a completely new program).
Regards,
Alexey.