'while read' loop performance (redirection vs pipeline)

2020-06-10 Terence O'Gorman
Configuration Information:

Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -O2 -g -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
-fstack-protector-strong -grecord-gcc-switches
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
-Wno-parentheses -Wno-format-security
uname output: Linux quietpc 5.6.13-200.fc31.x86_64 #1 SMP
  Thu May 14 23:26:14 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-redhat-linux-gnu

Bash Version: 5.0
Patch Level: 11
Release Status: release


Description:

The 'while read' loop performs very differently depending on whether its
input arrives via redirection or via a pipeline, and the gap widens as
the lines get longer.  Here are some results from my machine (the same
behaviour was also observed in versions 3.0 and 4.0):

[tog@quietpc ~]$ ~/loop.bash

100,000 lines of length 4:
'<': real = 0.609, user = 0.522, sys = 0.087
'|': real = 1.000, user = 0.561, sys = 0.793

100,000 lines of length 8:
'<': real = 0.652, user = 0.567, sys = 0.084
'|': real = 1.788, user = 0.772, sys = 1.838

100,000 lines of length 16:
'<': real = 0.660, user = 0.541, sys = 0.118
'|': real = 2.900, user = 1.019, sys = 3.389

100,000 lines of length 32:
'<': real = 0.684, user = 0.576, sys = 0.107
'|': real = 5.962, user = 1.560, sys = 7.809

100,000 lines of length 64:
'<': real =  0.763, user = 0.677, sys =  0.086
'|': real = 11.602, user = 2.803, sys = 15.727

100,000 lines of length 128:
'<': real =  0.984, user = 0.874, sys =  0.109
'|': real = 22.837, user = 5.061, sys = 31.592
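
As a rough cross-check of the figures above, the sketch below divides the
pipeline's sys time by the number of bytes pushed through the pipe. It
assumes each line carries one extra newline byte, and the helper name
per_byte_cost is made up for illustration. Note that the sys figure
covers both sides of the pipeline (cat and the loop), so this is only an
order-of-magnitude estimate.

```shell
# Figures copied from the '|' rows above: "<line length> <sys seconds>".
# Assumption: every line has one trailing newline, so each run moves
# 100000 * (length + 1) bytes through the pipe.
per_byte_cost() {
    awk '{ bytes = 100000 * ($1 + 1)
           printf "%3d chars: %.2f us of sys time per byte\n", $1, $2 * 1e6 / bytes }'
}

per_byte_cost <<'EOF'
4 0.793
8 1.838
16 3.389
32 7.809
64 15.727
128 31.592
EOF
```

The result stays in a narrow band of roughly 1.6 to 2.4 microseconds per
byte, which suggests that the pipeline cost scales with the number of
bytes rather than with the number of lines. The '<' case shows no such
growth in sys time.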


Repeat-By:

#!/bin/bash

count=${1:-100000}
file=$(mktemp /tmp/loop.XXX)
string="xxxx"   # 4-character seed (placeholder value); doubled on each pass

TIMEFORMAT="real = %R, user = %U, sys = %S"

for loop in {1..6}
do
    printf "\n%'d lines of length %d:\n" $count ${#string}
    yes $string | sed ${count}q >$file

    printf "'<': "; time while read line; do :; done <$file
    printf "'|': "; time cat $file | while read line; do :; done

    string+=$string
done

rm -f $file





Re: 'while read' loop performance (redirection vs pipeline)

2020-06-10 Chet Ramey
On 6/10/20 7:48 AM, Terence O'Gorman wrote:

> Bash Version: 5.0
> Patch Level: 11
> Release Status: release
> 
> 
> Description:
> 
> The 'while read' loop performs very differently depending on whether its
> input arrives via redirection or via a pipeline, and the gap widens as
> the lines get longer.  Here are some results from my machine (the same
> behaviour was also observed in versions 3.0 and 4.0):

Two forks and one exec are expensive, as are the single-byte reads through
the pipe. The single-byte reads are required because the shell is not
allowed to read ahead in this case, since the remaining input may be
intended for another process.
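
A minimal sketch of the constraint described above: `read' and a second
process (`cat') share the same standard input, so `read' has to stop
exactly at the first newline or `cat' would lose data. The function name
split_input and the sample input are made up for illustration. The note
about regular files is an assumption about how a shell can avoid the
cost (read a block, then seek back to just past the newline), which a
pipe does not allow because it is not seekable.

```shell
# split_input: `read` consumes the first line of stdin, then `cat`
# (a separate process sharing the same stdin) receives whatever is left.
split_input() {
    IFS= read -r first
    rest=$(cat)
    printf 'read got: %s\ncat got: %s\n' "$first" "$rest"
}

# Pipe: not seekable, so the shell must not consume anything past the newline.
printf 'one\ntwo\nthree\n' | split_input

# Regular file: same visible result. Here a shell may read ahead and
# seek back (assumption), which is why this case can be much cheaper.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"
split_input < "$tmp"
rm -f "$tmp"
```

In both cases `read' ends up with "one" and `cat' with the remaining two
lines. The difference is only in how many system calls it takes to get
there.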

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/