contents of whole arrays disappear when leaving a while read loop
Config info 1:

Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: i686-pc-linux-gnu-gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' -DSTANDARD_UTILS_PATH='/bin:/usr/bin:/sbin:/usr/sbin' -DSYS_BASHRC='/etc/bash/bashrc' -DSYS_BASH_LOGOUT='/etc/bash/bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS -DSSH_SOURCE_BASHRC -march=athlon-xp -O2 -pipe -fomit-frame-pointer -g3
uname output: Linux dragoda.com 2.6.28-gentoo-r2 #1 SMP Sat Feb 28 19:17:31 CET 2009 i686 AMD Sempron(tm) 3000+ AuthenticAMD GNU/Linux
Machine Type: i686-pc-linux-gnu

Bash Version: 4.0
Patch Level: 10
Release Status: release

Config info 2:

Configuration Information [Automatically generated, do not change]:
Machine: powerpc
OS: aix6.1.0.0
Compiler: cc -qlanglvl=extc89
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='powerpc' -DCONF_OSTYPE='aix6.1.0.0' -DCONF_MACHTYPE='powerpc-ibm-aix6.1.0.0' -DCONF_VENDOR='ibm' -DLOCALEDIR='/usr/local/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -I./lib/intl -I/home/lesc/src/bash/bash-4.0/lib/intl -g
uname output: AIX fls-nim1 1 6 0001B6BAD300
Machine Type: powerpc-ibm-aix6.1.0.0

Bash Version: 4.0
Patch Level: 10
Release Status: release

Description:
In the construct

    cat file|while read line
    do
    done

the contents of any array assignments made in the loop disappear on leaving the loop.

Repeat-By:
Given a file named 'datafile' which has the three lines:

    one
    two
    tree

the bash code

    declare -A numbers
    counter=0
    numbers[zero]=0
    cat datafile|while read number
    do
        counter=$((counter+1))
        numbers[$number]=$counter
        echo "${numbe...@]}"
    done
    echo "${numbe...@]}"

will generate the following output:

    1 0
    1 0 2
    1 0 2 3
    0

while the similar code

    declare -A numbers
    counter=0
    numbers[zero]=0
    while read number
    do
        counter=$((counter+1))
        numbers[$number]=$counter
        echo "${numbe...@]}"
    done < datafile
    echo "${numbe...@]}"

will generate the (correct) output:

    1 0
    1 0 2
    1 0 2 3
    1 0 2 3

The first (but buggy) form will normally be preferred, since you often need a filter like egrep -v '^#' in front of the while loop.

-- 
Regards,
Lennart Schultz
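For readers hitting the same symptom: bash documents that each command in a pipeline runs in its own subshell, so assignments made inside `cat datafile | while ...` are made in a child process and are gone when the loop ends. A minimal sketch of one way to keep a filter in front of the loop while the loop itself stays in the current shell is process substitution. It assumes that the obfuscated `${numbe...@]}` in the report stands for the `numbers` array; the temporary file and its contents are hypothetical stand-ins for 'datafile'.

```shell
#!/usr/bin/env bash
# Sketch only: feed the loop from process substitution instead of a pipe.
datafile=$(mktemp)                      # hypothetical stand-in for 'datafile'
printf '%s\n' '# a comment' one two tree > "$datafile"

declare -A numbers
counter=0
numbers[zero]=0

# The filter runs in a child process, but the while loop runs in the
# current shell, so its assignments survive after "done".
while read -r number
do
    counter=$((counter+1))
    numbers[$number]=$counter
done < <(grep -Ev '^#' "$datafile")

echo "entries after the loop: ${#numbers[@]}"
echo "counter after the loop: $counter"
rm -f "$datafile"
```

The loop body is unchanged from the report; only the way input reaches the loop differs.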
using mapfile is extremely slow compared to old-fashioned ways to read files
Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: i686-pc-linux-gnu-gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' -DSTANDARD_UTILS_PATH='/bin:/usr/bin:/sbin:/usr/sbin' -DSYS_BASHRC='/etc/bash/bashrc' -DSYS_BASH_LOGOUT='/etc/bash/bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS -DSSH_SOURCE_BASHRC -march=athlon-xp -O2 -pipe -fomit-frame-pointer -g3
uname output: Linux dragoda.com 2.6.28-gentoo-r2 #1 SMP Sat Feb 28 19:17:31 CET 2009 i686 AMD Sempron(tm) 3000+ AuthenticAMD GNU/Linux
Machine Type: i686-pc-linux-gnu

Bash Version: 4.0
Patch Level: 10
Release Status: release

Description:
I have a bash script which reads about 25 lines of xml code generating about 850 files with information extracted from the xml file.
It uses the construct:

    while read line
    do
        case "$line" in

    done < file

and this takes a little less than 2 minutes.

Trying to use mapfile, I changed the above construct to:

    mapfile < file
    for i in "${mapfi...@]}"
    do
        line=$(echo $i) # strip leading blanks
        case "$line" in

    done

With this change the job now takes more than 48 minutes. :(

It may be that I am new to mapfile, and there are more efficient ways to traverse a mapfile array, but if this is the case, please document it.

Another suggestion for mapfile: please introduce an option to strip leading blanks, so that mapfile acts like readline and constructions like

    line=$(echo $i) # strip leading blanks

above can be avoided.

-- 
Regards,
Lennart Schultz
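As an illustration of the point about `line=$(echo $i)`: a command substitution forks a child process on every iteration, while parameter expansion does the same stripping inside the current shell. The sketch below is an assumption-laden example, not the reporter's script: the sample lines and the array names `lines` and `stripped` are hypothetical, and `mapfile -t` is used so the trailing newline is dropped from each element.

```shell
#!/usr/bin/env bash
# Sketch only: strip leading blanks without spawning a subshell per line.
# The three sample lines below are made up for illustration.
mapfile -t lines < <(printf '%s\n' '   <tag>' $'\t  indented' 'flush')

stripped=()
for i in "${lines[@]}"
do
    # ${i%%[![:space:]]*} is the leading run of whitespace (possibly empty);
    # removing that prefix from $i leaves the line without leading blanks.
    line=${i#"${i%%[![:space:]]*}"}
    stripped+=("$line")
done

printf '[%s]\n' "${stripped[@]}"
```

This form needs no `extglob` setting; whether it changes the overall run time of a given script depends on where that script actually spends its time.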
Re: using mapfile is extremely slow compared to old-fashioned ways to read files
Chris,

I agree with you about using the right tool at the right time, and mapfile seems not to be the right tool for my problem, but I will just give you some facts from my observations:

Using a fast tool like egrep just to find a simple string in my datafile gives the following times:

    time egrep '/dev/null < dr.xml

    real    0m54.628s
    user    0m27.310s
    sys     0m0.036s

My original bash script:

    time xml2e2-loadepg

    real    1m53.264s
    user    1m22.145s
    sys     0m30.674s

Since the questions seem to be about spawning subshells and their cost, I have checked my script: the only external command it calls is date, which in total is called a little less than 2 times. Just for this test I have changed the call of date to an assignment of a constant, and now it looks like this:

    time xml2e2-loadepg

    real    1m3.826s
    user    1m2.700s
    sys     0m1.004s

I also made the same change to the version of the program using mapfile, and changed line=$(echo $i) to

    line=${i##+([[:space:]])}

so the main loop is absolutely without any subshell spawns:

    time xml2e2-loadepg.new

    real    65m2.378s
    user    63m16.717s
    sys     0m1.124s

Lennart

2009/3/26 Chris F.A. Johnson

> On Thu, 26 Mar 2009, Lennart Schultz wrote:
>
>> I have a bash script which reads about 25 lines of xml code generating
>> about 850 files with information extracted from the xml file.
>> It uses the construct:
>>
>> while read line
>> do
>> case "$line" in
>>
>> done < file
>>
>> and this takes a little less than 2 minutes
>>
>> Trying to use mapfile I changed the above construct to:
>>
>> mapfile < file
>> for i in "${mapfi...@]}"
>> do
>> line=$(echo $i) # strip leading blanks
>> case "$line" in
>>
>> done
>>
>> With this change the job now takes more than 48 minutes. :(
>>
>
> As has already been suggested, the time is almost certainly taken
> up in the command substitution which you perform on every line.
>
> If you want to remove leading spaces, it would be better to use a
> single command to do that before reading with mapfile, e.g.:
>
> mapfile < <(sed 's/^ *//' file)
>
> If you want to remove trailing spaces as well:
>
> mapfile < <(sed -e 's/^ *//' -e 's/ *$//' file)
>
> Chet, how about an option to mapfile that strips leading and/or
> trailing spaces?
>
> Another useful option would be to remove newlines.
>
> --
> Chris F.A. Johnson, webmaster <http://woodbine-gerrard.com>
> Do not reply to the From: address; use Reply-To:
> Author:
> Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
>
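A small runnable sketch of the sed suggestion quoted above, for anyone who wants to try it. The temporary file, its contents, and the array name `trimmed` are hypothetical; `[[:space:]]` is used instead of a literal space so that tabs are stripped too, which goes slightly beyond the quoted commands. The `-t` option of mapfile removes the trailing newline from each line read.

```shell
#!/usr/bin/env bash
# Sketch only: trim every line once with sed, then load the result.
file=$(mktemp)                           # hypothetical input file
printf '%s\n' '  leading' 'trailing   ' $'\t both \t' > "$file"

# One sed process handles the whole file; mapfile -t drops the newlines.
mapfile -t trimmed < <(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$file")

printf '[%s]\n' "${trimmed[@]}"
rm -f "$file"
```

The trade-off is one extra process for the whole file rather than any per-line work in the shell loop.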
Re: using mapfile is extremely slow compared to old-fashioned ways to read files
It seems that mapfile is OK for small numbers, but for bigger numbers it starts to consume time.

I made a little test:

    rm Xyz; unset MAPFILE # clear
    max= # set limit
    time for i in $(seq 0 $max); do echo 'Xyz' >> Xyz; done

    real    0m0.490s
    user    0m0.304s
    sys     0m0.124s

    time mapfile < Xyz

    real    0m0.005s
    user    0m0.008s
    sys     0m0.000s

    time while read line; do echo $line > /dev/null; done < Xyz

    real    0m1.124s
    user    0m0.456s
    sys     0m0.108s

    time for i in $(seq 0 $max); do echo echo ${MAPFILE[$i]}> /dev/null; done

    real    0m2.184s
    user    0m0.976s
    sys     0m0.104s

    rm Xyz ;unset MAPFILE
    max=9
    time for i in $(seq 0 $max); do echo 'Xyz' >> Xyz; done

    real    0m8.204s
    user    0m3.264s
    sys     0m1.188s

    time mapfile < Xyz

    real    0m0.062s
    user    0m0.044s
    sys     0m0.000s

    time while read line; do echo $line > /dev/null; done < Xyz

    real    0m11.328s
    user    0m4.500s
    sys     0m1.140s

    time for i in $(seq 0 $max); do echo echo ${MAPFILE[$i]}> /dev/null; done

    real    9m52.832s
    user    5m38.305s
    sys     0m3.636s

At the time of testing I had sufficient free memory, no swapping, and no other time-consuming programs running.

2009/3/28 Chris F.A. Johnson

> On Fri, 27 Mar 2009, Lennart Schultz wrote:
>
>> Chris,
>> I agree with you to use the right tool at the right time, and mapfile
>> seems not to be the right tool for my problem, but I will just give you
>> some facts of my observations:
>>
>> using a fast tool like egrep just to find a simple string in my datafile
>> gives the following times:
>>
>> time egrep '/dev/null < dr.xml
>>
>> real    0m54.628s
>> user    0m27.310s
>> sys     0m0.036s
>>
>> My original bash script :
>>
>> time xml2e2-loadepg
>>
>> real    1m53.264s
>> user    1m22.145s
>> sys     0m30.674s
>>
>> While the questions seems to go on spawning subshells and the cost I have
>> checked my script
>> it is only calling one external command is date which in total is called a
>> little less than 2 times. I have just for this test changed the call of
>> date to an assignment of an constant.
>> and now it looks:
>>
>> time xml2e2-loadepg
>>
>> real    1m3.826s
>> user    1m2.700s
>> sys     0m1.004s
>>
>> I also made the same change to the version of the program using mapfile,
>> and changed line=$(echo $i) to
>> line=${i##+([[:space:]])}
>> so the mainloop is absolutely without any sub shell spawns:
>>
>> time xml2e2-loadepg.new
>>
>> real    65m2.378s
>> user    63m16.717s
>> sys     0m1.124s
>>
>
> How much of that is taken by mapfile? Time the mapfile command and
> the loop separately:
>
> time mapfile < file
> time for i in "${mapfi...@]}"
>
> --
> Chris F.A. Johnson, webmaster <http://woodbine-gerrard.com>
> ===
>
> Author:
> Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
>
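To follow the advice about timing the two phases separately, here is a self-contained sketch. The file, the size in `max`, and the counters are hypothetical and deliberately small. It times loading with mapfile, a traversal that expands the whole array once with "${MAPFILE[@]}", and a traversal that looks up each element by index as in the test above. It makes no claim about why the indexed loop was so slow in the measurements; it only gives a way to compare the two traversal styles on a given bash build.

```shell
#!/usr/bin/env bash
# Sketch only: time loading and two traversal styles separately.
max=1000                                 # hypothetical size, kept small
file=$(mktemp)
for ((n = 0; n <= max; n++)); do echo 'Xyz'; done > "$file"

# Phase 1: loading the file into the default MAPFILE array.
time mapfile -t < "$file"

# Phase 2a: traverse by expanding all elements at once.
count=0
time for line in "${MAPFILE[@]}"; do
    count=$((count+1))
done

# Phase 2b: traverse by looking up each element by its index.
indexed=0
time for ((n = 0; n < ${#MAPFILE[@]}; n++)); do
    line=${MAPFILE[n]}
    indexed=$((indexed+1))
done

echo "lines seen: $count (expansion), $indexed (indexed)"
rm -f "$file"
```

Raising `max` step by step shows how each phase scales on the machine at hand.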