Contents of whole arrays disappear when leaving while read loop

2009-03-26 Thread Lennart Schultz
Config info 1:
Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: i686-pc-linux-gnu-gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' -DSTANDARD_UTILS_PATH='/bin:/usr/bin:/sbin:/usr/sbin' -DSYS_BASHRC='/etc/bash/bashrc' -DSYS_BASH_LOGOUT='/etc/bash/bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS -DSSH_SOURCE_BASHRC -march=athlon-xp -O2 -pipe -fomit-frame-pointer -g3
uname output: Linux dragoda.com 2.6.28-gentoo-r2 #1 SMP Sat Feb 28 19:17:31 CET 2009 i686 AMD Sempron(tm) 3000+ AuthenticAMD GNU/Linux
Machine Type: i686-pc-linux-gnu

Bash Version: 4.0
Patch Level: 10
Release Status: release

Config info 2:
Configuration Information [Automatically generated, do not change]:
Machine: powerpc
OS: aix6.1.0.0
Compiler: cc -qlanglvl=extc89
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='powerpc' -DCONF_OSTYPE='aix6.1.0.0' -DCONF_MACHTYPE='powerpc-ibm-aix6.1.0.0' -DCONF_VENDOR='ibm' -DLOCALEDIR='/usr/local/share/locale' -DPACKAGE='bash' -DSHELL  -DHAVE_CONFIG_H -I.  -I. -I./include -I./lib -I./lib/intl -I/home/lesc/src/bash/bash-4.0/lib/intl -g
uname output: AIX fls-nim1 1 6 0001B6BAD300
Machine Type: powerpc-ibm-aix6.1.0.0

Bash Version: 4.0
Patch Level: 10
Release Status: release

Description:
In the construct
cat file|while read line
do
done
the contents of any array assigned in the loop disappear on leaving the loop:


Repeat-By:

Given a file named 'datafile' which contains the three lines:
one
two
tree

the bash code

declare -A numbers
counter=0
numbers[zero]=0
cat datafile|while read number
do
   counter=$((counter+1))
   numbers[$number]=$counter
   echo "${numbers[@]}"
done
echo "${numbers[@]}"

will generate the following output:
1 0
1 0 2
1 0 2 3
0

while the similar code

declare -A numbers
counter=0
numbers[zero]=0
while read number
do
   counter=$((counter+1))
   numbers[$number]=$counter
   echo "${numbers[@]}"
done < datafile
echo "${numbers[@]}"

will generate the (correct) output:

1 0
1 0 2
1 0 2 3
1 0 2 3

The first (but buggy) form will normally be preferred, because you often
need a filter like
egrep -v '^#'
in front of the while loop.
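
A note on why the first form behaves this way: bash runs each element of a
pipeline in a subshell, so assignments made inside the piped while loop are
lost when the loop ends. One way to keep a filter in front of the loop
without a pipeline is process substitution. The following is a minimal
sketch, assuming bash 4.0 or later; the temporary file and its contents are
stand-ins for 'datafile', and grep -v stands in for the egrep filter:

```bash
#!/usr/bin/env bash
# Sketch: feed the loop from a process substitution so that the loop body
# runs in the current shell and the array survives after 'done'.
datafile=$(mktemp)                  # hypothetical stand-in for 'datafile'
printf '%s\n' '# a comment' one two tree > "$datafile"

declare -A numbers
counter=0
numbers[zero]=0
while read -r number
do
    counter=$((counter+1))
    numbers[$number]=$counter
done < <(grep -v '^#' "$datafile")  # only the filter runs in a subprocess

echo "entries: ${#numbers[@]}, counter: $counter"
rm -f "$datafile"
```

The order in which "${numbers[@]}" lists the values of an associative array
is unspecified, so the sketch reports the element count instead.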



-- 
Regards,
Lennart Schultz


Using mapfile is extremely slow compared to old-fashioned ways to read files

2009-03-26 Thread Lennart Schultz
Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: i686-pc-linux-gnu-gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib -DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' -DSTANDARD_UTILS_PATH='/bin:/usr/bin:/sbin:/usr/sbin' -DSYS_BASHRC='/etc/bash/bashrc' -DSYS_BASH_LOGOUT='/etc/bash/bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS -DSSH_SOURCE_BASHRC -march=athlon-xp -O2 -pipe -fomit-frame-pointer -g3
uname output: Linux dragoda.com 2.6.28-gentoo-r2 #1 SMP Sat Feb 28 19:17:31 CET 2009 i686 AMD Sempron(tm) 3000+ AuthenticAMD GNU/Linux
Machine Type: i686-pc-linux-gnu

Bash Version: 4.0
Patch Level: 10
Release Status: release

Description:

I have a bash script which reads about 25 lines of xml code generating
about 850 files with information extracted from the xml file.
It uses the construct:

while read line
do
   case "$line" in
   
done < file

and this takes a little less than 2 minutes

Trying to use mapfile I changed the above construct to:

mapfile  < file
for i in "${MAPFILE[@]}"
do
   line=$(echo $i) # strip leading blanks
   case "$line" in
   
done

With this change the job now takes more than 48 minutes. :(

It may be that I am new to mapfile, and there are more efficient ways to
traverse a mapfile array, but if this is the case please document it.

Another suggestion for mapfile:
please introduce an option to strip leading blanks so that mapfile acts like
read, so that constructions like:
line=$(echo $i) # strip leading blanks
above can be avoided.
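
For illustration, leading blanks can also be stripped with parameter
expansion alone, which avoids forking a subshell for every line. This is
only a sketch under assumptions: the temporary file and its three sample
lines are made up, and -t is used so that mapfile drops the trailing
newline of each line:

```bash
#!/usr/bin/env bash
file=$(mktemp)                       # hypothetical stand-in for 'file'
printf '   <a>\n\t<b>\n<c>\n' > "$file"

mapfile -t < "$file"                 # fills the default array MAPFILE
trimmed=()
for i in "${MAPFILE[@]}"
do
    lead=${i%%[![:space:]]*}         # the leading whitespace of $i, if any
    line=${i#"$lead"}                # $i with that prefix removed
    trimmed+=("$line")
done
printf '[%s]\n' "${trimmed[@]}"
rm -f "$file"
```

The first expansion deletes everything from the first non-blank character
onwards, which leaves just the leading whitespace; the second removes that
prefix from the original line.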

-- 
Regards,
Lennart Schultz


Re: using mapfile is extremely slow compared to old-fashioned ways to read files

2009-03-27 Thread Lennart Schultz
Chris,
I agree with you to use the right tool at the right time, and mapfile seems
not to be the right tool for my problem, but I will just give you some facts
of my observations:

using a fast tool like egrep just to find a simple string in my datafile
gives the following times:

time egrep '/dev/null < dr.xml

real    0m54.628s
user    0m27.310s
sys     0m0.036s

My original bash script :

time xml2e2-loadepg

real    1m53.264s
user    1m22.145s
sys     0m30.674s

Since the question seems to be about spawning subshells and their cost, I
have checked my script: the only external command it calls is date, which
in total is called a little less than 2 times. Just for this test I have
changed the call of date to an assignment of a constant, and now it looks
like this:

time xml2e2-loadepg

real    1m3.826s
user    1m2.700s
sys     0m1.004s

I also made the same change to the version of the program using mapfile, and
changed  line=$(echo $i) to
line=${i##+([[:space:]])}
so the main loop is absolutely without any subshell spawns:

time xml2e2-loadepg.new

real    65m2.378s
user    63m16.717s
sys     0m1.124s
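
One detail worth noting about the expansion used above: +( ) is an
extended glob, which bash treats as a pattern operator only when the
extglob option is set. A minimal sketch of that trimming step, assuming
extglob is not already enabled elsewhere in the script; the sample string
is made up:

```bash
#!/usr/bin/env bash
shopt -s extglob                     # +(...) only acts as a pattern with extglob on

i=$'  \t  <programme start="x">'     # hypothetical input line with leading blanks
line=${i##+([[:space:]])}            # remove the longest run of leading whitespace
printf '[%s]\n' "$line"
```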



Lennart


2009/3/26 Chris F.A. Johnson 

> On Thu, 26 Mar 2009, Lennart Schultz wrote:
>
>> I have a bash script which reads about 25 lines of xml code generating
>> about 850 files with information extracted from the xml file.
>> It uses the construct:
>>
>> while read line
>> do
>>  case "$line" in
>>  
>> done < file
>>
>> and this takes a little less than 2 minutes
>>
>> Trying to use mapfile I changed the above construct to:
>>
>> mapfile  < file
>> for i in "${MAPFILE[@]}"
>> do
>>  line=$(echo $i) # strip leading blanks
>>  case "$line" in
>>  
>> done
>>
>> With this change the job now takes more than 48 minutes. :(
>>
>
>   As has already been suggested, the time is almost certainly taken
>   up by the command substitution which you perform on every line.
>
>   If you want to remove leading spaces, it would be better to use a
>   single command to do that before reading with mapfile, e.g.:
>
> mapfile < <(sed 's/^ *//' file)
>
>   If you want to remove trailing spaces as well:
>
> mapfile < <(sed -e 's/^ *//' -e 's/ *$//' file)
>
>   Chet, how about an option to mapfile that strips leading and/or
>   trailing spaces?
>
>   Another useful option would be to remove newlines.
>
> --
>   Chris F.A. Johnson, webmaster <http://woodbine-gerrard.com>
>   = Do not reply to the From: address; use Reply-To: 
>   Author:
>   Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
>
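
The sed commands quoted above remove only spaces. As a rough sketch of the
same idea, the POSIX [[:space:]] class also covers tabs, and mapfile -t
drops the trailing newline from each element. The temporary file and its
contents are hypothetical stand-ins for 'file':

```bash
#!/usr/bin/env bash
file=$(mktemp)                       # hypothetical stand-in for 'file'
printf '  one  \n\ttwo\nthree\t\n' > "$file"

# Trim both ends once, in a single sed process, before the array is filled.
mapfile -t lines < <(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$file")

printf '[%s]\n' "${lines[@]}"
rm -f "$file"
```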


Re: using mapfile is extremely slow compared to old-fashioned ways to read files

2009-03-28 Thread Lennart Schultz
It seems that mapfile is OK for small numbers of lines, but for bigger
numbers it starts to consume time.

I made a little test:

rm Xyz; unset MAPFILE # clear
max=  # set limit
time for i in $(seq 0 $max); do echo 'Xyz' >> Xyz; done
real    0m0.490s
user    0m0.304s
sys     0m0.124s

 time mapfile < Xyz

real    0m0.005s
user    0m0.008s
sys     0m0.000s

time while read line; do echo $line > /dev/null; done < Xyz
real    0m1.124s
user    0m0.456s
sys     0m0.108s

time for i in $(seq 0 $max); do echo echo ${MAPFILE[$i]}> /dev/null; done

real    0m2.184s
user    0m0.976s
sys     0m0.104s

rm Xyz ;unset MAPFILE
max=9

 time for i in $(seq 0 $max); do echo 'Xyz' >> Xyz; done

real    0m8.204s
user    0m3.264s
sys     0m1.188s

time mapfile < Xyz

real    0m0.062s
user    0m0.044s
sys     0m0.000s

time while read line; do echo $line > /dev/null; done < Xyz
real    0m11.328s
user    0m4.500s
sys     0m1.140s

time for i in $(seq 0 $max); do echo echo ${MAPFILE[$i]}> /dev/null; done

real    9m52.832s
user    5m38.305s
sys     0m3.636s


At the time of testing I had sufficient free memory, no swapping, and no
other time-consuming programs running.
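
One possible explanation, offered as an assumption rather than something
measured here: if looking up ${MAPFILE[$i]} has to walk the array from its
start, the indexed loop does work proportional to the square of the number
of lines, whereas expanding "${MAPFILE[@]}" visits each element once. The
following is a small sketch for timing the two traversals separately; the
value of max and the temporary file are hypothetical:

```bash
#!/usr/bin/env bash
tmp=$(mktemp)                        # hypothetical stand-in for 'Xyz'
max=1000                             # hypothetical limit, chosen to run quickly
for ((i = 0; i <= max; i++)); do echo 'Xyz'; done > "$tmp"

mapfile -t < "$tmp"                  # fills the default array MAPFILE

# Traversal 1: look each element up by its index.
by_index=0
time for ((i = 0; i < ${#MAPFILE[@]}; i++)); do
    [[ ${MAPFILE[i]} == Xyz ]] && by_index=$((by_index + 1))
done

# Traversal 2: expand the whole array once and iterate over the values.
by_value=0
time for line in "${MAPFILE[@]}"; do
    [[ $line == Xyz ]] && by_value=$((by_value + 1))
done

echo "by index: $by_index, by value: $by_value"
rm -f "$tmp"
```

Both loops count the same lines, so any difference in the reported times
comes from the way the array is traversed, not from the loop body.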


2009/3/28 Chris F.A. Johnson 

> On Fri, 27 Mar 2009, Lennart Schultz wrote:
>
>> Chris,
>> I agree with you to use the right tool at the right time, and mapfile
>> seems
>> not to be the right tool for my problem, but I will just give you some
>> facts
>> of my observations:
>>
>> using a fast tool like egrep just to find a simple string in my datafile
>> gives the following times:
>>
>> time egrep '/dev/null < dr.xml
>>
>> real    0m54.628s
>> user    0m27.310s
>> sys     0m0.036s
>>
>> My original bash script :
>>
>> time xml2e2-loadepg
>>
>> real    1m53.264s
>> user    1m22.145s
>> sys     0m30.674s
>>
>> Since the question seems to be about spawning subshells and their cost, I
>> have checked my script: the only external command it calls is date, which
>> in total is called a little less than 2 times. Just for this test I have
>> changed the call of date to an assignment of a constant, and now it looks
>> like this:
>>
>> time xml2e2-loadepg
>>
>> real    1m3.826s
>> user    1m2.700s
>> sys     0m1.004s
>>
>> I also made the same change to the version of the program using mapfile,
>> and changed  line=$(echo $i) to
>> line=${i##+([[:space:]])}
>> so the main loop is absolutely without any subshell spawns:
>>
>> time xml2e2-loadepg.new
>>
>> real    65m2.378s
>> user    63m16.717s
>> sys     0m1.124s
>>
>
>   How much of that is taken by mapfile? Time the mapfile command and
>   the loop separately:
>
> time mapfile < file
> time for i in "${MAPFILE[@]}"
>
> --
>   Chris F.A. Johnson, webmaster <http://woodbine-gerrard.com>
>   ===
>
>   Author:
>   Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
>