Lennart Schultz wrote:
> Bash Version: 4.0
> Patch Level: 10
> Release Status: release
>
> Description:
>
> I have a bash script which reads about 250000 lines of xml code generating
> about 850 files with information extracted from the xml file.
> It uses the construct:
>
> while read line
> do
> case "$line" in
> ....
> done < file
>
> and this takes a little less than 2 minutes
>
> Trying to use mapfile I changed the above construct to:
>
> mapfile < file
> for i in "${MAPFILE[@]}"
> do
> line=$(echo $i) # strip leading blanks
> case "$line" in
> ....
> done
>
> With this change the job now takes more than 48 minutes. :(
The most important thing is using the right tool for the job. If you
have to introduce a command substitution for each line read with mapfile,
you probably don't have the problem mapfile is intended to solve:
quickly reading exact copies of lines from a file descriptor into an
array.
If another approach works better, you should use it.
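For instance, if the only reason for the command substitution is stripping leading blanks, a parameter expansion does the same work without forking a subshell per line. A minimal sketch (the here-document input is just illustrative data):

```shell
#!/bin/bash
# Read lines into an array (requires bash >= 4), then strip leading
# whitespace with parameter expansion -- no subshell per iteration.
mapfile -t lines <<'EOF'
   first line
     second line
EOF

for i in "${lines[@]}"; do
    # ${i%%[![:space:]]*} is the leading whitespace; remove it as a prefix.
    line=${i#"${i%%[![:space:]]*}"}
    printf '%s\n' "$line"
done
```

Unlike `$(echo $i)`, this only removes leading whitespace; it does not collapse internal runs of blanks, which is usually what you want when processing structured input.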
If you're interested in why the mapfile solution is slower, you could
run the loop using a version of bash built for profiling and check
where the time goes. I believe you'd find that the command substitution
is responsible for much of it, and the rest is due to the significant
increase in memory usage resulting from the 250000-line array (which
also slows down fork and process creation).
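You can see the per-line fork cost in isolation without a profiling build. A rough timing sketch (absolute numbers vary by machine; the ratio between the two loops is what matters):

```shell
#!/bin/bash
# Compare the two per-line operations: a command substitution forks a
# subshell on every iteration; parameter expansion stays in-process.
i='   some line'

time for ((n = 0; n < 2000; n++)); do
    line=$(echo $i)                   # forks a subshell each pass
done

time for ((n = 0; n < 2000; n++)); do
    line=${i#"${i%%[![:space:]]*}"}   # no fork: plain expansion
done
```

With a large array already resident, each of those forks also has to copy a bigger process image, which compounds the slowdown the 250000-line array causes on its own.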
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
Chet Ramey, ITS, CWRU [email protected] http://cnswww.cns.cwru.edu/~chet/