Lennart Schultz wrote:

> Bash Version: 4.0
> Patch Level: 10
> Release Status: release
> 
> Description:
> 
> I have a bash script which reads about 250000 lines of xml code generating
> about 850 files with information extracted from the xml file.
> It uses the construct:
> 
> while read line
> do
>    case "$line" in
>    ....
> done < file
> 
> and this takes a little less than 2 minutes
> 
> Trying to use mapfile I changed the above construct to:
> 
> mapfile  < file
> for i in "${MAPFILE[@]}"
> do
>    line=$(echo $i) # strip leading blanks
>    case "$line" in
>    ....
> done
> 
> With this change the job now takes more than 48 minutes. :(
The most important thing is using the right tool for the job.  If you
have to introduce a command substitution for each line read with mapfile,
you probably don't have the problem mapfile is intended to solve:
quickly reading exact copies of lines from a file descriptor into an
array.

If another approach works better, you should use it.
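For what it's worth, the original while-read loop gets the blank-stripping for free: with the default IFS, `read` removes leading and trailing whitespace itself, so no per-line command substitution is needed. A minimal sketch (the piped input is just illustrative):

```shell
# With the default IFS, `read` strips leading and trailing blanks
# itself -- no $(echo ...) is needed per line.
printf '   indented line\n' | while read -r line; do
    printf '%s\n' "$line"    # prints "indented line", no leading blanks
done
```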

If you're interested in why the mapfile solution is slower, you could
run the loop using a version of bash built for profiling and check
where the time goes.  I believe you'd find that the command substitution
is responsible for much of it, and the rest is due to the significant
increase in memory usage resulting from the 250000-line array (which
also slows down fork and process creation).
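If one wanted to keep mapfile anyway, the fork per line could be avoided by stripping leading blanks with parameter expansion instead of a command substitution. A sketch under that assumption (the `lines` array name and the sample input are mine, not from the report):

```shell
# Strip leading whitespace with parameter expansion -- no subshell,
# no fork per line.  -t also drops the trailing newline that mapfile
# would otherwise keep in each element.
mapfile -t lines < <(printf '   one\n  two\n')
for i in "${lines[@]}"; do
    line=${i#"${i%%[![:space:]]*}"}   # remove the leading-whitespace prefix
    printf '%s\n' "$line"
done
```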

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer

Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
