Re: bug: bash 4.2.20 impossibly slow

Somchai Smythe Sun, 18 Mar 2012 13:57:50 -0700

On 3/16/12, Chet Ramey <chet.ra...@case.edu> wrote:
> On 3/14/12 2:14 PM, Somchai Smythe wrote:
>> Hello,
>>
>> I am reporting a problem with performance, not correctness.
>>
>> While preparing some examples for a course lecture where I code the
>> same algorithm in many languages to compare languages, I ran some code
>> and while it was  reasonably quick with ksh, it would just apparently
>> hang at 100% cpu in bash.  I finally let it run overnight and it does
>> complete correctly in bash, but what takes ksh less than a minute
>> takes bash 6 1/2 hours to complete (and keeping one core at 100% the
>> entire 6.5 hours) on the same hardware.  I suspect there may be some
>> special way to compile bash that I don't know about that maybe works
>> with arrays differently, so I reporting this.  I am not subscribed, so
>> please cc: me.  I cannot use bashbug since my university blocks
>> outgoing mail.  I used exactly the same file unmodified for the tests
>> in ksh and bash.  My hope is that bash would be at least 'competitive'
>> and complete it without being more than 10x slower.  As it is, I
>> cannot use bash in the lecture (for this) since it is only 3 hours
>> long and the program won't complete in that amount of time.
>>
>> BLS2 $bash --version
>> GNU bash, version 4.2.20(2)-release (x86_64-unknown-linux-gnu)
>> Copyright (C) 2011 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> <http://gnu.org/licenses/gpl.html>
>
>       [...]
>
>> The program run was a simple prime number sieve program using an array
>> with only 500000 elements:
>>
>> #! /bin/bash
>>
>> if ((${#}==1))
>> then
>>   n=${1}
>> else
>>   n=500000
>> fi
>> for ((i=1;i<=n;i++))
>> do
>>   ((a[i]=1))
>> done
>> for ((i=2;i<=(n/2);i++))
>> do
>>   for ((j=2;j<=(n/i);j++))
>>   do
>>      ((k=i*j))
>>      ((a[k]=0))
>>   done
>> done
>> for ((i=1;i<=n;i++))
>> do
>>   if ((a[i]!=0))
>>   then
>>     printf "%d\n" ${i}
>>   fi
>> done
>> exit 0
>>
>> When run like:
>> time bash ./sieve.sh
>>
>> .....
>> 499717
>> 499729
>> 499739
>> 499747
>> 499781
>> 499787
>> 499801
>> 499819
>> 499853
>> 499879
>> 499883
>> 499897
>> 499903
>> 499927
>> 499943
>> 499957
>> 499969
>> 499973
>> 499979
>>
>> real    396m9.884s
>> user    395m43.102s
>> sys     0m8.913s
>>
>       [...]
>
>> Computer is a Core2 Duo with 3GB of ram running a pure64bit linux
>> distribution with kernel 3.2.9 with gcc 4.5.3 and glibc 2.14.1.
>>
>> Both programs get the same answer, so this is not a correctness issue,
>> but instead a performance issue.
>>
>       [...]
>
>> Experimenting a bit shows me that at 9999 elements bash is still
>> reasonably fast, but at 29999 elements it takes:
>>
>> real    0m39.077s
>> user    0m38.807s
>> sys     0m0.150s
>>
>> For that, ksh takes:
>>
>> real    0m1.631s
>> user    0m1.560s
>> sys     0m0.007s
>>
>> Perhaps that shorter total time still shows the problem dramatically
>> enough that runs that size can be used to track down the problem
>> without having to wait hours for the test runs.
>
> As I said in my earlier reply, the bash array implementation uses sparse
> doubly-linked lists.  Inserts that don't append a value to the end of
> the array take O(N) instead of O(1), since you have to search the list
> to find the right spot to insert the new element.
>
> There is a lot that can be done to accommodate sequential insertion
> patterns, which fits the sieve algorithm pretty well.  It would be
> pretty simple, since the array code already maintains a pointer to the
> last reference; starting the search for the spot to insert at the last
> reference should reduce the search time considerably.
>
> That makes a pretty big difference.  Starting at 9999 elements, but
> removing the echos to minimize output:
>
> (bash)
> real  0m1.833s
> user  0m1.704s
> sys   0m0.118s
>
> (bash-4.2.24)
> real  0m2.728s
> user  0m2.603s
> sys   0m0.114s
>
> With 29999 elements:
> (bash)
> real  0m6.957s
> user  0m6.437s
> sys   0m0.377s
>
> (bash-4.2.24)
> real  0m16.520s
> user  0m16.113s
> sys   0m0.387s
>
> You start seeing real differences at 99999 elements (for what it's worth,
> loading up the array initially takes about 1.2s of that total):
>
> (bash)
> real  0m32.329s
> user  0m30.845s
> sys   0m1.376s
>
> (bash-4.2.24)
> real  2m46.972s
> user  2m45.095s
> sys   0m1.590s
>
> At this point I quit testing bash-4.2.24; I don't have *that* much time.
> The rest of these are just with the modified development sources, and not
> in a really rigorous way, since I was using the machine for other things
> at the time.
>
> 299999 elements:
> real  3m24.318s
> user  3m19.463s
> sys   0m4.768s
>
> 399999 elements:
> real  5m27.585s
> user  5m20.870s
> sys   0m6.511s
>
> 499999 elements:
> real  8m9.997s
> user  8m0.963s
> sys   0m8.553s
>
> It's never going to be O(1) unless I rewrite the whole array module, and
> I'm not saying that the bash builtin malloc's memory allocation patterns
> don't have an effect, but small changes (which I've attached) can make a
> big difference.
>
> Chet


Thank you very much for the patch; it makes a very dramatic difference
for this use case.  In my testing, it went from 6.5 hours to just over
20 minutes.

I guess this wasn't the default since it may not be great for random
access, but I hope that it is at least available as a compile-time
option in future official bash releases.  For now, I'll apply this
array access speedup patch to our packages.

John Gatewood Ham
Lecturer
Burapha University
Bang Saen, Chonburi
Thailand

Re: bug: bash 4.2.20 impossibly slow

Reply via email to