On 3/16/12, Chet Ramey <chet.ra...@case.edu> wrote: > On 3/14/12 2:14 PM, Somchai Smythe wrote: >> Hello, >> >> I am reporting a problem with performance, not correctness. >> >> While preparing some examples for a course lecture where I code the >> same algorithm in many languages to compare languages, I ran some code >> and while it was reasonably quick with ksh, it would just apparently >> hang at 100% cpu in bash. I finally let it run overnight and it does >> complete correctly in bash, but what takes ksh less than a minute >> takes bash 6 1/2 hours to complete (and keeping one core at 100% the >> entire 6.5 hours) on the same hardware. I suspect there may be some >> special way to compile bash that I don't know about that maybe works >> with arrays differently, so I reporting this. I am not subscribed, so >> please cc: me. I cannot use bashbug since my university blocks >> outgoing mail. I used exactly the same file unmodified for the tests >> in ksh and bash. My hope is that bash would be at least 'competitive' >> and complete it without being more than 10x slower. As it is, I >> cannot use bash in the lecture (for this) since it is only 3 hours >> long and the program won't complete in that amount of time. >> >> BLS2 $bash --version >> GNU bash, version 4.2.20(2)-release (x86_64-unknown-linux-gnu) >> Copyright (C) 2011 Free Software Foundation, Inc. >> License GPLv3+: GNU GPL version 3 or later >> <http://gnu.org/licenses/gpl.html> > > [...] > >> The program run was a simple prime number sieve program using an array >> with only 500000 elements: >> >> #! /bin/bash >> >> if ((${#}==1)) >> then >> n=${1} >> else >> n=500000 >> fi >> for ((i=1;i<=n;i++)) >> do >> ((a[i]=1)) >> done >> for ((i=2;i<=(n/2);i++)) >> do >> for ((j=2;j<=(n/i);j++)) >> do >> ((k=i*j)) >> ((a[k]=0)) >> done >> done >> for ((i=1;i<=n;i++)) >> do >> if ((a[i]!=0)) >> then >> printf "%d\n" ${i} >> fi >> done >> exit 0 >> >> When run like: >> time bash ./sieve.sh >> >> ..... >> 499717 >> 499729 >> 499739 >> 499747 >> 499781 >> 499787 >> 499801 >> 499819 >> 499853 >> 499879 >> 499883 >> 499897 >> 499903 >> 499927 >> 499943 >> 499957 >> 499969 >> 499973 >> 499979 >> >> real 396m9.884s >> user 395m43.102s >> sys 0m8.913s >> > [...] > >> Computer is a Core2 Duo with 3GB of ram running a pure64bit linux >> distribution with kernel 3.2.9 with gcc 4.5.3 and glibc 2.14.1. >> >> Both programs get the same answer, so this is not a correctness issue, >> but instead a performance issue. >> > [...] > >> Experimenting a bit shows me that at 9999 elements bash is still >> reasonably fast, but at 29999 elements it takes: >> >> real 0m39.077s >> user 0m38.807s >> sys 0m0.150s >> >> For that, ksh takes: >> >> real 0m1.631s >> user 0m1.560s >> sys 0m0.007s >> >> Perhaps that shorter total time still shows the problem dramatically >> enough that runs that size can be used to track down the problem >> without having to wait hours for the test runs. > > As I said in my earlier reply, the bash array implementation uses sparse > doubly-linked lists. Inserts that don't append a value to the end of > the array take O(N) instead of O(1), since you have to search the list > to find the right spot to insert the new element. > > There is a lot that can be done to accommodate sequential insertion > patterns, which fits the sieve algorithm pretty well. It would be > pretty simple, since the array code already maintains a pointer to the > last reference; starting the search for the spot to insert at the last > reference should reduce the search time considerably. > > That makes a pretty big difference. Starting at 9999 elements, but > removing the echos to minimize output: > > (bash) > real 0m1.833s > user 0m1.704s > sys 0m0.118s > > (bash-4.2.24) > real 0m2.728s > user 0m2.603s > sys 0m0.114s > > With 29999 elements: > (bash) > real 0m6.957s > user 0m6.437s > sys 0m0.377s > > (bash-4.2.24) > real 0m16.520s > user 0m16.113s > sys 0m0.387s > > You start seeing real differences at 99999 elements (for what it's worth, > loading up the array initially takes about 1.2s of that total): > > (bash) > real 0m32.329s > user 0m30.845s > sys 0m1.376s > > (bash-4.2.24) > real 2m46.972s > user 2m45.095s > sys 0m1.590s > > At this point I quit testing bash-4.2.24; I don't have *that* much time. > The rest of these are just with the modified development sources, and not > in a really rigorous way, since I was using the machine for other things > at the time. > > 299999 elements: > real 3m24.318s > user 3m19.463s > sys 0m4.768s > > 399999 elements: > real 5m27.585s > user 5m20.870s > sys 0m6.511s > > 499999 elements: > real 8m9.997s > user 8m0.963s > sys 0m8.553s > > It's never going to be O(1) unless I rewrite the whole array module, and > I'm not saying that the bash builtin malloc's memory allocation patterns > don't have an effect, but small changes (which I've attached) can make a > big difference. > > Chet
Thank you very much for the patch; it makes a very dramatic difference for this use case. In my testing, it went from 6.5 hours to just over 20 minutes. I guess this wasn't the default since it may not be great for random access, but I hope that it is at least available as a compile-time option in future official bash releases. For now, I'll apply this array access speedup patch to our packages. John Gatewood Ham Lecturer Burapha University Bang Saen, Chonburi Thailand