On 3/14/12 2:14 PM, Somchai Smythe wrote: > Hello, > > I am reporting a problem with performance, not correctness. > > While preparing some examples for a course lecture where I code the > same algorithm in many languages to compare languages, I ran some code > and while it was reasonably quick with ksh, it would just apparently > hang at 100% cpu in bash. I finally let it run overnight and it does > complete correctly in bash, but what takes ksh less than a minute > takes bash 6 1/2 hours to complete (and keeping one core at 100% the > entire 6.5 hours) on the same hardware. I suspect there may be some > special way to compile bash that I don't know about that maybe works > with arrays differently, so I reporting this. I am not subscribed, so > please cc: me. I cannot use bashbug since my university blocks > outgoing mail. I used exactly the same file unmodified for the tests > in ksh and bash. My hope is that bash would be at least 'competitive' > and complete it without being more than 10x slower. As it is, I > cannot use bash in the lecture (for this) since it is only 3 hours > long and the program won't complete in that amount of time. > > BLS2 $bash --version > GNU bash, version 4.2.20(2)-release (x86_64-unknown-linux-gnu) > Copyright (C) 2011 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
[...] > The program run was a simple prime number sieve program using an array > with only 500000 elements: > > #! /bin/bash > > if ((${#}==1)) > then > n=${1} > else > n=500000 > fi > for ((i=1;i<=n;i++)) > do > ((a[i]=1)) > done > for ((i=2;i<=(n/2);i++)) > do > for ((j=2;j<=(n/i);j++)) > do > ((k=i*j)) > ((a[k]=0)) > done > done > for ((i=1;i<=n;i++)) > do > if ((a[i]!=0)) > then > printf "%d\n" ${i} > fi > done > exit 0 > > When run like: > time bash ./sieve.sh > > ..... > 499717 > 499729 > 499739 > 499747 > 499781 > 499787 > 499801 > 499819 > 499853 > 499879 > 499883 > 499897 > 499903 > 499927 > 499943 > 499957 > 499969 > 499973 > 499979 > > real 396m9.884s > user 395m43.102s > sys 0m8.913s > [...] > Computer is a Core2 Duo with 3GB of ram running a pure64bit linux > distribution with kernel 3.2.9 with gcc 4.5.3 and glibc 2.14.1. > > Both programs get the same answer, so this is not a correctness issue, > but instead a performance issue. > [...] > Experimenting a bit shows me that at 9999 elements bash is still > reasonably fast, but at 29999 elements it takes: > > real 0m39.077s > user 0m38.807s > sys 0m0.150s > > For that, ksh takes: > > real 0m1.631s > user 0m1.560s > sys 0m0.007s > > Perhaps that shorter total time still shows the problem dramatically > enough that runs that size can be used to track down the problem > without having to wait hours for the test runs. As I said in my earlier reply, the bash array implementation uses sparse doubly-linked lists. Inserts that don't append a value to the end of the array take O(N) instead of O(1), since you have to search the list to find the right spot to insert the new element. There is a lot that can be done to accommodate sequential insertion patterns, which fits the sieve algorithm pretty well. It would be pretty simple, since the array code already maintains a pointer to the last reference; starting the search for the spot to insert at the last reference should reduce the search time considerably. That makes a pretty big difference. Starting at 9999 elements, but removing the echos to minimize output: (bash) real 0m1.833s user 0m1.704s sys 0m0.118s (bash-4.2.24) real 0m2.728s user 0m2.603s sys 0m0.114s With 29999 elements: (bash) real 0m6.957s user 0m6.437s sys 0m0.377s (bash-4.2.24) real 0m16.520s user 0m16.113s sys 0m0.387s You start seeing real differences at 99999 elements (for what it's worth, loading up the array initially takes about 1.2s of that total): (bash) real 0m32.329s user 0m30.845s sys 0m1.376s (bash-4.2.24) real 2m46.972s user 2m45.095s sys 0m1.590s At this point I quit testing bash-4.2.24; I don't have *that* much time. The rest of these are just with the modified development sources, and not in a really rigorous way, since I was using the machine for other things at the time. 299999 elements: real 3m24.318s user 3m19.463s sys 0m4.768s 399999 elements: real 5m27.585s user 5m20.870s sys 0m6.511s 499999 elements: real 8m9.997s user 8m0.963s sys 0m8.553s It's never going to be O(1) unless I rewrite the whole array module, and I'm not saying that the bash builtin malloc's memory allocation patterns don't have an effect, but small changes (which I've attached) can make a big difference. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRU c...@case.edu http://cnswww.cns.cwru.edu/~chet/
*** ../bash-4.2-patched/array.c 2009-03-29 22:16:43.000000000 -0400 --- array.c 2012-03-16 09:25:09.000000000 -0400 *************** *** 59,63 **** static ARRAY_ELEMENT *lastref = 0; ! #define IS_LASTREF(a) ((a) == lastarray) #define INVALIDATE_LASTREF(a) \ --- 59,67 ---- static ARRAY_ELEMENT *lastref = 0; ! #define IS_LASTREF(a) (lastarray && (a) == lastarray) ! ! #define LASTREF_START(a, i) \ ! (IS_LASTREF(a) && i >= element_index(lastref)) ? lastref \ ! : element_forw(a->head) #define INVALIDATE_LASTREF(a) \ *************** *** 611,615 **** char *v; { ! register ARRAY_ELEMENT *new, *ae; if (a == 0) --- 615,619 ---- char *v; { ! register ARRAY_ELEMENT *new, *ae, *start; if (a == 0) *************** *** 629,635 **** } /* ! * Otherwise we search for the spot to insert it. */ ! for (ae = element_forw(a->head); ae != a->head; ae = element_forw(ae)) { if (element_index(ae) == i) { /* --- 633,642 ---- } /* ! * Otherwise we search for the spot to insert it. The lastref ! * handle optimizes the case of sequential or almost-sequential ! * assignments that are not at the end of the array. */ ! start = LASTREF_START(a, i); ! for (ae = start; ae != a->head; ae = element_forw(ae)) { if (element_index(ae) == i) { /* *************** *** 648,651 **** --- 655,659 ---- } } + array_dispose_element(new); INVALIDATE_LASTREF(a); return (-1); /* problem */ *************** *** 661,669 **** arrayind_t i; { ! register ARRAY_ELEMENT *ae; if (a == 0 || array_empty(a)) return((ARRAY_ELEMENT *) NULL); ! for (ae = element_forw(a->head); ae != a->head; ae = element_forw(ae)) if (element_index(ae) == i) { ae->next->prev = ae->prev; --- 669,678 ---- arrayind_t i; { ! register ARRAY_ELEMENT *ae, *start; if (a == 0 || array_empty(a)) return((ARRAY_ELEMENT *) NULL); ! start = LASTREF_START(a, i); ! for (ae = start; ae != a->head; ae = element_forw(ae)) if (element_index(ae) == i) { ae->next->prev = ae->prev; *************** *** 672,676 **** --- 681,694 ---- if (i == array_max_index(a)) a->max_index = element_index(ae->prev); + #if 0 INVALIDATE_LASTREF(a); + #else + if (ae->next != a->head) + SET_LASTREF(a, ae->next); + else if (ae->prev != a->head) + SET_LASTREF(a, ae->prev); + else + INVALIDATE_LASTREF(a); + #endif return(ae); } *************** *** 686,701 **** arrayind_t i; { ! register ARRAY_ELEMENT *ae; if (a == 0 || array_empty(a)) return((char *) NULL); if (i > array_max_index(a)) ! return((char *)NULL); ! /* Keep roving pointer into array to optimize sequential access */ ! if (lastref && IS_LASTREF(a)) ! ae = (i >= element_index(lastref)) ? lastref : element_forw(a->head); ! else ! ae = element_forw(a->head); ! for ( ; ae != a->head; ae = element_forw(ae)) if (element_index(ae) == i) { SET_LASTREF(a, ae); --- 704,715 ---- arrayind_t i; { ! register ARRAY_ELEMENT *ae, *start; if (a == 0 || array_empty(a)) return((char *) NULL); if (i > array_max_index(a)) ! return((char *)NULL); /* Keep roving pointer into array to optimize sequential access */ ! start = LASTREF_START(a, i); ! for (ae = start; ae != a->head; ae = element_forw(ae)) if (element_index(ae) == i) { SET_LASTREF(a, ae);