Thread starvation and resource saturation in atomicity functions?
Hello all.

Late last year I posted a couple of questions about multi-threaded application hangs on Solaris 10 for x86 platforms, and about the thread-safety of std::basic_string in general. This was an attempt to solve a persistent problem with my application hanging as CPU utilization shoots to 100%, with the __gnu_cxx::__exchange_and_add function frequently appearing at the top of the stack traces of several threads.

I believe I have made a breakthrough recently and wanted to solicit the opinion of some experts on this. I have narrowed the problem down to running my application as root versus an unprivileged user, and further isolated the suspected cause to the varying thread priorities in my application. My theory is that spin-locks in gcc, particularly in the atomicity function __gnu_cxx::__exchange_and_add, are causing higher priority threads to consume all available cpu cycles while spinning indefinitely, waiting for a lower priority thread that holds the lock.

Now, I am already aware that messing with thread priorities is dangerous and often an exercise in futility, but I am surprised that something as elemental as an atomic test-and-set operation, used extensively throughout gcc, could be the culprit for all of the trouble I have been experiencing. More than anything I'm hoping for a sanity check on this, even if it's just to confirm what may be obvious to others: that modifying thread priorities is strictly off-limits except in extreme circumstances with careful control over what operations are performed. Or perhaps there's another solution that has eluded my searches, maybe a bug fix or some way of avoiding such spin-locks in gcc that would make varying thread priorities viable and safe.

Thanks in advance for any insight, and at the very least I hope that this will serve as a warning to others who might find themselves in the same situation.

Cheers,

Chad Attermann
Re: Thread starvation and resource saturation in atomicity functions?
"Ian Lance Taylor" <[EMAIL PROTECTED]> writes:
> "Chad Attermann" <[EMAIL PROTECTED]> writes:
>> [...]
>> I have theorized that spin-locks in gcc, particularly in the atomicity
>> __gnu_cxx::__exchange_and_add function, are causing higher priority
>> threads to consume all available cpu cycles while spinning indefinitely
>> waiting for a lower priority thread that holds the lock.
>
> You explicitly mentioned x86.  For x86, __gnu_cxx::__exchange_and_add
> does not use a spin-lock.  If you mean that other code may use spin
> locks built on top of __exchange_and_add, then, yes, in that case you
> could be getting a priority inversion.  But gcc itself does not use any
> such code.  So if you are seeing a problem of this sort, it is not a
> problem with gcc.

I doubted it at first too, but from what I can tell, the version of gcc that ships with Solaris 10 x86 is gcc 3.4.3 built for i386.
Below is the output of the pre-installed gcc given the -v switch:

  Reading specs from /usr/sfw/lib/gcc/i386-pc-solaris2.10/3.4.3/specs
  Configured with: /builds/sfw10-gate/usr/src/cmd/gcc/gcc-3.4.3/configure
    --prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as
    --with-ld=/usr/ccs/bin/ld --without-gnu-ld --enable-languages=c,c++
    --enable-shared
  Thread model: posix
  gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)

Below is a snippet of code from the gcc 3.4.3 source, ./libstdc++-v3/config/cpu/i386/atomicity.h:

  _Atomic_word
  __attribute__ ((__unused__))
  __exchange_and_add(volatile _Atomic_word* __mem, int __val)
  {
    register _Atomic_word __result, __tmp = 1;

    // Obtain the atomic exchange/add spin lock.
    do
      {
        __asm__ __volatile__ ("xchg{l} {%0,%1|%1,%0}"
                              : "=m" (_Atomicity_lock<0>::_S_atomicity_lock),
                                "+r" (__tmp)
                              : "m" (_Atomicity_lock<0>::_S_atomicity_lock));
      }
    while (__tmp);

    __result = *__mem;
    *__mem += __val;

    // Release spin lock.
    _Atomicity_lock<0>::_S_atomicity_lock = 0;

    return __result;
  }

I cannot confirm that it was the i386 code that was included in the gcc build, but it appears that way from the signature. Is this perhaps a problem with the way that the gcc 3.4.3 shipping with Solaris 10 x86 was built? Should it have opted for the i486 version instead, which does not use spin-locks?
Re: Thread starvation and resource saturation in atomicity functions?
"Ian Lance Taylor" <[EMAIL PROTECTED]> writes:
> "Chad Attermann" <[EMAIL PROTECTED]> writes:
>> I cannot confirm that it was the i386 code that was included in the gcc
>> build, but it appears that way from the signature.  Is this perhaps a
>> problem with the way that the gcc 3.4.3 shipping with Solaris 10 x86 was
>> built?  Should it have opted for the i486 version instead, which does
>> not use spin-locks?
>
> Yes, barring the extremely unlikely case that you need to run on a plain
> i386, you should use the i486 code from libstdc++-v3/config/cpu/i486.
> There are various difficulties with the i386 atomicity code.
> Fortunately the i486 was released almost 20 years ago, and it's
> generally safe to use the i486 instructions today.
>
> Ian

Running at least i486 code would certainly make sense on AMD Opteron processors. I am shocked that the gcc version shipped by Sun Microsystems would be compiled for i386. I compiled my own version of gcc 4.2.2 on the same platform and it too appears to have used the i386 code. Perhaps the gcc build configuration process for Solaris is flawed? Regardless, I will be attempting to build a new version today that is forced to use the i486 code, and would appreciate any tips.

Thanks,

Chad.
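[For anyone attempting the same rebuild: one commonly suggested approach is to configure gcc with an i486 host triplet so that libstdc++'s configure.host selects config/cpu/i486. The invocation below is a hypothetical sketch, not a recipe verified on Solaris 10; the triplet, paths, and option set are assumptions.]

```shell
# Hypothetical configure invocation -- the i486 triplet, prefix, and
# options here are assumptions, not verified on Solaris 10 x86.
/path/to/gcc-3.4.3/configure \
    --build=i486-pc-solaris2.10 \
    --host=i486-pc-solaris2.10 \
    --prefix=/opt/gcc343-i486 \
    --enable-languages=c,c++ \
    --enable-shared
gmake && gmake install
```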
Re: Thread starvation and resource saturation in atomicity functions?
"Chad Attermann" <[EMAIL PROTECTED]> writes:
> Running at least i486 code would certainly make sense on AMD Opteron
> processors.  I am shocked that the gcc version shipped by Sun
> Microsystems would be compiled for i386.  I compiled my own version of
> gcc 4.2.2 on the same platform and it too appears to have used the i386
> code.  Perhaps the gcc build configuration process for Solaris is
> flawed?  Regardless, I will be attempting to build a new version today
> that is forced to use the i486 code, and would appreciate any tips.

My bad... I was mistakenly thinking I needed to re-build gcc in order to get i486 code. In reality I should only need to specify the architecture type when building my own application, using "-march=i486", or perhaps even "-march=opteron" in my own case. As stated in the gcc docs, i386 is the default instruction set for the "i386 and x86-64 family of computers" when the architecture is not explicitly specified, so presumably atomic test-and-set operations will use spin-locks by default.

So I suppose the moral of the story remains: exercise extreme caution when using varying thread priorities.

Regards.