On 11/27/2011 02:53 PM, Alan Modra wrote: > +enum memmodel > +{ > + MEMMODEL_RELAXED = 0, > + MEMMODEL_CONSUME = 1, > + MEMMODEL_ACQUIRE = 2, > + MEMMODEL_RELEASE = 3, > + MEMMODEL_ACQ_REL = 4, > + MEMMODEL_SEQ_CST = 5, > + MEMMODEL_LAST = 6 > +};
This should probably go to libgomp.h. > /* This is a Linux specific implementation of a semaphore synchronization > mechanism for libgomp. This type is private to the library. This > + counting semaphore implementation uses atomic instructions and the > + futex syscall, and a single 32-bit int to store semaphore state. > + The low 31 bits are the count, the top bit is a flag set when some > + threads may be waiting. */ I think we'll get better code generation on a number of targets if we make the low bit the waiting bit, and the higher bits the count. This we do (count & 1) and (count + 2) instead of larger constants needing to be generated. Not changed below... > +static inline void > +gomp_sem_wait (gomp_sem_t *sem) > { > + int count = *sem; > + > + while ((count & 0x7fffffff) != 0) > + { > + int oldval = count; > + __atomic_compare_exchange_4 (sem, &oldval, count - 1, > + false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED); > + if (__builtin_expect (oldval == count, 1)) > + return; > + count = oldval; > + } I'd really prefer not to hard-code any sizes at all. Let's change the explicit _4 to _n everywhere. The loop above doesn't actually make sense to me. If the compare-and-swap succeeds, then oldval == count - 1. Which doesn't seem to be what you're testing at all. Perhaps this entire function is better written as { int count = *sem; do { if (__builtin_expect (count & 0x80000000u, 0) { gomp_sem_wait_slow (sem, count); return; } } while (!__atomic_compare_exchange_n (sem, &count, count - 1, true, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED)); } > +gomp_sem_post (gomp_sem_t *sem) > { > + int count = *sem; > + > + while (1) > + { > + int oldval = count; > + __atomic_compare_exchange_4 (sem, &oldval, (count + 1) & 0x7fffffff, > + false, MEMMODEL_RELEASE, MEMMODEL_RELAXED); > + if (__builtin_expect (oldval == count, 1)) > + break; Again, this equality doesn't make sense. Further, you aren't testing for the high bit set inside the loop, nor are you preserving the high bit. Perhaps better written as { int count = *sem; do { if (__builtin_expect (count & 0x80000000u, 0)) { gomp_sem_post_slow (sem, count); return; } /* We can't post more than 2**31-1 times. */ assert (count < 0x7fffffff); } while (!__atomic_compare_exchange_n (sem, &count, count + 1, true, MEMMODEL_RELEASE, MEMMODEL_RELAXED)); } ... All that said, I don't see how we can ever clear the wait bit once its set? Are we better off splitting the 32-bit value into two 16-bit fields for value+waiters? Or similarly splitting a 64-bit value? That way we can at least update both fields atomically, and we ought never lose a waiter. r~