On 11/29/2011 04:49 AM, Alan Modra wrote:
> Since there were a number of changes requested, and some confusion
> over what was happening in this code, it might be better if I post
> a new patch and we start over. In this patch I'm still using the MSB
> for the wait flag, and count in the low 31 bits, as that way is
> slightly more efficient on powerpc. However, if some other target
> generates really bad code for the masking operations, then I'm happy
> enough to change the comment and use
> #define SEM_WAIT 1
> #define SEM_INC 2
Agreed we can wait for more data for this. I think readability of the file
with these two names is significantly better already.
> +static inline bool
> +likely_compare_exchange (int *ptr, int *expected, int val, bool weak,
> + enum memmodel succ, enum memmodel fail)
> {
> + return __builtin_expect (__atomic_compare_exchange_n (ptr, expected, val,
> + weak, succ, fail), 1);
> }
Please move this to libgomp.h just below the memmodel definition, as a macro so
that it's not tied to int. It seems likely that we'll want this elsewhere in
libgomp as we convert things.
> +gomp_sem_wait (gomp_sem_t *sem)
> {
> + int count = *sem;
> +
> + while ((count & ~SEM_WAIT) != 0)
> + if (likely_compare_exchange (sem, &count, count - SEM_INC,
> + false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
> + return;
Tight loops like this should use the weak compare-exchange so that we don't get
two nested loops without need.
> +gomp_sem_post (gomp_sem_t *sem)
> +{
> + int count = *sem;
> +
> + while (1)
> + if (likely_compare_exchange (sem, &count, ((count + SEM_INC) &
> ~SEM_WAIT),
> + false, MEMMODEL_RELEASE, MEMMODEL_RELAXED))
> + break;
Likewise.
And this morning I finally follow the logic. I agree it's correct. I do think
we need more commentary here so that next month I can still follow it.
> +gomp_sem_wait_slow (gomp_sem_t *sem, int count)
> {
> + /* First loop spins a while. */
> + while (count == 0)
> + if (do_spin (sem, 0)
> + /* Spin timeout, nothing changed. Set waiting flag. */
> + && likely_compare_exchange (sem, &count, SEM_WAIT,
> + false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
> + {
For the record, here I agree we should use the strong CAS, because the CAS is
not the only thing in the loop. We don't want a syscall just because the
store-conditional failed.
Ok with those changes.
r~