On 10/19/2011 11:16 AM, Jakub Jelinek wrote:
> On Wed, Oct 19, 2011 at 08:05:26AM -0700, Richard Henderson wrote:
>> In one of Andrew's previous ideas he split _ace into two separate builtins:
>>   T __atomic_compare_exchange (T* mem, T oldval, T newval, ...);
>>   bool __atomic_compare_exchange_success (T);
>> where the input to __ace_success *must* be the ssa_name output of _ace.
>> We then use magic to wire up both results to the same rtl pattern.
>> We can still probably do something exactly like that, but Andrew is
>> concerned about getting the generic user-level interface correct.
> Until we support more than one lhs stmt in GIMPLE, there is always
> the option of returning a _Complex or vector or struct from the builtin,
> where one part of the return value would contain the bool saying whether
> it succeeded and the other part would contain the old value.
> That of course doesn't need to be the externally exposed builtin;
> it could be an internal builtin (perhaps with space in the name or
That's a possible 3rd option: create a custom struct for the return value.
Accesses to the struct fields are still going to be memory accesses
until RTL expansion time, aren't they?
user writes:

  __atomic_compare_exchange (&mem, &expected, desired, model)

gets turned into something like

  typedef struct { int val; bool success; } __atomic_CAS_result;
  {
    __atomic_CAS_result __res;
    tmp = *expected;
    __res = __atomic_CAS (&mem, tmp, desired, model);
    if (!__res.success)
      *expected = __res.val;
    return __res.success;
  }
Since __res is a struct, it will have to be allocated on the stack, and
then we have to see whether the RTL optimizations can remove it, right?
The user variable would be absolved of all the address-taken crud though.
> something). Or perhaps the user builtin
>   if (__atomic_compare_exchange (&mem, &expected, newval))
> could be lowered into
>   tmp1 = expected;
>   tmp2 = __atomic_val_compare_exchange (&mem, tmp1, newval);
>   expected = tmp2;
>   if (tmp1 == tmp2)
> if it is known that it can be handled by the backend.
That's what we generate now if we fall back to the __sync
implementations, but that only works for strong CAS. Of course, we're
only falling back to that at RTL time rather than in the tree
optimizers... we could fudge around earlier perhaps for the strong
version...
Andrew