Torvald, thank you for your input. See my responses below.

    On Monday, February 26, 2018 1:35 PM, Torvald Riegel <trie...@redhat.com> 
wrote:

> ... does not imply this latter statement.  The statement you cited is
> about what the standard itself requires, not what makes sense for a
> particular implementation. 

True, but it makes sense to provide true atomics when they are available. Since
the standard seems to allow an atomic_load implementation that uses RMW, this
does not seem to be a problem.
In fact, the lock_free flag for this type can return true only if mcx16 is
specified; otherwise it returns false (since availability can only be
determined at run time, the worst case is assumed).
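
To make the RMW-based load concrete, here is a minimal sketch of a load built
from a compare-and-swap. It is an illustration only, not the gcc or libatomic
code: the helper names load_via_cas and u64_is_lock_free are hypothetical, and
a 64-bit object stands in for the 128-bit case so that the example builds
without mcx16 or libatomic. The same pattern would apply to a double-width
object.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: an atomic load expressed as a compare-and-swap (RMW).
 * A 64-bit object stands in for the 128-bit case discussed in this thread. */
static uint64_t load_via_cas(_Atomic uint64_t *obj)
{
    uint64_t expected = 0;

    /* If *obj == 0, the CAS succeeds and stores 0 over 0: the value is
     * unchanged, but the operation is still a write to *obj.
     * Otherwise the CAS fails and copies the current value into expected.
     * Either way, expected ends up holding the value of *obj. */
    atomic_compare_exchange_strong(obj, &expected, 0);
    return expected;
}

/* Hypothetical helper: ask the implementation whether this type is
 * lock-free. The answer is platform-dependent. */
static bool u64_is_lock_free(void)
{
    _Atomic uint64_t probe = 0;
    return atomic_is_lock_free(&probe);
}
```

Because the CAS is a write even when the value does not change, a load built
this way needs the object to live in writable memory, unlike a plain load.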

> So, in such a case, using the wide CAS for
> atomic loads breaks a reasonable assumption.  Moreover, it's also a
> special case, in that 32b atomics do work as intended.

But in this case the programmer is already assuming that atomic_load does
not use RMW, which C11 does not seem to guarantee. Of course, for single-width
operations, the programmer may assume it in most practical cases (even though
there is no guarantee).
Anyway, there is no good solution here for double-width operations, and the
programmer should not assume that an RMW-free load is possible when writing
portable code. In fact, a lock-based solution is even more confusing and
potentially error-prone (e.g., it cannot be safely used inside signal handlers
since it is not lock-free).
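
The signal-handler point can be sketched as follows. This is a minimal
example under stated assumptions: the names pending_signals, on_signal and
deliver_and_count are hypothetical, and an int-sized counter is used because
it is lock-free on mainstream targets. A lock-based double-width atomic would
not satisfy the compile-time check and could deadlock if the handler
interrupted a thread holding the lock.

```c
#include <signal.h>
#include <stdatomic.h>

/* Only lock-free atomic operations are safe inside a signal handler,
 * so refuse to build if atomic_int is not always lock-free. */
#if ATOMIC_INT_LOCK_FREE != 2
#error "this sketch assumes atomic_int is always lock-free"
#endif

static atomic_int pending_signals;  /* hypothetical counter, starts at 0 */

static void on_signal(int signo)
{
    (void)signo;
    /* Lock-free RMW: cannot block on a lock held by the interrupted code. */
    atomic_fetch_add_explicit(&pending_signals, 1, memory_order_relaxed);
}

/* Hypothetical driver: deliver SIGINT `times` times to this process and
 * return the total number of signals counted so far (-1 on error). */
static int deliver_and_count(int times)
{
    void (*previous)(int) = signal(SIGINT, on_signal);
    if (previous == SIG_ERR)
        return -1;

    for (int i = 0; i < times; i++) {
        /* Re-arm before every raise: ISO C allows the disposition to be
         * reset to SIG_DFL when the handler is entered. */
        signal(SIGINT, on_signal);
        raise(SIGINT);
    }

    signal(SIGINT, previous);       /* restore the old disposition */
    return atomic_load(&pending_signals);
}
```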

> The behavior you favor would violate that, and
> there's no portable way to distinguish one from the other. 

There is already a similar problem with IFUNC (when used with Linux and
glibc). In fact, I do not see any difference here. Redirection to libatomic
when mcx16 is specified just adds extra cost and less predictable behavior.
Moreover, it seems counterintuitive: I specify a flag saying that mcx16 is
supported, but gcc still does not use it (at least not directly). It is
possible to change libatomic to always use cmpxchg16b when available (even on
systems without IFUNC); this way the behavior is fully consistent and binary
compatible for code compiled with and without mcx16.
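
As a rough illustration of the "use cmpxchg16b when available" idea, a
run-time check could look like the sketch below. It is not libatomic's actual
code. The names cpu_has_cmpxchg16b, wide_cas_strategy and
pick_wide_cas_strategy are hypothetical, and the sketch assumes a compiler
that ships <cpuid.h> (gcc and clang do on x86). A real implementation would
perform the check once and cache the result, which is what IFUNC resolution
does at load time.

```c
#include <stdbool.h>

#if defined(__x86_64__)
#include <cpuid.h>   /* __get_cpuid, provided by gcc and clang on x86 */
#endif

/* CPUID leaf 1 reports CMPXCHG16B support in ECX bit 13. */
#define CX16_ECX_BIT (1u << 13)

/* Hypothetical helper: true if the running CPU supports cmpxchg16b. */
static bool cpu_has_cmpxchg16b(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;               /* leaf 1 not available */
    return (ecx & CX16_ECX_BIT) != 0;
#else
    return false;                   /* instruction exists only on x86-64 */
#endif
}

/* Hypothetical strategy tags for a 16-byte compare-and-swap. */
typedef enum {
    WIDE_CAS_INLINE,                /* use cmpxchg16b directly */
    WIDE_CAS_LOCKED                 /* fall back to a lock-based path */
} wide_cas_strategy;

/* Pick the strategy from the run-time check, independent of build flags. */
static wide_cas_strategy pick_wide_cas_strategy(void)
{
    return cpu_has_cmpxchg16b() ? WIDE_CAS_INLINE : WIDE_CAS_LOCKED;
}
```

With a single selection point like this, code built with and without mcx16
would agree on which mechanism protects a given object.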


> I see your point in wanting to have a builtin or such for the 64b atomic
> CAS.  However, IMO, this doesn't fit into the world of C11/C++11
> atomics, and thus rather should be accessible through a separate
> interface.

Why not? If atomic_load is not really an issue, then it may be good to use
a standardized interface.




   
