On Mon, 2018-02-26 at 18:55 +0000, Ruslan Nikolaev via gcc wrote:
> Torvald, thank you for your output. See my response below.
>
> On Monday, February 26, 2018 1:35 PM, Torvald Riegel <trie...@redhat.com>
> wrote:
>
> > ... does not imply this latter statement. The statement you cited is
> > about what the standard itself requires, not what makes sense for a
> > particular implementation.
>
> True but makes sense to provide true atomics when they are available.

What do you mean by "true atomics"? For me, that includes an atomic load
that is not emulated through an RMW.

> Since the standard seem to allow atomic_load implementation using RMW, does
> not seem to be a problem.

I believe that in the C++ committee, we have consensus that the intent
for lock-free atomics is that they should have an atomic load available
that behaves like a typical natively-supported atomic load. I can't
speak for the C committee, but at least the memory models are supposed
to be the same.
This is a decision that implementations ultimately make, however.

> In fact, lock_free flag for this type can return true only if mcx16 is
> specified; otherwise -- it returns false (since it can only be determined
> during runtime, assuming worst case scenario)

But then -mcx16 is effectively a different ABI, and it also changes what
(portable) synchronization code can expect when it sees an atomic type
declared as lock-free.

> > So, in such a case, using the wide CAS for
> > atomic loads breaks a reasonable assumption. Moreover, it's also a
> > special case, in that 32b atomics do work as intended.
>
> But in this case a programmer already makes an assumption that atomic_load
> does not use RMW which C11 does not seem to guarantee.

It makes sense for GCC as an implementation to guarantee that.

> Of course, for single-width operations, the programmer may in most practical
> cases assume it (even though there is no guarantee).

Requiring programs to consider what is "single-width" for a particular
platform, instead of just being able to test the lock-free property,
decreases portability.

> Anyway, there is no good solution here for double-width operations, and the
> programmer should not assume it is possible when writing portable code.

That's an argument in favor of splitting wide CAS out into a separate
interface -- C11 atomics are portable from the perspective of the major
use cases, and they should stay that way.
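
The point about testing the lock-free property, rather than reasoning
about operand width, can be sketched roughly as follows. This is only an
illustration: `counter`, `counter_is_lock_free`, `read_counter` and
`bump_counter` are hypothetical names, and a plain `atomic_int` is used
so that the sketch should build on common targets without -mcx16 or
libatomic.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared counter, used only for this illustration. */
static atomic_int counter;

/* Portable test: the code asks whether the type is lock-free instead of
   guessing what counts as "single-width" on the current platform. */
bool counter_is_lock_free(void)
{
#if ATOMIC_INT_LOCK_FREE == 2
    return true;                          /* always lock-free, known at compile time */
#else
    return atomic_is_lock_free(&counter); /* sometimes or never: ask at run time */
#endif
}

/* Plain atomic load of the counter. */
int read_counter(void)
{
    return atomic_load_explicit(&counter, memory_order_acquire);
}

/* Atomic read-modify-write on the counter. */
void bump_counter(void)
{
    atomic_fetch_add_explicit(&counter, 1, memory_order_acq_rel);
}
```

Code that needs, say, signal-handler safety would check
`counter_is_lock_free()` once and fall back to another strategy if it
returns false.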

> In fact, lock-based solution is even more confusing and potentially
> error-prone (e.g., cannot be safely used inside signal handlers since it is
> not lock-free, etc)
>
> > The behavior you favor would violate that, and
> > there's no portable way to distinguish one from the other.
>
> There is already a similar problem with IFFUNC (when used with Linux and
> glibc). In fact, I do not see any difference here. Redirection to libatomic
> when mcx16 is specified just adds extra cost + less predictable behavior.
> Moreover, it seems counterintuitive -- I specify a flag that mcx16 is
> supported but gcc still does not use it (at least directly). It is possible
> to make a change to libatomic to always use cmpxchg16b when available (even
> on systems without IFFUNC), this way it is totally consistent and binary
> compatible for code compiled with and without mcx16.

I've commented on that elsewhere in the thread.

> > I see your point in wanting to have a builtin or such for the 64b atomic
> > CAS. However, IMO, this doesn't fit into the world of C11/C++11
> > atomics, and thus rather should be accessible through a separate
> > interface.
> Why not? If atomic_load is not really an issue, then it may be good to use
> standardized interface.

See above. The atomic builtins are a package that, at least on GCC's
implementation, gives you a set of properties you can rely on in a
portable way (in particular when used through the C11/C++11 atomic ops).
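
For readers following the thread, the difference between a native
atomic load and one emulated through an RMW can be sketched as below.
This is an illustration under assumptions, not GCC's or libatomic's
actual code: `load_via_cas` and `load_plain` are hypothetical helper
names, and a narrow `atomic_uint` stands in for the wide type so the
sketch should build without -mcx16 or libatomic.

```c
#include <stdatomic.h>

/* Hypothetical helper: emulates a load with a compare-and-swap.
   If *obj equals `expected` (0), the CAS stores 0 back, leaving the
   value unchanged; otherwise it fails and copies the current value
   into `expected`. Either way `expected` ends up holding the current
   value, but the operation is an RMW, so it needs write access to
   *obj and may perform a store. */
unsigned load_via_cas(atomic_uint *obj)
{
    unsigned expected = 0;
    atomic_compare_exchange_strong(obj, &expected, 0);
    return expected;
}

/* Hypothetical helper: an ordinary atomic load, which performs no
   store to *obj. */
unsigned load_plain(atomic_uint *obj)
{
    return atomic_load(obj);
}
```

Both helpers return the same value, but only the second has the
behavior one would expect from a load on a lock-free type, for example
when the object lives in memory that is mapped read-only.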