Torvald,

Thank you for your input. See my responses below.

On Monday, February 26, 2018 1:35 PM, Torvald Riegel <trie...@redhat.com> wrote:
> ... does not imply this latter statement. The statement you cited is
> about what the standard itself requires, not what makes sense for a
> particular implementation.

True, but it makes sense to provide true atomics when they are available. Since the standard seems to allow an atomic_load implementation that uses RMW, this does not seem to be a problem. In fact, the lock_free flag for this type can return true only if -mcx16 is specified; otherwise it returns false (availability can only be determined at run time, so the worst case is assumed).

> So, in such a case, using the wide CAS for
> atomic loads breaks a reasonable assumption. Moreover, it's also a
> special case, in that 32b atomics do work as intended.

But in this case the programmer is already assuming that atomic_load does not use RMW, which C11 does not seem to guarantee. Of course, for single-width operations the programmer may assume it in most practical cases (even though there is no guarantee). In any case, there is no good solution here for double-width operations, and the programmer should not assume one exists when writing portable code. In fact, the lock-based solution is even more confusing and potentially error-prone (e.g., it cannot be used safely inside signal handlers, since it is not lock-free).

> The behavior you favor would violate that, and
> there's no portable way to distinguish one from the other.

There is already a similar problem with IFUNC (when used with Linux and glibc), and I do not see any difference here. Redirecting to libatomic when -mcx16 is specified just adds extra cost and makes the behavior less predictable. It is also counterintuitive: I pass a flag saying that cmpxchg16b is supported, yet gcc still does not use it (at least not directly). It is possible to change libatomic to always use cmpxchg16b when it is available (even on systems without IFUNC); that way the behavior is fully consistent and binary compatible between code compiled with and without -mcx16.
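To make the lock_free point concrete, here is a minimal sketch of how the compile-time answer can be queried. It assumes GCC or Clang (the `__atomic_always_lock_free` builtin is a compiler extension, not C11), and the names `pair128`, `always_lock_free_32`, `always_lock_free_128` and `report_lock_freedom` are made up for illustration. The 16-byte result depends on the target, the compiler version and whether -mcx16 is passed, so no particular output is claimed for it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical double-width type: two 64-bit words, 16 bytes in total. */
typedef struct { uint64_t lo, hi; } pair128;
static_assert(sizeof(pair128) == 16, "pair128 is expected to be 16 bytes");

/* Each helper returns 1 if the compiler guarantees lock-free atomics of
   that width for the target it is compiling for, and 0 otherwise.
   The builtin is evaluated at compile time, so no libatomic call is made. */
static int always_lock_free_32(void)
{
    return __atomic_always_lock_free(sizeof(uint32_t), 0);
}

static int always_lock_free_128(void)
{
    return __atomic_always_lock_free(sizeof(pair128), 0);
}

/* Prints both answers; the 16-byte line is the one that may change
   with -mcx16, depending on the compiler. */
void report_lock_freedom(void)
{
    printf("4-byte atomics always lock-free:  %d\n", always_lock_free_32());
    printf("16-byte atomics always lock-free: %d\n", always_lock_free_128());
}
```

Calling `report_lock_freedom()` from a build with and without -mcx16 shows what the compiler is willing to promise statically; anything it does not promise is left to the runtime library.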
> I see your point in wanting to have a builtin or such for the 64b atomic
> CAS. However, IMO, this doesn't fit into the world of C11/C++11
> atomics, and thus rather should be accessible through a separate
> interface.

Why not? If atomic_load is not really an issue, then it may be good to use a standardized interface.
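For reference, this is roughly what "atomic_load implemented with RMW" means. The sketch is written at pointer width so that it builds without -mcx16 or libatomic; I am assuming the 16-byte cmpxchg16b case follows the same pattern. `load_via_cas` and `load_via_cas_demo` are hypothetical names, not gcc or libatomic functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Reads *p using only compare-and-swap.
   If *p is 0, the CAS succeeds and stores 0 again (value unchanged).
   Otherwise the CAS fails and copies the current value into 'expected'.
   Either way 'expected' ends up holding the value that was observed. */
uintptr_t load_via_cas(_Atomic uintptr_t *p)
{
    uintptr_t expected = 0;
    atomic_compare_exchange_strong(p, &expected, (uintptr_t)0);
    return expected;
}

/* Small self-check: the CAS-based load returns the stored value
   and leaves it unchanged. */
void load_via_cas_demo(void)
{
    _Atomic uintptr_t v = 42;
    assert(load_via_cas(&v) == 42);
    assert(atomic_load(&v) == 42);
}
```

Note that the parameter cannot be a pointer to const: the location has to be writable, which is exactly the assumption being debated for loads.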