http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
torvald at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rth at gcc dot gnu.org

--- Comment #10 from torvald at gcc dot gnu.org ---
(In reply to algrant from comment #6)
> Here is a complete C++11 test case. The code generated for the two loads
> in thread B doesn't maintain the required ordering:

The C++ test has a race condition because the nonatomic load from data is not ordered wrt. the nonatomic store to data in the other thread. Please fix that and report back whether this changes anything. (I believe using relaxed-MO atomic accesses for data should be sufficient.)

Nonetheless, if there is a fixed (data-race-free) similar test, then I believe that fixed test case to be valid, in the sense that AFAIU the standards' requirements (both C++11's and C11's; they are equal in this aspect) mean that code as in this test case is supposed to carry the dependencies forward.

However, I cannot see a *practical* implementation of this requirement. AFAIU, dependencies are carried forward through loads/stores too (C11 5.1.2.4#14): "An evaluation A carries a dependency to an evaluation B if: [...] A writes a scalar object or bit-field M, B reads from M the value written by A, and A is sequenced before B". Because a compiler cannot, in general, analyze which piece of code stored last to M, it would have to conservatively assume that a dependency is carried through M. In turn, it could never optimize expressions such as "(x-x)", at least if those affect atomic operations on M in some way, irrespective of how remotely that might have happened. (I'm restricting my statement to atomic accesses because I'm not quite sure whether there's some other reason why nonatomic accesses couldn't be affected by this.)

OTOH, maybe there are factors that I'm not aware of and that effectively constrain carries-a-dependency to something that doesn't widely prohibit optimizations.
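To make the request concrete, here is a minimal sketch of what a data-race-free variant might look like. It is not the test case from comment #6: the names data, flag, writer, reader and run_once are all made up for illustration, and data is turned into an atomic accessed with relaxed MO as suggested above. The index "f - f" is always zero, yet per the carries-a-dependency rules it still depends on the consume load.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names throughout; this is not the code from comment #6.
std::atomic<int> data[2];   // nonatomic in the racy version; atomic here
std::atomic<int> flag{0};

// Thread A: publish the payload, then set the flag with release MO.
void writer() {
    data[0].store(42, std::memory_order_relaxed);
    flag.store(1, std::memory_order_release);
}

// Thread B: wait for the flag with consume MO, then load the payload.
// The index (f - f) is always 0, but it is computed from f, so the
// relaxed load below is supposed to be dependency-ordered after the
// release store in writer().
int reader() {
    int f;
    while ((f = flag.load(std::memory_order_consume)) == 0) {
        // spin until the writer has published
    }
    return data[f - f].load(std::memory_order_relaxed);
}

// Runs writer and reader concurrently once and returns what B observed.
int run_once() {
    data[0].store(0, std::memory_order_relaxed);
    flag.store(0, std::memory_order_relaxed);
    std::thread a(writer);
    int seen = reader();
    a.join();
    return seen;
}
```

If the dependency is honored, run_once() can only return 42. A compiler that folds "f - f" to a constant and emits no barrier could, on a weakly ordered target, let the second load be satisfied before the first.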
I don't have significant background information on the design of this part of the memory model (nor on its implementability), but I'll ask around at the C++ meeting in two weeks.

> According to the architecture specification, to achieve the ordering it's
> sufficient to use the result of the first load in the calculation of the
> address of the second, even if it's done in such a way as to have no
> dependence on the value.

And that's certainly a feasible *architecture* specification. ISTM that the language standard tried to model this, but then didn't give the compiler enough leeway to still be able to optimize code. The carries-a-dependency rules seem like an unusual shortcut from the language level to requirements on the generated code, without the usual indirection using the observable behavior of the abstract machine and as-if.

For example -- and this isn't a precise definition of course -- if the standard required that "f" (in your test case) actually be needed to process the evaluation, then things like "f-f" could be optimized out, and the compiler might just do the right thing with standard optimizations.

As a short-term fix, I believe we can promote all memory_order_consume atomics to memory_order_acquire when generating code for consume-MO atomics. GCC is at fault here, not the code in the builtins or in libatomic. Thus, this would prevent errors when GCC compiled the code and a correct atomics implementation is used (e.g., third-party libatomic).

This won't help when another (potentially correct) compiler generated the code with the consume-MO atomics, and GCC-compiled code fails to carry the dependencies forward. We could try to change libatomic's consume-MO implementation to prevent that, but this would only catch cases in which libatomic is actually used and not the other compiler's builtins or similar. Thus, changing libatomic does not seem to be necessary.

Richard, Andrew: Thoughts?
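For illustration only, the effect of the proposed short-term fix can be pictured at the source level as a mapping on memory orders. This is a sketch under assumptions, not GCC's implementation: promote_consume and load_promoted are hypothetical names, and the real change would live in the compiler's expansion of the atomic builtins.

```cpp
#include <atomic>

// Hypothetical helper, not a GCC internal: map consume to acquire and
// leave every other memory order unchanged.
constexpr std::memory_order promote_consume(std::memory_order mo) {
    return mo == std::memory_order_consume ? std::memory_order_acquire : mo;
}

// Hypothetical wrapper: perform a load with the promoted memory order,
// so that no dependency tracking is needed after the load.
template <typename T>
T load_promoted(const std::atomic<T>& a, std::memory_order mo) {
    return a.load(promote_consume(mo));
}
```

Acquire is at least as strong as consume, so the promotion is always correct; the cost is an extra barrier on targets where an address dependency alone would have been enough.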