On 05/26/2017 04:58 AM, Alexander Monakov wrote:
> On Wed, 17 May 2017, Alexander Monakov wrote:
>
>> Ping.
>
> Ping^2?
>
>> (to be clear, patch 2/2 is my previous followup in this thread, I forgot to
>> adjust the subject line; it should have said:
>> "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load").
>>
>> On Wed, 10 May 2017, Alexander Monakov wrote:
>>
>>> Hi,
>>>
>>> When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't
>>> emit
>>> any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence').
>>> This
>>> is incorrect: although no machine barrier is needed, the compiler still must
>>> emit a compiler barrier into the IR to prevent propagation and code motion
>>> across the fence. The testcase added with the patch shows how it can lead
>>> to a miscompilation.
>>>
>>> The proposed patch fixes it by handling non-seq-cst fences exactly like
>>> __atomic_signal_fence is expanded, by emitting asm volatile("":::"memory").
>>>
>>> The s390 backend uses the a similar mem_thread_fence expansion, so the patch
>>> fixes both backends in the same manner.
>>>
>>> Bootstrapped and regtested on x86_64; also checked that s390-linux cc1
>>> successfully builds after the change. OK for trunk?
>>>
>>> (the original source code in the PR was misusing atomic fences by doing
>>> something like
>>>
>>> void f(int *p)
>>> {
>>> while (*p)
>>> __atomic_thread_fence(__ATOMIC_ACQUIRE);
>>> }
>>>
>>> but since *p is not atomic, a concurrent write to *p would cause a data
>>> race and
>>> thus invoke undefined behavior; also, if *p is false prior to entering the
>>> loop,
>>> execution does not encounter the fence; new test here has code usable
>>> without UB)
>>>
>>> Alexander
>>>
>>> * config/i386/sync.md (mem_thread_fence): Emit a compiler barrier for
>>> non-seq-cst fences. Adjust comment.
>>> * config/s390/s390.md (mem_thread_fence): Likewise.
>>> * optabs.c (expand_asm_memory_barrier): Export.
>>> * optabs.h (expand_asm_memory_barrier): Declare.
>>> testsuite/
>>> * gcc.target/i386/pr80640-1.c: New testcase.
So I think this is up to the target maintainers. I have no concerns
with enabling use of expand_asm_memory_barrier to be used outside of
optabs. So if the s390/x86 maintainers want to go forward, the optabs
changes are pre-approved.
The reasoning seems sound to me -- we may not need fencing at the
hardware level because it gives a certain set of guarantees, but we may
still need them at the compiler level to prevent undesirable code
motions across the fence point in the compiler.
Jeff