[Bug middle-end/68616] miscompilation in multi-threaded code

torvald at gcc dot gnu.org Tue, 01 Dec 2015 07:12:12 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68616


--- Comment #2 from torvald at gcc dot gnu.org ---
I basically don't know anything about IPA, so I'll just assume that by
"barrier" you mean conditions prohibiting transformations.  I'm also not sure
whether just CSE is a problem here, so I'll try to give an unspecific but broad
answer.

I'll assume we're looking at cases of nonatomic accesses to foo (such as ptr in
the test case) and calls to the (new) atomic builtins on different variables
bar1, bar2, ... as synchronization points.  Mixed atomic/nonatomic accesses to
the same memory location are in most cases a bug, and I believe we discourage
them, but they are not necessarily incorrect (e.g., if loading through an
atomic op and observing a value that indicates we're the only thread accessing
it, subsequent nonatomic accesses might be fine).  Maybe we should just let an
atomic access to bar1 be a barrier for all movement/merging across/involving
this access.  Any old atomic builtins (i.e., __sync) can probably be handled
like the equivalent calls to the new builtins.

I believe we don't need to include volatiles, because even if they are used
in old-style code, they should have asm compiler barriers around them -- and I
hope we're handling those correctly.

Because &foo != &bar, atomic stores to bar must be __ATOMIC_RELEASE or
stronger, and atomic loads from bar must be __ATOMIC_ACQUIRE or stronger;
otherwise, there's no implied ordering relationship between the foo and bar
accesses.
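
As a rough illustration of that requirement, here is a minimal C++ sketch that
passes a plain value through foo using bar as the flag.  The helper name
message_passing and the value 42 are made up for this example and are not part
of the test case.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int foo = 0;               // plain, nonatomic data
std::atomic<int> bar{0};   // synchronization variable

// Hypothetical helper: returns the value of foo observed by the consumer.
int message_passing() {
    foo = 0;
    bar.store(0, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        foo = 42;                                  // nonatomic write
        bar.store(1, std::memory_order_release);   // publishes the write to foo
    });
    std::thread consumer([&seen] {
        // The acquire load that reads 1 synchronizes with the release store,
        // so the read of foo below is ordered after the producer's write.
        while (bar.load(std::memory_order_acquire) != 1)
            std::this_thread::yield();
        seen = foo;                                // nonatomic read, no data race
    });

    producer.join();
    consumer.join();
    return seen;
}
```

A caller could check this with assert(message_passing() == 42).  With
memory_order_relaxed on either side, nothing would order the two accesses to
foo, and the program would have a data race.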

A good way to find out what transformations are or are not allowed is to
consider the data-race-freedom (DRF) requirement and which regions an
implementation would be allowed to execute atomically.

For example, "foo = 1; bar.store(1, release); foo = 2;": The implementation is
allowed to execute that in one atomic step, so there cannot be a nonatomic read
of foo in another thread that's triggered by the store to bar because there
would be a data race in this case.  So, foo=1 can be removed if we assume DRF.
(Note that old synchronization code may not be written to guarantee DRF; so
perhaps this should be conditional on -std=c11 or such.)
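
To make the argument concrete, the sketch below puts the sequence as written
next to the form a compiler could produce under the DRF assumption.  The
function names are hypothetical; the point is only that any thread reading foo
nonatomically after observing bar == 1 would race with "foo = 2", so no
race-free program can tell the two versions apart.

```cpp
#include <atomic>
#include <cassert>

int foo = 0;               // plain, nonatomic data
std::atomic<int> bar{0};   // synchronization variable

// The sequence from the example, as written.
void writer_as_written() {
    foo = 1;
    bar.store(1, std::memory_order_release);
    foo = 2;
}

// What a compiler may emit if it assumes DRF: the store "foo = 1" is dead,
// because the whole sequence may execute as one atomic step.
void writer_without_first_store() {
    bar.store(1, std::memory_order_release);
    foo = 2;
}
```

Both versions leave foo == 2 and bar == 1 behind, which is all a race-free
observer can rely on.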

"temp=foo; if (bar.load(acquire)) temp2=foo;" is similar.
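
My reading of "similar" here, sketched in C++ under that assumption: a writer
to foo in another thread would race with the first load, so under DRF the
second load may reuse temp.  The function names and the -1 placeholder for "not
loaded" are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

int foo = 0;               // plain, nonatomic data
std::atomic<int> bar{0};   // synchronization variable

// The sequence from the example, as written.  Returns (temp, temp2).
std::pair<int, int> reader_as_written() {
    int temp = foo;
    int temp2 = -1;        // placeholder when the branch is not taken
    if (bar.load(std::memory_order_acquire))
        temp2 = foo;
    return {temp, temp2};
}

// Possible result of merging the two loads of foo under the DRF assumption.
std::pair<int, int> reader_reusing_temp() {
    int temp = foo;
    int temp2 = -1;
    if (bar.load(std::memory_order_acquire))
        temp2 = temp;      // reuses the earlier load instead of reloading
    return {temp, temp2};
}
```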

However, this example here (basically the test case) doesn't allow for
optimizing across the synchronization:
"foo = 1; temp=foo; bar.store(1, release); if (bar.load(acquire)==2)
temp2=foo;"

This is because the two loads of foo cannot be executed in one atomic step: the
second one requires another thread to set bar to the value 2 in between.  Also,
there is no nonatomic access to foo between the two atomic accesses, so we
cannot use DRF to derive that the other threads that execute between the
atomics don't access foo.
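
A two-thread C++ sketch of this situation follows.  It is an adaptation, not
the actual test case: the "if" is replaced by a spin loop so the outcome is
deterministic, and the helper name handoff, the second thread, and the value 5
are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

int foo = 0;               // plain, nonatomic data
std::atomic<int> bar{0};   // synchronization variable

// Hypothetical helper: returns (temp, temp2) as seen by the first thread.
std::pair<int, int> handoff() {
    foo = 0;
    bar.store(0, std::memory_order_relaxed);
    int temp = -1, temp2 = -1;

    std::thread first([&temp, &temp2] {
        foo = 1;
        temp = foo;
        bar.store(1, std::memory_order_release);   // hands foo to the other thread
        // Spin instead of a single "if" so the result is deterministic.
        while (bar.load(std::memory_order_acquire) != 2)
            std::this_thread::yield();
        temp2 = foo;   // must be a real load; reusing temp would give 1
    });
    std::thread second([] {
        while (bar.load(std::memory_order_acquire) != 1)
            std::this_thread::yield();
        foo = 5;                                   // ordered after "foo = 1"
        bar.store(2, std::memory_order_release);   // hands foo back
    });

    first.join();
    second.join();
    return {temp, temp2};
}
```

This program is data-race-free, and handoff() returns (1, 5).  A compiler that
merged the two loads of foo across the atomics would return (1, 1) instead.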

I hope this helps, though I'm aware it's pretty general.  If you have a list of
specific transformations, I could look through that and say which ones are
correct or not.  Perhaps it's okay for now if we assume that all non-relaxed
atomics are transformation barriers, as we do elsewhere AFAIK.
