> -----Original Message-----
> From: Aldy Hernandez [mailto:[email protected]]
> Sent: Thursday, April 12, 2012 3:12 PM
> To: Richard Guenther
> Cc: Andrew MacLeod; Boehm, Hans; gcc-patches; Torvald Riegel
> Subject: [PR tree-optimization/52558]: RFC: questions on store data
> race
>
> Here we have a testcase that affects both the C++ memory model and
> transactional memory.
>
> [Hans, this is caused by the same problem that is causing the
> speculative register promotion issue you and Torvald pointed me at].
>
> In the following testcase (adapted from the PR), the loop invariant
> motion pass caches a pre-existing value for g_2, and then performs a
> store to g_2 on every path, causing a store data race:
>
> int g_1 = 1;
> int g_2 = 0;
>
> int func_1(void)
> {
> int l;
> for (l = 0; l < 1234; l++)
> {
> if (g_1)
> return l;
> else
> g_2 = 0; <-- Store to g_2 should only happen if !g_1
> }
> return 999;
> }
>
> This gets transformed into something like:
>
> g_2_lsm = g_2;
> if (g_1) {
> g_2 = g_2_lsm; // boo! hiss!
> return 0;
> } else {
> g_2_lsm = 0;
> g_2 = g_2_lsm;
> }
>
> The spurious write to g_2 could cause a data race.
Presumably the g_2_lsm = g_2 is actually outside the loop?
Why does the second g_2 = g_2_lsm; get introduced? I would have expected it
before the return. Was the example just over-abbreviated?
Other than that, this sounds right to me. So does Richard's flag-based
version, which is the approach I would have originally expected. But that
clearly costs you a register. It would be interesting to see how they compare.
Hans
>
> Andrew has suggested verifying that the store to g_2 was actually
> different than on entry, and letting PHI copy propagation optimize the
> redundant comparisons. Like this:
>
> g_2_lsm = g_2;
> if (g_1) {
> if (g_2_lsm != g_2) // hopefully optimized away
> g_2 = g_2_lsm; // hopefully optimized away
> return 0;
> } else {
> g_2_lsm = 0;
> if (g_2_lsm != g_2) // hopefully optimized away
> g_2 = g_2_lsm;
> }
>
> ...which PHI copy propagation and/or friends should be able to
> optimize.
>
> For that matter, regardless of the memory model, the proposed solution
> would improve the existing code by removing the extra store (in this
> contrived test case anyhow).
>
> Anyone see a problem with this approach (see attached patch)?
>
> Assuming the unlikely scenario that everyone likes this :), I have two
> problems that I would like answered. But feel free to ignore if the
> approach is a no go.
>
> 1. No pass can figure out the equality (or inequality) of g_2_lsm and
> g_2. In the PR, Richard mentions that both FRE/PRE can figure this
> out, but they are not run after store motion. DOM should also be able
> to, but apparently does not :(. Tips?
>
> 2. The GIMPLE_CONDs I create in this patch are currently causing
> problems with complex floats/doubles. do_jump is complaining that it
> can't compare them, so I assume it is too late (in tree-ssa-loop-im.c)
> to compare complex values since complex lowering has already happened
> (??). Is there a more canonical way of creating a GIMPLE_COND that may
> contain complex floats at this stage?
>
> Thanks folks.