> -----Original Message----- > From: Aldy Hernandez [mailto:al...@redhat.com] > Sent: Thursday, April 12, 2012 3:12 PM > To: Richard Guenther > Cc: Andrew MacLeod; Boehm, Hans; gcc-patches; Torvald Riegel > Subject: [PR tree-optimization/52558]: RFC: questions on store data > race > > Here we have a testcase that affects both the C++ memory model and > transactional memory. > > [Hans, this is caused by the same problem that is causing the > speculative register promotion issue you and Torvald pointed me at]. > > In the following testcase (adapted from the PR), the loop invariant > motion pass caches a pre-existing value for g_2, and then performs a > store to g_2 on every path, causing a store data race: > > int g_1 = 1; > int g_2 = 0; > > int func_1(void) > { > int l; > for (l = 0; l < 1234; l++) > { > if (g_1) > return l; > else > g_2 = 0; <-- Store to g_2 should only happen if !g_1 > } > return 999; > } > > This gets transformed into something like: > > g_2_lsm = g_2; > if (g_1) { > g_2 = g_2_lsm; // boo! hiss! > return 0; > } else { > g_2_lsm = 0; > g_2 = g_2_lsm; > } > > The spurious write to g_2 could cause a data race. Presumably the g_2_lsm = g_2 is actually outside the loop?
Why does the second g_2 = g_2_lsm; get introduced? I would have expected it before the return. Was the example just over-abbreviated? Other than that, this sounds right to me. So does Richard's flag-based version, which is the approach I would have originally expected. But that clearly costs you a register. It would be interesting to see how they compare. Hans > > Andrew has suggested verifying that the store to g_2 was actually > different than on entry, and letting PHI copy propagation optimize the > redundant comparisons. Like this: > > g_2_lsm = g_2; > if (g_1) { > if (g_2_lsm != g_2) // hopefully optimized away > g_2 = g_2_lsm; // hopefully optimized away > return 0; > } else { > g_2_lsm = 0; > if (g_2_lsm != g_2) // hopefully optimized away > g_2 = g_2_lsm; > } > > ...which PHI copy propagation and/or friends should be able to > optimize. > > For that matter, regardless of the memory model, the proposed solution > would improve the existing code by removing the extra store (in this > contrived test case anyhow). > > Anyone see a problem with this approach (see attached patch)? > > Assuming the unlikely scenario that everyone likes this :), I have two > problems that I would like answered. But feel free to ignore if the > approach is a no go. > > 1. No pass can figure out the equality (or inequality) of g_2_lsm and > g_2. In the PR, Richard mentions that both FRE/PRE can figure this > out, but they are not run after store motion. DOM should also be able > to, but apparently does not :(. Tips? > > 2. The GIMPLE_CONDs I create in this patch are currently causing > problems with complex floats/doubles. do_jump is complaining that it > can't compare them, so I assume it is too late (in tree-ssa-loop-im.c) > to compare complex values since complex lowering has already happened > (??). Is there a more canonical way of creating a GIMPLE_COND that may > contain complex floats at this stage? > > Thanks folks.