Hi!

As noticed by Richi,
FAIL: g++.dg/guality/pr55665.C   -O2  line 23 p == 40
FAIL: g++.dg/guality/pr55665.C   -O2  line 23 p == 40
FAIL: g++.dg/guality/pr55665.C   -O3 -g  line 23 p == 40
FAIL: g++.dg/guality/pr55665.C   -O3 -g  line 23 p == 40
aren't a mere wrong-debug, but actually wrong-code issues if the function
(constructor) would ever be called concurrently.

As discussed already in
https://sourceware.org/bugzilla/show_bug.cgi?id=13344
while synchronization primitives usually don't call back into the current
TU, we don't want to mark the synchronization primitives as leaf, as
otherwise for local statics that aren't TREE_ADDRESSABLE ipa-reference
might optimize them.
We have:
  _1 = __atomic_load_1 (&_ZGVZN1AC4EiE1p, 2);
  if (_1 == 0)
    goto <bb 4>; [33.00%]
  else
    goto <bb 3>; [67.00%]
  <bb 3> [local count: 956811342]:
  goto <bb 6>; [100.00%]
  <bb 4> [local count: 354334802]:
  _2 = __cxa_guard_acquire (&_ZGVZN1AC4EiE1p);
  if (_2 != 0)
    goto <bb 5>; [33.00%]
  else
    goto <bb 3>; [67.00%]
  <bb 5> [local count: 116930485]:
  _3 = bar ();
  p = _3;
  __cxa_guard_release (&_ZGVZN1AC4EiE1p);
  <bb 6> [local count: 1073741824]:
  p.2_4 = p;
  foo (p.2_4);
  _6 = p.2_4 + 1;
  p = _6;
and ipa-reference when ECF_LEAF is on __cxa_* (and probably IPA discovered
on foo) thinks it is fine to optimize by loading p from memory early at the
start of the function, then just handle it in registers and only store it
after the foo call.  It is fine in single-threaded environments, but if the
ctor can be invoked multiple times concurrently, it matters that we don't
hoist or sink the non-addressable local static loads/stores in any way
across the synchronization calls.

To quote my comment from rhbz:
"Actually, it might be a glibc problem,
http://sources.redhat.com/git/?p=glibc.git;a=commitdiff;h=aa78043a4aafe5db1a1a76d544a833b63b4c5f5c
added leaf attribute, I bet it is undesirable to use leaf attribute on any of 
the pthread.h/sem.h
synchronization primitives.  Although they don't call functions from the 
current compilation unit,
other threads might invoke functions from the current compilation unit and by 
using the synchronization
primitives we are or might be waiting on those other threads to perform some 
changes in the current translation
unit."

As discussed with Richi, ECF_LEAF is used for 3 purposes, the ipa-reference
where we want to really treat synchronization primitives as non-leaf,
for abnormal edges following calls that might perform non-local goto
(sync primitives likely won't do that, but that affects only functions with
non-local labels or similar, i.e. rare) and finally for nonfreeing_call_p
(with a few exceptions).

We might need to remove ECF_LEAF from some other builtins that can be used
for synchronization primitives (I guess some subset of sync-builtins.def,
some omp-builtins.def, but that can be done incrementally and might not need
backporting to release branches.

The following patch fixes that by dropping ECF_LEAF from those two calls.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Fixed the guality tests.

2019-10-22  Jakub Jelinek  <ja...@redhat.com>

        PR tree-optimization/85887
        * decl.c (expand_static_init): Drop ECF_LEAF from __cxa_guard_acquire
        and __cxa_guard_release.

--- gcc/cp/decl.c.jj    2019-10-22 10:13:39.038094034 +0200
+++ gcc/cp/decl.c       2019-10-22 12:17:49.558103149 +0200
@@ -8612,14 +8612,14 @@ expand_static_init (tree decl, tree init
              (acquire_name, build_function_type_list (integer_type_node,
                                                       TREE_TYPE (guard_addr),
                                                       NULL_TREE),
-              NULL_TREE, ECF_NOTHROW | ECF_LEAF);
+              NULL_TREE, ECF_NOTHROW);
          if (!release_fn || !abort_fn)
            vfntype = build_function_type_list (void_type_node,
                                                TREE_TYPE (guard_addr),
                                                NULL_TREE);
          if (!release_fn)
            release_fn = push_library_fn (release_name, vfntype, NULL_TREE,
-                                          ECF_NOTHROW | ECF_LEAF);
+                                         ECF_NOTHROW);
          if (!abort_fn)
            abort_fn = push_library_fn (abort_name, vfntype, NULL_TREE,
                                        ECF_NOTHROW | ECF_LEAF);

        Jakub

Reply via email to