Hi! As noticed by Richi, FAIL: g++.dg/guality/pr55665.C -O2 line 23 p == 40 FAIL: g++.dg/guality/pr55665.C -O2 line 23 p == 40 FAIL: g++.dg/guality/pr55665.C -O3 -g line 23 p == 40 FAIL: g++.dg/guality/pr55665.C -O3 -g line 23 p == 40 aren't a mere wrong-debug, but actually wrong-code issues if the function (constructor) would ever be called concurrently.
As discussed already in https://sourceware.org/bugzilla/show_bug.cgi?id=13344 while synchronization primitives usually don't call back into the current TU, we don't want to mark the synchronization primitives as leaf, as otherwise for local statics that aren't TREE_ADDRESSABLE ipa-reference might optimize them. We have: _1 = __atomic_load_1 (&_ZGVZN1AC4EiE1p, 2); if (_1 == 0) goto <bb 4>; [33.00%] else goto <bb 3>; [67.00%] <bb 3> [local count: 956811342]: goto <bb 6>; [100.00%] <bb 4> [local count: 354334802]: _2 = __cxa_guard_acquire (&_ZGVZN1AC4EiE1p); if (_2 != 0) goto <bb 5>; [33.00%] else goto <bb 3>; [67.00%] <bb 5> [local count: 116930485]: _3 = bar (); p = _3; __cxa_guard_release (&_ZGVZN1AC4EiE1p); <bb 6> [local count: 1073741824]: p.2_4 = p; foo (p.2_4); _6 = p.2_4 + 1; p = _6; and ipa-reference when ECF_LEAF is on __cxa_* (and probably IPA discovered on foo) thinks it is fine to optimize by loading p from memory early at the start of the function, then just handle it in registers and only store it after the foo call. It is fine in single-threaded environments, but if the ctor can be invoked multiple times concurrently, it matters that we don't hoist or sink the non-addressable local static loads/stores in any way across the synchronization calls. To quote my comment from rhbz: "Actually, it might be a glibc problem, http://sources.redhat.com/git/?p=glibc.git;a=commitdiff;h=aa78043a4aafe5db1a1a76d544a833b63b4c5f5c added leaf attribute, I bet it is undesirable to use leaf attribute on any of the pthread.h/sem.h synchronization primitives. Although they don't call functions from the current compilation unit, other threads might invoke functions from the current compilation unit and by using the synchronization primitives we are or might be waiting on those other threads to perform some changes in the current translation unit." As discussed with Richi, ECF_LEAF is used for 3 purposes, the ipa-reference where we want to really treat synchronization primitives as non-leaf, for abnormal edges following calls that might perform non-local goto (sync primitives likely won't do that, but that affects only functions with non-local labels or similar, i.e. rare) and finally for nonfreeing_call_p (with a few exceptions). We might need to remove ECF_LEAF from some other builtins that can be used for synchronization primitives (I guess some subset of sync-builtins.def, some omp-builtins.def, but that can be done incrementally and might not need backporting to release branches. The following patch fixes that by dropping ECF_LEAF from those two calls. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Fixed the guality tests. 2019-10-22 Jakub Jelinek <ja...@redhat.com> PR tree-optimization/85887 * decl.c (expand_static_init): Drop ECF_LEAF from __cxa_guard_acquire and __cxa_guard_release. --- gcc/cp/decl.c.jj 2019-10-22 10:13:39.038094034 +0200 +++ gcc/cp/decl.c 2019-10-22 12:17:49.558103149 +0200 @@ -8612,14 +8612,14 @@ expand_static_init (tree decl, tree init (acquire_name, build_function_type_list (integer_type_node, TREE_TYPE (guard_addr), NULL_TREE), - NULL_TREE, ECF_NOTHROW | ECF_LEAF); + NULL_TREE, ECF_NOTHROW); if (!release_fn || !abort_fn) vfntype = build_function_type_list (void_type_node, TREE_TYPE (guard_addr), NULL_TREE); if (!release_fn) release_fn = push_library_fn (release_name, vfntype, NULL_TREE, - ECF_NOTHROW | ECF_LEAF); + ECF_NOTHROW); if (!abort_fn) abort_fn = push_library_fn (abort_name, vfntype, NULL_TREE, ECF_NOTHROW | ECF_LEAF); Jakub