The following patch fixes an issue in IPA PTA regarding to handling of DECL_BY_REFERENCE function results at the caller side. The issue for the testcase in the PR is that we use the wrong function decl to look for DECL_RESULT for calls that are an alias (which get DECL_RESULT released).
But the issue is deeper in that the code also does not handle indirect calls correctly - to expose a testcase for this the patch also enables optimistic handling of functions escaping via their addresses, this is already handled fine after I added code to parse global initializers correctly. LTO bootstrapped and tested on x86_64-unknown-linux-gnu with IPA PTA enabled, inspected PTA result on the PRs testcase (I failed to create a small reproducer). Bootstrap / regtest running on x86_64-unknown-linux-gnu. This is the trunk version of the fix, for the branch where the issue was reported against I will refrain from handling address-taken functions differently. I hope I deciphered enough of the calls handling to assess that aggregate_value_p always matches DECL_BY_REFERENCE on DECL_RESULT. IPA PTA needs to know the GIMPLE representation of the callees DECL_RESULT (whether it's a pointer - at the caller side we still see the non-reference LHS). And that needs to work for indirect calls as well. Richard. 2016-04-27 Richard Biener <rguent...@suse.de> PR ipa/70760 * tree-ssa-structalias.c (find_func_aliases_for_call): Use aggregate_value_p to determine if a function result is returned by reference. (ipa_pta_execute): Functions having their address taken are not automatically nonlocal. * g++.dg/ipa/ipa-pta-2.C: New testcase. * gcc.dg/ipa/ipa-pta-1.c: Adjust. Index: gcc/tree-ssa-structalias.c =================================================================== *** gcc/tree-ssa-structalias.c (revision 235478) --- gcc/tree-ssa-structalias.c (working copy) *************** find_func_aliases_for_call (struct funct *** 4641,4652 **** auto_vec<ce_s, 2> lhsc; struct constraint_expr rhs; struct constraint_expr *lhsp; get_constraint_for (lhsop, &lhsc); rhs = get_function_part_constraint (fi, fi_result); ! if (fndecl ! && DECL_RESULT (fndecl) ! && DECL_BY_REFERENCE (DECL_RESULT (fndecl))) { auto_vec<ce_s, 2> tem; tem.quick_push (rhs); --- 4737,4747 ---- auto_vec<ce_s, 2> lhsc; struct constraint_expr rhs; struct constraint_expr *lhsp; + bool aggr_p = aggregate_value_p (lhsop, gimple_call_fntype (t)); get_constraint_for (lhsop, &lhsc); rhs = get_function_part_constraint (fi, fi_result); ! if (aggr_p) { auto_vec<ce_s, 2> tem; tem.quick_push (rhs); *************** find_func_aliases_for_call (struct funct *** 4656,4677 **** } FOR_EACH_VEC_ELT (lhsc, j, lhsp) process_constraint (new_constraint (*lhsp, rhs)); - } ! /* If we pass the result decl by reference, honor that. */ ! if (lhsop ! && fndecl ! && DECL_RESULT (fndecl) ! && DECL_BY_REFERENCE (DECL_RESULT (fndecl))) ! { ! struct constraint_expr lhs; ! struct constraint_expr *rhsp; ! get_constraint_for_address_of (lhsop, &rhsc); ! lhs = get_function_part_constraint (fi, fi_result); ! FOR_EACH_VEC_ELT (rhsc, j, rhsp) ! process_constraint (new_constraint (lhs, *rhsp)); ! rhsc.truncate (0); } /* If we use a static chain, pass it along. */ --- 4751,4769 ---- } FOR_EACH_VEC_ELT (lhsc, j, lhsp) process_constraint (new_constraint (*lhsp, rhs)); ! /* If we pass the result decl by reference, honor that. */ ! if (aggr_p) ! { ! struct constraint_expr lhs; ! struct constraint_expr *rhsp; ! get_constraint_for_address_of (lhsop, &rhsc); ! lhs = get_function_part_constraint (fi, fi_result); ! FOR_EACH_VEC_ELT (rhsc, j, rhsp) ! process_constraint (new_constraint (lhs, *rhsp)); ! rhsc.truncate (0); ! } } /* If we use a static chain, pass it along. */ *************** ipa_pta_execute (void) *** 7686,7715 **** gcc_assert (!node->clone_of); - /* When parallelizing a code region, we split the region off into a - separate function, to be run by several threads in parallel. So for a - function foo, we split off a region into a function - foo._0 (void *foodata), and replace the region with some variant of a - function call run_on_threads (&foo._0, data). The '&foo._0' sets the - address_taken bit for function foo._0, which would make it non-local. - But for the purpose of ipa-pta, we can regard the run_on_threads call - as a local call foo._0 (data), so we ignore address_taken on nodes - with parallelized_function set. - Note: this is only safe, if foo and foo._0 are in the same lto - partition. */ - bool node_address_taken = ((node->parallelized_function - && !node->used_from_other_partition) - ? false - : node->address_taken); - /* For externally visible or attribute used annotated functions use local constraints for their arguments. For local functions we see all callers and thus do not need initial constraints for parameters. */ bool nonlocal_p = (node->used_from_other_partition || node->externally_visible ! || node->force_output ! || node_address_taken); node->call_for_symbol_thunks_and_aliases (refered_from_nonlocal_fn, &nonlocal_p, true); --- 7778,7790 ---- gcc_assert (!node->clone_of); /* For externally visible or attribute used annotated functions use local constraints for their arguments. For local functions we see all callers and thus do not need initial constraints for parameters. */ bool nonlocal_p = (node->used_from_other_partition || node->externally_visible ! || node->force_output); node->call_for_symbol_thunks_and_aliases (refered_from_nonlocal_fn, &nonlocal_p, true); Index: gcc/testsuite/g++.dg/ipa/ipa-pta-2.C =================================================================== *** gcc/testsuite/g++.dg/ipa/ipa-pta-2.C (revision 0) --- gcc/testsuite/g++.dg/ipa/ipa-pta-2.C (working copy) *************** *** 0 **** --- 1,37 ---- + // { dg-do run } + // { dg-options "-O2 -fipa-pta" } + + extern "C" void abort (void); + + struct Y { ~Y(); int i; }; + + Y::~Y () {} + + static Y __attribute__((noinline)) foo () + { + Y res; + res.i = 3; + return res; + } + + static Y __attribute__((noinline)) bar () + { + Y res; + res.i = 42; + return res; + } + + static Y (*fn) (); + + int a; + int main() + { + if (a) + fn = foo; + else + fn = bar; + Y res = fn (); + if (res.i != 42) + abort (); + return 0; + } Index: gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c =================================================================== *** gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c (revision 235478) --- gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c (working copy) *************** int main() *** 40,52 **** } /* IPA PTA needs to handle indirect calls properly. Verify that ! both bar and foo get a (and only a) in their arguments points-to sets. ! ??? As bar and foo have their address taken there might be callers ! not seen by IPA PTA (if the address escapes the unit which we only compute ! during IPA PTA...). Thus the solution also includes NONLOCAL. */ /* { dg-final { scan-ipa-dump "fn_1 = { bar foo }" "pta2" } } */ ! /* { dg-final { scan-ipa-dump "bar.arg0 = { NONLOCAL a }" "pta2" } } */ ! /* { dg-final { scan-ipa-dump "bar.arg1 = { NONLOCAL a }" "pta2" } } */ ! /* { dg-final { scan-ipa-dump "foo.arg0 = { NONLOCAL a }" "pta2" } } */ ! /* { dg-final { scan-ipa-dump "foo.arg1 = { NONLOCAL a }" "pta2" } } */ --- 40,49 ---- } /* IPA PTA needs to handle indirect calls properly. Verify that ! both bar and foo get a (and only a) in their arguments points-to sets. */ /* { dg-final { scan-ipa-dump "fn_1 = { bar foo }" "pta2" } } */ ! /* { dg-final { scan-ipa-dump "bar.arg0 = { a }" "pta2" } } */ ! /* { dg-final { scan-ipa-dump "bar.arg1 = { a }" "pta2" } } */ ! /* { dg-final { scan-ipa-dump "foo.arg0 = { a }" "pta2" } } */ ! /* { dg-final { scan-ipa-dump "foo.arg1 = { a }" "pta2" } } */