On Thu, 28 Apr 2016, Thomas Schwinge wrote: > Hi! > > Richard's r235511 changes (quoted below) cause certain nvptx offloading > test cases to run into SIGSEGVs: > > [...] > #4 0x0000000000d14193 in nvptx_libcall_value (mode=mode@entry=SImode) > at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489 > #5 0x0000000000d17a20 in nvptx_function_value (type=0x7fc1fa359690, > func=0x0, outgoing=<optimized out>) > at [...]/source-gcc/gcc/config/nvptx/nvptx.c:512 > #6 0x00000000006ba220 in hard_function_value > (valtype=valtype@entry=0x7fc1fa359690, func=func@entry=0x0, > fntype=fntype@entry=0x0, > outgoing=outgoing@entry=0) at [...]/source-gcc/gcc/explow.c:1860 > #7 0x000000000073b0fa in aggregate_value_p > (exp=exp@entry=0x7fc1fa41a048, fntype=0x0) > at [...]/source-gcc/gcc/function.c:2086 > #8 0x0000000000bebc11 in find_func_aliases_for_call (t=0x1feac90, > fn=0x7ffe448ca8a0) > at [...]/source-gcc/gcc/tree-ssa-structalias.c:4644 > #9 find_func_aliases (fn=fn@entry=0x7fc1fa43a540, > origt=origt@entry=0x7fc1fa43a7e0) > at [...]/source-gcc/gcc/tree-ssa-structalias.c:4737 > #10 0x0000000000bf04eb in ipa_pta_execute () > at [...]/source-gcc/gcc/tree-ssa-structalias.c:7787 > #11 (anonymous namespace)::pass_ipa_pta::execute (this=<optimized out>) > at [...]/source-gcc/gcc/tree-ssa-structalias.c:8035 > #12 0x0000000000940bed in execute_one_pass (pass=pass@entry=0x1f43770) > at [...]/source-gcc/gcc/passes.c:2348 > #13 0x0000000000941972 in execute_ipa_pass_list (pass=0x1f43770) > at [...]/source-gcc/gcc/passes.c:2778 > #14 0x0000000000607f1f in symbol_table::compile (this=0x7fc1fa359000) > at [...]/source-gcc/gcc/cgraphunit.c:2435 > #15 0x000000000056ad48 in lto_main () at > [...]/source-gcc/gcc/lto/lto.c:3328 > #16 0x0000000000a065df in compile_file () at > [...]/source-gcc/gcc/toplev.c:474 > #17 0x000000000053753a in do_compile () at > [...]/source-gcc/gcc/toplev.c:1998 > #18 toplev::main (this=this@entry=0x7ffe448caba0, argc=argc@entry=18, > argv=0x1f1eec0, argv@entry=0x7ffe448caca8) > at [...]/source-gcc/gcc/toplev.c:2106 > #19 0x00000000005391d7 in main (argc=18, argv=0x7ffe448caca8) > at [...]/source-gcc/gcc/main.c:39 > > The immediate problem is that > gcc/config/nvptx/nvptx.c:nvptx_libcall_value is called in a context where > cfun is NULL, and it fails to handle that appropriately: > > (gdb) frame 4 > #4 0x0000000000d14193 in nvptx_libcall_value (mode=mode@entry=SImode) > at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489 > 489 if (!cfun->machine->doing_call) > (gdb) print cfun > $1 = (function *) 0x0 > > Looking at the backtrace, I see that in frame 7, > gcc/function.c:aggregate_value_p is called with a NULL fntype. This > function is evidently prepared to handle that case, likewise for > gcc/explow.c:hard_function_value. Does it thus follow that > gcc/config/nvptx/nvptx.c:nvptx_function_value and/or > gcc/config/nvptx/nvptx.c:nvptx_libcall_value need to be changed? Is > something like the following sufficient (works in offloading testing, but > feels a bit like just "treating the symptoms"); for instance, should this > case rather be handled in gcc/config/nvptx/nvptx.c:nvptx_function_value > already? > > --- gcc/config/nvptx/nvptx.c > +++ gcc/config/nvptx/nvptx.c > @@ -484,7 +484,7 @@ nvptx_strict_argument_naming (cumulative_args_t cum_v) > static rtx > nvptx_libcall_value (machine_mode mode, const_rtx) > { > - if (!cfun->machine->doing_call) > + if (!cfun || !cfun->machine->doing_call) > /* Pretend to return in a hard reg for early uses before pseudos can be > generated. */ > return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
Doing anything based on 'cfun' here is fishy at least for the call context of aggregate_value_p as that is also used when looking at the caller side of a call for example when expanding calls where cfun is then the callers cfun and not the callees. So I suggest to remove cfun->machine->doing_call and revisit the reason why it was added for PTX. Richard. > > For reference: > > On Wed, 27 Apr 2016 13:07:36 +0200 (CEST), Richard Biener <rguent...@suse.de> > wrote: > > The following patch fixes an issue in IPA PTA regarding to handling > > of DECL_BY_REFERENCE function results at the caller side. The issue > > for the testcase in the PR is that we use the wrong function decl > > to look for DECL_RESULT for calls that are an alias (which get > > DECL_RESULT released). > > > > But the issue is deeper in that the code also does not handle > > indirect calls correctly - to expose a testcase for this the > > patch also enables optimistic handling of functions escaping > > via their addresses, this is already handled fine after I added > > code to parse global initializers correctly. > > > > LTO bootstrapped and tested on x86_64-unknown-linux-gnu with IPA PTA > > enabled, inspected PTA result on the PRs testcase (I failed to create a > > small reproducer). > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > > > This is the trunk version of the fix, for the branch where the > > issue was reported against I will refrain from handling address-taken > > functions differently. > > > > I hope I deciphered enough of the calls handling to assess that > > aggregate_value_p always matches DECL_BY_REFERENCE on DECL_RESULT. > > IPA PTA needs to know the GIMPLE representation of the callees > > DECL_RESULT (whether it's a pointer - at the caller side we > > still see the non-reference LHS). And that needs to work for > > indirect calls as well. > > > > Richard. > > > > 2016-04-27 Richard Biener <rguent...@suse.de> > > > > PR ipa/70760 > > * tree-ssa-structalias.c (find_func_aliases_for_call): Use > > aggregate_value_p to determine if a function result is > > returned by reference. > > (ipa_pta_execute): Functions having their address taken are > > not automatically nonlocal. > > > > * g++.dg/ipa/ipa-pta-2.C: New testcase. > > * gcc.dg/ipa/ipa-pta-1.c: Adjust. > > > > Index: gcc/tree-ssa-structalias.c > > =================================================================== > > *** gcc/tree-ssa-structalias.c (revision 235478) > > --- gcc/tree-ssa-structalias.c (working copy) > > *************** find_func_aliases_for_call (struct funct > > *** 4641,4652 **** > > auto_vec<ce_s, 2> lhsc; > > struct constraint_expr rhs; > > struct constraint_expr *lhsp; > > > > get_constraint_for (lhsop, &lhsc); > > rhs = get_function_part_constraint (fi, fi_result); > > ! if (fndecl > > ! && DECL_RESULT (fndecl) > > ! && DECL_BY_REFERENCE (DECL_RESULT (fndecl))) > > { > > auto_vec<ce_s, 2> tem; > > tem.quick_push (rhs); > > --- 4737,4747 ---- > > auto_vec<ce_s, 2> lhsc; > > struct constraint_expr rhs; > > struct constraint_expr *lhsp; > > + bool aggr_p = aggregate_value_p (lhsop, gimple_call_fntype (t)); > > > > get_constraint_for (lhsop, &lhsc); > > rhs = get_function_part_constraint (fi, fi_result); > > ! if (aggr_p) > > { > > auto_vec<ce_s, 2> tem; > > tem.quick_push (rhs); > > *************** find_func_aliases_for_call (struct funct > > *** 4656,4677 **** > > } > > FOR_EACH_VEC_ELT (lhsc, j, lhsp) > > process_constraint (new_constraint (*lhsp, rhs)); > > - } > > > > ! /* If we pass the result decl by reference, honor that. */ > > ! if (lhsop > > ! && fndecl > > ! && DECL_RESULT (fndecl) > > ! && DECL_BY_REFERENCE (DECL_RESULT (fndecl))) > > ! { > > ! struct constraint_expr lhs; > > ! struct constraint_expr *rhsp; > > > > ! get_constraint_for_address_of (lhsop, &rhsc); > > ! lhs = get_function_part_constraint (fi, fi_result); > > ! FOR_EACH_VEC_ELT (rhsc, j, rhsp) > > ! process_constraint (new_constraint (lhs, *rhsp)); > > ! rhsc.truncate (0); > > } > > > > /* If we use a static chain, pass it along. */ > > --- 4751,4769 ---- > > } > > FOR_EACH_VEC_ELT (lhsc, j, lhsp) > > process_constraint (new_constraint (*lhsp, rhs)); > > > > ! /* If we pass the result decl by reference, honor that. */ > > ! if (aggr_p) > > ! { > > ! struct constraint_expr lhs; > > ! struct constraint_expr *rhsp; > > > > ! get_constraint_for_address_of (lhsop, &rhsc); > > ! lhs = get_function_part_constraint (fi, fi_result); > > ! FOR_EACH_VEC_ELT (rhsc, j, rhsp) > > ! process_constraint (new_constraint (lhs, *rhsp)); > > ! rhsc.truncate (0); > > ! } > > } > > > > /* If we use a static chain, pass it along. */ > > *************** ipa_pta_execute (void) > > *** 7686,7715 **** > > > > gcc_assert (!node->clone_of); > > > > - /* When parallelizing a code region, we split the region off into a > > - separate function, to be run by several threads in parallel. So for a > > - function foo, we split off a region into a function > > - foo._0 (void *foodata), and replace the region with some variant of a > > - function call run_on_threads (&foo._0, data). The '&foo._0' sets the > > - address_taken bit for function foo._0, which would make it non-local. > > - But for the purpose of ipa-pta, we can regard the run_on_threads call > > - as a local call foo._0 (data), so we ignore address_taken on nodes > > - with parallelized_function set. > > - Note: this is only safe, if foo and foo._0 are in the same lto > > - partition. */ > > - bool node_address_taken = ((node->parallelized_function > > - && !node->used_from_other_partition) > > - ? false > > - : node->address_taken); > > - > > /* For externally visible or attribute used annotated functions use > > local constraints for their arguments. > > For local functions we see all callers and thus do not need initial > > constraints for parameters. */ > > bool nonlocal_p = (node->used_from_other_partition > > || node->externally_visible > > ! || node->force_output > > ! || node_address_taken); > > node->call_for_symbol_thunks_and_aliases (refered_from_nonlocal_fn, > > &nonlocal_p, true); > > > > --- 7778,7790 ---- > > > > gcc_assert (!node->clone_of); > > > > /* For externally visible or attribute used annotated functions use > > local constraints for their arguments. > > For local functions we see all callers and thus do not need initial > > constraints for parameters. */ > > bool nonlocal_p = (node->used_from_other_partition > > || node->externally_visible > > ! || node->force_output); > > node->call_for_symbol_thunks_and_aliases (refered_from_nonlocal_fn, > > &nonlocal_p, true); > > > > Index: gcc/testsuite/g++.dg/ipa/ipa-pta-2.C > > =================================================================== > > *** gcc/testsuite/g++.dg/ipa/ipa-pta-2.C (revision 0) > > --- gcc/testsuite/g++.dg/ipa/ipa-pta-2.C (working copy) > > *************** > > *** 0 **** > > --- 1,37 ---- > > + // { dg-do run } > > + // { dg-options "-O2 -fipa-pta" } > > + > > + extern "C" void abort (void); > > + > > + struct Y { ~Y(); int i; }; > > + > > + Y::~Y () {} > > + > > + static Y __attribute__((noinline)) foo () > > + { > > + Y res; > > + res.i = 3; > > + return res; > > + } > > + > > + static Y __attribute__((noinline)) bar () > > + { > > + Y res; > > + res.i = 42; > > + return res; > > + } > > + > > + static Y (*fn) (); > > + > > + int a; > > + int main() > > + { > > + if (a) > > + fn = foo; > > + else > > + fn = bar; > > + Y res = fn (); > > + if (res.i != 42) > > + abort (); > > + return 0; > > + } > > Index: gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c > > =================================================================== > > *** gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c (revision 235478) > > --- gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c (working copy) > > *************** int main() > > *** 40,52 **** > > } > > > > /* IPA PTA needs to handle indirect calls properly. Verify that > > ! both bar and foo get a (and only a) in their arguments points-to sets. > > ! ??? As bar and foo have their address taken there might be callers > > ! not seen by IPA PTA (if the address escapes the unit which we only > > compute > > ! during IPA PTA...). Thus the solution also includes NONLOCAL. */ > > > > /* { dg-final { scan-ipa-dump "fn_1 = { bar foo }" "pta2" } } */ > > ! /* { dg-final { scan-ipa-dump "bar.arg0 = { NONLOCAL a }" "pta2" } } */ > > ! /* { dg-final { scan-ipa-dump "bar.arg1 = { NONLOCAL a }" "pta2" } } */ > > ! /* { dg-final { scan-ipa-dump "foo.arg0 = { NONLOCAL a }" "pta2" } } */ > > ! /* { dg-final { scan-ipa-dump "foo.arg1 = { NONLOCAL a }" "pta2" } } */ > > --- 40,49 ---- > > } > > > > /* IPA PTA needs to handle indirect calls properly. Verify that > > ! both bar and foo get a (and only a) in their arguments points-to sets. > > */ > > > > /* { dg-final { scan-ipa-dump "fn_1 = { bar foo }" "pta2" } } */ > > ! /* { dg-final { scan-ipa-dump "bar.arg0 = { a }" "pta2" } } */ > > ! /* { dg-final { scan-ipa-dump "bar.arg1 = { a }" "pta2" } } */ > > ! /* { dg-final { scan-ipa-dump "foo.arg0 = { a }" "pta2" } } */ > > ! /* { dg-final { scan-ipa-dump "foo.arg1 = { a }" "pta2" } } */ > > > Grüße > Thomas > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)