On 03/12/15 09:59, Richard Biener wrote:
On Thu, 3 Dec 2015, Tom de Vries wrote:
On 03/12/15 01:10, Tom de Vries wrote:
I've managed to reproduce it. The difference between pass and fail is
whether the compiler is configured with or without accelerator.
I'll look into it.
In the configuration with accelerator, the flag node->force_output is on for
foo._omp.fn.
This causes nonlocal_p to be true in ipa_pta_execute, which causes the
optimization to fail.
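To make the dependency concrete, here is a minimal sketch of how a flag like force_output can feed into nonlocal_p. It is a toy model, not the actual code in ipa_pta_execute: the names toy_node and toy_nonlocal_p are made up for illustration, and the exact set of flags checked is an assumption.

```cpp
// Toy stand-in for a call graph node; NOT GCC's real cgraph_node.
// Only the flags relevant to this discussion are modelled.
struct toy_node {
  bool externally_visible = false;
  bool used_from_other_partition = false;
  bool force_output = false;
};

// Assumed shape of the check: any flag that implies callers the
// compiler cannot see makes the function "nonlocal", so its
// parameters get conservative initial constraints.
bool toy_nonlocal_p(const toy_node &n) {
  return n.used_from_other_partition
         || n.externally_visible
         || n.force_output;
}
```

In this model a node with all flags clear is treated as local, while the same node with force_output set is treated as nonlocal, which matches the pass/fail difference between the two configurations.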
The flag is described as:
...
/* The symbol will be assumed to be used in an invisible way (like
by an toplevel asm statement). */
...
Looks like I have to ignore the force_output flag as well in ipa_pta_execute
for this sort of node.
It rather looks like the flag shouldn't be set. The fn after all has
its address taken!(?)
The flag is set here in expand_omp_target:
...
  /* Prevent IPA from removing child_fn as unreachable, since there are no
     refs from the parent function to child_fn in offload LTO mode.  */
  if (ENABLE_OFFLOADING)
    cgraph_node::get (child_fn)->mark_force_output ();
...
I guess setting forced_by_abi instead would also mean child_fn is not
removed as unreachable, while still allowing optimizations:
...
/* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
to be exported. Unlike FORCE_OUTPUT this flag gets cleared to
symbols promoted to static and it does not inhibit
optimization. */
unsigned forced_by_abi : 1;
...
But I suspect that optimizations other than ipa-pta might break things.
Essentially we have two situations:
- in the host compiler, there is no need for the force_output flag,
and it inhibits optimization
- in the accelerator compiler, it (or some equivalent) is needed
I wonder if setting the force_output flag only when streaming the
bytecode for offloading would work. That way, it wouldn't be set in the
host compiler, while being set in the accelerator compiler.
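Roughly, the idea could be sketched as below. This is only a toy model under stated assumptions: toy_fn and toy_stream_for_offload are hypothetical names, and the real streaming code would look quite different.

```cpp
// Toy stand-in for a function's call graph node; NOT GCC's cgraph_node.
struct toy_fn {
  bool force_output = false;
};

// Hypothetical streaming step: the flag is set only on the copy that
// is written out for the accelerator compiler. The host compiler's
// own node is left untouched, so host-side IPA is not inhibited.
toy_fn toy_stream_for_offload(const toy_fn &host_node) {
  toy_fn streamed = host_node;
  streamed.force_output = true;
  return streamed;
}
```

With this split, the host node keeps force_output clear after streaming, while the node the accelerator compiler reads back has it set and so is not removed as unreachable.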
Thanks,
- Tom