Hi,

On 10/6/25 15:51, Tobias Burnus wrote:
Hi Josef, Martin & Honza, hello world,


Josef Melcr wrote:
this is the fourth version of this patch

I have some questions below - as a mere user.

Disclaimer: I presume that there are good reasons for the
current behavior and I know that there is always room for
improvement - and my puzzled questions shouldn't delay
the landing of the patch. (Additionally, I have no idea
how GCC's IPA internally works and I only glanced at the
patch. Still, am a bit puzzled.)

* * *

I tried the following test case:
------------------
int f()
{
  int x = 1;
  int res;
  #pragma omp parallel
    if (x > 0)
      res = 55;
    else
      res = 33;
  return res;
}
------------------

But, using gcc -fopenmp -O3 -fdump-tree-optimized, I still see
the condition in the dump/assembly.

What puzzled me a bit with the examples is that all of them use
LTO. Shouldn't the optimization not always be in place? In particular,
the generated f._omp_fn.0 is/should be static, i.e. issues like code
side growth or use outside of the TU shouldn't apply?

If I add 'int main() { return f(); }' and compile with -flto, I see
that the condition is gone.

I remember running into the exact same issue when I was first developing the testcases.  It was right around the time we discovered that ipa-cp heuristics were broken, so I just assumed it was something related to that and just stuck with LTO.  The real reason is in the function ipa_compute_jump_functions_for_bb:

-----

      /* We do not need to bother analyzing calls to unknown functions
         unless they may become known during lto/whopr.  */
      if (!callee->definition && !flag_lto
          && !gimple_call_fnspec (cs->call_stmt).known_p ())
        continue;

-----

Since the definition of GOMP_parallel isn't known, the call doesn't get analyzed and the callback edge is never created.  LTO breaks this condition, which is why enabling it yields the expected result.  If you extend the condition with something like "and the function doesn't have a callback attribute", the edge does get constructed and the if statement does get optimized away.  We could certainly add such a check, it makes sense and I don't think that people would object.
Secondly, as the function is static, I would have expected that -O2
(implies -fipa-cp) is enough; however, I see that -fipa-cp-clone
(implied by -O3) is required. - I understand that internally cloning
is easier, but I wonder whether we it could still enable this with
-O2, given that the func is static and has a single caller, only.

But also if it weren't static (or had multiple caller), for -O3 I'd
expect the cloning in this case (assuming that it is profitable by
whatever measure the compiler has).
Do you suggest cloning with -O2 or editing the function inline?  I am not sure how difficult either option would be to implement, though just enabling cloning with -O2 shouldn't be too difficult. Cloning makes sense in this case, since the original function will be cleaned up anyway, meaning there would be very little code growth, if any.  Implementing it through a new compilation flag enabled at -O2 by default (as suggested by Martin) would probably be the best.
Thanks again for the patch and all others involved in reviewing
creating, it.

Tobias
On a side note, Tobias noticed that I am missing an initializer in attr-callback.cc on line 361, which results in a warning and a failed build.  I remember fixing this in my bootstrap build, but I forgot to fix it in my dev build, from which I dumped the patch. Should I resend the patch?

Best regards,

Josef

Reply via email to