Re: [PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

Jakub Jelinek Mon, 28 Apr 2025 02:07:53 -0700

On Sun, Apr 27, 2025 at 11:56:21AM +0200, Josef Melcr wrote:
>       * builtin-attrs.def (0): New int list.


This is just weird.  Write
        * builtin-attrs.def: Add DEF_LIST_INT_INT (0,2).
?

> +DEF_CALLBACK_ATTRIBUTE(GOMP, 1, 0)
> +DEF_CALLBACK_ATTRIBUTE(GOMP, 1, 2)
> +DEF_CALLBACK_ATTRIBUTE(OACC, 2, 4)
> +DEF_CALLBACK_ATTRIBUTE(GOMP, 3, 0_2)
> +DEF_ATTR_TREE_LIST(ATTR_CALLBACK_GOMP_LIST, ATTR_CALLBACK,
> +                                                                      
> ATTR_CALLBACK_GOMP_1_2, ATTR_NOTHROW_LIST)
> +DEF_ATTR_TREE_LIST(ATTR_CALLBACK_GOMP_TASK_HELPER_LIST, ATTR_CALLBACK,
> +                                                                      
> ATTR_CALLBACK_GOMP_1_0, ATTR_NOTHROW_LIST)
> +DEF_ATTR_TREE_LIST(ATTR_CALLBACK_GOMP_TASK_LIST, ATTR_CALLBACK,
> +                                                                      
> ATTR_CALLBACK_GOMP_3_0_2, ATTR_CALLBACK_GOMP_TASK_HELPER_LIST)
> +DEF_ATTR_TREE_LIST(ATTR_CALLBACK_OACC_LIST, ATTR_CALLBACK,
> +                                                                      
> ATTR_CALLBACK_OACC_2_4, ATTR_NOTHROW_LIST)

Why so many tabs?  Can't ATTR_CALLBACK_GOMP_[123]* go below ATTR_CALLBACK_?

> @@ -465,6 +466,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
>                             handle_tm_attribute, NULL },
>    { "transaction_may_cancel_outer", 0, 0, false, true, false, false,
>                             handle_tm_attribute, NULL },
> +  { "callback", 1, -1, true, false, false, false,
> +      handle_callback_attribute, NULL},

This line should be indented like for the other attributes.

> +  /* We want to work with the parent edge whenever possible. When it comes to
> +     callback edges, a call statement might have multiple callback edges
> +     attached to it. These can be easily obtained from the parent edge 
> instead.

Dot should be followed by 2 spaces, not one (seen many times in the patch,
both in function comments and in documentation).
*/ should go on the same line as the last comment line.
If that would be after the 80 column limit, wrap line before the
last word (so "instead.  */" on the last line).

> +   */
> +  if (e && e->callback)
> +    e = e->get_callback_parent_edge ();
> +
>    if (n > 100)
>      {
>        call_site_hash = hash_table<cgraph_edge_hasher>::create_ggc (120);
> @@ -837,8 +855,31 @@ cgraph_edge::set_call_stmt (cgraph_edge *e, gcall 
> *new_stmt,
>        return e_indirect ? indirect : direct;
>      }
>  
> -  if (new_direct_callee)
> -    e = make_direct (e, new_direct_callee);
> +  /* Callback edges also need their call stmts changed.
> +     We can use the same flag as for speculative edges. */

Two spaces instead of one.

> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -1970,6 +1970,43 @@ declares that @code{my_alloc1} returns 16-byte aligned 
> pointers and
>  that @code{my_alloc2} returns a pointer whose value modulo 32 is equal
>  to 8.
>  
> +@cindex @code{callback} function attribute
> +@item callback (@var{function-pos}, @var{...})
> +The @code{callback} attribute specifies that a function takes a pointer to
> +a callback function as one of it's parameters and passes it some of it's own
> +parameters. For example:

Most importantly, I wonder if it is a good idea to use same attribute names
as clang when it behaves differently.  Yes, it seems the clang folks screwed
up the design by having it behave differently from other attributes
(nonnull, format, ...), by allowing also argument names instead of numbers,
haven't verified but from reading their docs it seems they actually count
arguments differently in methods (nonnull, format, ... count this as 1 and
user specified is 2+, while my understanding from the docs is that 1+ are
the user specified arguments and 0 is this).
Sure, for [[gnu::callback (...)]]] we could do whatever we want, but
__attribute__((callback(...))) is the problematic one.

So, I wonder if either we should implement what clang does (sure, the
argument names are a little bit awkward, we'd need special case for the
arguments of this attribute where it would parse an identifier or number,
and later on we'd remap the identifiers to numbers.
Not sure how exactly it works though, because e.g. for nonnull attribute
in both compilers one can specify the numbers not just as literals, but also
constant expressions:
constexpr int a = 2;
__attribute__((nonnull (a, a + 2))) void foo (int, void *, void *, void *);
void bar (void) { foo (0, nullptr, nullptr, nullptr); }
or use different name of the attribute.

Another question is about GOMP_task.  Can you handle any constant
propagation into that?  I see you've tried to deal with it by using 2
callback attributes but with using 0 for the argument.  Wouldn't it be
better to just special case BUILT_IN_GOMP_TASK in IPA?

Consider:
void bar (int, int *);

void
foo (int a)
{
  a = 42;
  #pragma omp task
  bar (a, 0);
}

void
baz (int n)
{
  int a[n], b;
  __builtin_memset (a, 0, sizeof (a));
  b = 42;
  #pragma omp task
  bar (b, a);
}

In the foo case, it is simple
  GOMP_task (foo._omp_fn.0, &.omp_data_o.1, 0B, 4, 4, 1, 0, 0B, 0, 0B);
call, which acts as callback (1, 2) (though note that in the GOMP_task
case the argument size and argument alignment are passed in (that is the 4,
4), so if e.g. some IPA modification would not just propagate constants,
but also change the layout of that, that would need to be modified).
In the second case it is more complex.  There is
  GOMP_task (baz._omp_fn.0, &.omp_data_o.7, baz._omp_cpyfn.1, D.3044, 8, 1, 0, 
0B, 0, 0B);
and the behavior from what IPA should care about is that there is
  void *data = __builtin_alloca_with_align (D.3044, 8);
  baz._omp_cpyfn.1 (data, &.omp_data_o.7);
  baz._omp_fn.0 (data);
The reason for this is that the memory allocated is done somewhere where it
can persist until the task can be dispatched and either there are needed
constructors or some other operations (like here actually turning a pointer
to VLA into copying the content of the VLA as data).
So, in order to constant propagate the 42 one needs to modify not just one
place, but 2.  baz has:
  b = 42;
...
  .omp_data_o.7.b = b;
and baz._omp_cpyfn.1 has:
  _6 = .omp_data_i->b;
  .omp_data_o.10_5->b = _6;
and baz._omp_fn.0 has:
  b = .omp_data_i->b;
So, best would be to change
  _6 = .omp_data_i->b;
to _6 = 42; and
  b = .omp_data_i->b;
to b = 42; as well.

Also, it would be nice if IPA could handle IFN_ASSUME calls like if they had
the callback attribute.  Internal calls obviously can't have attributes,
but otherwise it should work similarly.

> +  if (flags & ECF_CB_3_0_2)
> +    {
> +      tree args = tree_cons (
> +     NULL_TREE, build_int_cst (integer_type_node, 3),
> +     tree_cons (NULL_TREE, build_int_cst (integer_type_node, 0),
> +                build_tree_list (NULL_TREE,
> +                                 build_int_cst (integer_type_node, 2))));

Please avoid calls with ( at the end of line if possible.
      tree two = build_int_cst (integer_type_node, 2);
      tree args
        = tree_cons (NULL_TREE, build_int_cst (integer_type_node, 3),
                     tree_cons (NULL_TREE, integer_zero_node,
                                build_tree_list (NULL_TREE, two)));

        Jakub

Re: [PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

Reply via email to