https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 21 Feb 2019, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609
> 
> --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Header-less version of the testcase:
> typedef __SIZE_TYPE__ size_t;
> 
> __attribute__((always_inline)) static inline void
> copy (int *restrict a, int *restrict b)
> {
>   *b = *a;
>   *a = 7;
> }
> 
> __attribute__((noinline)) void
> floppy (int *mat, size_t *idxs)
> {
>   for (int i = 0; i < 3; i++)
>     copy (&mat[i%2], &mat[idxs[i]]);
> }
> 
> int
> main ()
> {
>   int mat[2] = {10, 20};
>   size_t idxs[3] = {1, 0, 1};
>   floppy (mat, idxs);
>   if (mat[0] != 7 || mat[1] != 10)
>     __builtin_abort ();
>   return 0;
> }
> 
> Richi, any progress on this?  How should the loop unrolling determine when to
> use different base/clique and when to use the same?  I mean, isn't it
> different if each loop body invokes another inlined call with restrict args
> vs. when the loop is within the same original function?

I posted the patch as an RFC back in October but got no response
(and forgot about it).

In general you always have to use different base/clique unless you
are copying stmts whose access addresses do not vary with the iteration.
Of course that's only required if the programmer didn't tell you
that there's no aliasing - which makes the issue only appear when
inlining happens.  A similar case would be the user writing

 for (...)
  {
    int * restrict x = &p[i];
    int * restrict y = &p[i+1];
    *x = *y;
  }

but we do not support this "local" generation of restrict qualified
pointers (so you have to jump through inlines to get the same effect).
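
To make the problem concrete, here is a hand-written sketch of what
floppy () from the testcase looks like after inlining copy () and fully
unrolling the loop.  The names floppy_unrolled and cross_iteration_alias
are made up for illustration, and the block-scoped restrict pointers only
stand in for the per-inline-instance restrict info; this is not what GCC
emits.  Each block is valid on its own, but a0 of one iteration points at
the same element as b of the next, so the copies cannot share a clique.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-unrolled stand-in for floppy () with copy () inlined three times.
   Each block models one inlined call with its own pair of restrict
   pointers; within a block the two pointers never alias.  */
void
floppy_unrolled (int *mat, const size_t *idxs)
{
  {
    int *restrict a0 = &mat[0 % 2];
    int *restrict b0 = &mat[idxs[0]];
    *b0 = *a0;
    *a0 = 7;
  }
  {
    int *restrict a1 = &mat[1 % 2];
    int *restrict b1 = &mat[idxs[1]];
    *b1 = *a1;
    *a1 = 7;
  }
  {
    int *restrict a2 = &mat[2 % 2];
    int *restrict b2 = &mat[idxs[2]];
    *b2 = *a2;
    *a2 = 7;
  }
}

/* Return nonzero if the 'a' pointer of iteration I and the 'b' pointer
   of iteration I + 1 address the same element.  When this holds, treating
   both iterations as one clique would wrongly disambiguate the store
   through 'a' from the load feeding the store through 'b'.  */
int
cross_iteration_alias (int *mat, const size_t *idxs, int i)
{
  return &mat[i % 2] == &mat[idxs[i + 1]];
}
```

With mat = {10, 20} and idxs = {1, 0, 1} from the testcase, the pointers
of adjacent iterations do overlap, which is exactly the situation where
reusing base/clique across the unrolled copies is wrong.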

So eventually we could have a flag in struct function recording whether
the function had a function with restrict tags inlined into it
and only then perform this copying...  or somehow remember all
"inlined cliques" and only ever remap those (we could divide
the namespace for this - PTA only ever uses a single clique
which we could hard-code to 1, but then PTA runs multiple times
so it might get a little more complicated).
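
As a rough illustration of the "remember inlined cliques and only remap
those" idea, here is a toy model.  It is not GCC code: struct toy_ref,
struct toy_function, toy_copy_refs and the 16-entry table are all
hypothetical stand-ins, and the assumption that clique 1 is the PTA one
just follows the suggestion above.

```c
#include <assert.h>

#define TOY_MAX_CLIQUE 16

/* Toy stand-in for the dependence info on one memory reference.  */
struct toy_ref
{
  unsigned short clique;   /* 0 means no restrict info.  */
  unsigned short base;
};

/* Toy stand-in for per-function state.  */
struct toy_function
{
  unsigned short last_clique;                     /* Highest clique in use.  */
  unsigned char clique_is_inlined[TOY_MAX_CLIQUE]; /* Indexed by clique.  */
};

/* Copy N refs from SRC to DST as if duplicating one loop body.  Refs in
   an "inlined" clique get a fresh clique; refs that shared a clique in
   SRC still share one in DST.  All other refs are copied unchanged.  */
void
toy_copy_refs (struct toy_function *fn, const struct toy_ref *src,
               struct toy_ref *dst, int n)
{
  unsigned short remap[TOY_MAX_CLIQUE] = { 0 };

  for (int i = 0; i < n; i++)
    {
      unsigned short c = src[i].clique;

      dst[i] = src[i];
      if (c == 0 || c >= TOY_MAX_CLIQUE || !fn->clique_is_inlined[c])
        continue;
      if (remap[c] == 0)
        {
          /* First ref of this clique in this copy: allocate a new one
             and remember that it also stems from inlining.  */
          remap[c] = ++fn->last_clique;
          if (remap[c] < TOY_MAX_CLIQUE)
            fn->clique_is_inlined[remap[c]] = 1;
        }
      dst[i].clique = remap[c];
    }
}
```

In this sketch the bases are left alone because only the clique has to
differ between copies; whether that is sufficient in the real
implementation is not something the toy model can show.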

Unfortunately I have no benchmarks or real-world code using
restrict so I cannot really assess the impact of the
boiler-plate remapping in copy_bbs.
