https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 21 Feb 2019, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609
>
> --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Header-less version of the testcase:
> typedef __SIZE_TYPE__ size_t;
>
> __attribute__((always_inline)) static inline void
> copy (int *restrict a, int *restrict b)
> {
>   *b = *a;
>   *a = 7;
> }
>
> __attribute__((noinline)) void
> floppy (int *mat, size_t *idxs)
> {
>   for (int i = 0; i < 3; i++)
>     copy (&mat[i%2], &mat[idxs[i]]);
> }
>
> int
> main ()
> {
>   int mat[2] = {10, 20};
>   size_t idxs[3] = {1, 0, 1};
>   floppy (mat, idxs);
>   if (mat[0] != 7 || mat[1] != 10)
>     __builtin_abort ();
>   return 0;
> }
>
> Richi, any progress on this?  How should the loop unrolling determine when to
> use different base/clique and when to use the same?  I mean, isn't it
> different if each loop body invokes another inlined call with restrict args
> vs. when the loop is within the same original function?

I posted the patch as an RFC back in October but got no response
(and forgot about it).

In general you always have to use a different base/clique unless
you are copying stmts whose access addresses do not vary with the
iteration.  Of course that's only required if the programmer
didn't tell you that there's no aliasing - which makes the
issue only appear when inlining happens.  A similar case would
be the user writing

  for (...)
    {
      int * restrict x = &p[i];
      int * restrict y = &p[i+1];
      *x = *y;
    }

but we do not support this "local" generation of restrict
qualified pointers (so you have to jump through inlines to get
the same effect).

So eventually we could have a flag in struct function recording whether
the function had a function with restrict tags inlined into it, and only
then perform this copying...
or somehow remember all "inlined cliques" and only ever remap those (we could divide the namespace for this - PTA only ever uses a single clique which we could hard-code to 1, but then PTA runs multiple times so it might get a little more complicated). Unfortunately I have no benchmarks or real-world code using restrict so I cannot really assess the impact of the boiler-plate remapping in copy_bbs.
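
To make the base/clique argument above concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not GCC code: `struct dep_tag`, `tags_may_alias`, `copy_tag` and `cross_iteration_may_alias` are hypothetical names, and "fresh clique = original + iteration number" is a simplification of however a real implementation would allocate new cliques. The only rule it assumes is that two references with the same nonzero clique and different bases are treated as independent.

```c
/* Toy model only: hypothetical names, not GCC source.  */
#include <assert.h>
#include <stdbool.h>

/* Dependence tag carried by one memory reference.
   clique == 0 means "no restrict information".  */
struct dep_tag
{
  unsigned short clique;
  unsigned short base;
};

/* Same nonzero clique with different bases: assumed independent.
   Anything else has to be treated as possibly aliasing.  */
static bool
tags_may_alias (struct dep_tag r1, struct dep_tag r2)
{
  if (r1.clique != 0 && r1.clique == r2.clique && r1.base != r2.base)
    return false;
  return true;
}

/* Tag for the copy of a statement made for unrolled iteration ITER.
   With REMAP each iteration gets a clique of its own (iteration 0 keeps
   the original); without it the original tag is reused verbatim.  */
static struct dep_tag
copy_tag (struct dep_tag orig, unsigned iter, bool remap)
{
  struct dep_tag copy = orig;
  if (remap && orig.clique != 0)
    {
      /* The toy model has no overflow handling.  */
      assert (orig.clique + iter <= 0xffffu);
      copy.clique = (unsigned short) (orig.clique + iter);
    }
  return copy;
}

/* Store through b in iteration 0 vs. load through a in iteration 1 of
   the testcase.  Both touch mat[1] at run time, so the only correct
   answer is "may alias" (true).  */
static bool
cross_iteration_may_alias (bool remap)
{
  struct dep_tag a = { 1, 1 };  /* *a inside the inlined copy () */
  struct dep_tag b = { 1, 2 };  /* *b inside the inlined copy () */
  return tags_may_alias (copy_tag (b, 0, remap), copy_tag (a, 1, remap));
}
```

In this model `cross_iteration_may_alias (false)` returns false, i.e. the two accesses to `mat[1]` are wrongly considered independent when the tags are copied verbatim, while `cross_iteration_may_alias (true)` returns true. Within a single iteration both references still share one clique, so the restrict-based disambiguation is kept there.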