https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58483

--- Comment #9 from Marc Glisse <glisse at gcc dot gnu.org> ---
  __builtin_memcpy (_30, &._82, 12);
  _31 = MEM[(const int &)_30];

looks like something we should be able to optimize, and there is indeed code in
vn_reference_lookup_3 to that effect, but the IL does not reach this simple form
until very late in the optimization pipeline. At fre1, we haven't inlined the
constructor of vector yet, and we only unroll the loop after all the pre/fre
passes are done. The most relevant remaining pass is dom3, but it doesn't look
like it handles this. If I add another FRE pass next to dom3, we are left with

  _30 = operator new (12);
  __builtin_memcpy (_30, &._41, 12);
  operator delete (_30);
  D.15905 ={v} {CLOBBER};
  return 160;

Removing memcpy before operator delete seems to be a work in progress:
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00780.html
And then we would finally get to the part about removing new/delete pairs.
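
For illustration, here is a rough source-level analogue of the remaining GIMPLE
above. It is only a sketch: the function name and the array `init` (standing in
for ._41) are hypothetical, the element values are an assumption, and it assumes
a 4-byte int so that the allocation is 12 bytes. The memcpy writes to memory
that is freed without ever being read, so it is dead; once it is gone, only the
new/delete pair remains.

```cpp
#include <cstring>
#include <new>

// Hypothetical analogue of the dump; not taken from the PR's testcase.
int leftover_after_late_fre()
{
    static const int init[3] = {10, 50, 100};  // stands in for ._41 (values assumed)
    void *p = ::operator new(sizeof init);     // _30 = operator new (12)
    std::memcpy(p, init, sizeof init);         // nothing reads *p before it is freed
    ::operator delete(p);                      // operator delete (_30)
    return 160;                                // the sum is already folded to a constant
}
```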

Adding that late FRE pass seems unlikely to happen (?), so we probably need to
find some other way.

We could notice that iterating on the copy _30 is the same as iterating on the
original ._82, but that seems much harder than adding another pass after loop
unrolling...
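
To make the "copy versus original" point concrete, here is a minimal sketch of
the kind of source that produces this pattern. It is a reconstruction, not the
PR's testcase: the function names are hypothetical and the values are chosen
only so that they sum to the 160 seen in the dump. Iterating over the vector
reads the heap copy (_30); iterating over the array reads the original data
directly.

```cpp
#include <vector>

// Sums the elements through the heap copy made by the vector constructor.
int sum_via_vector()
{
    const std::vector<int> v{10, 50, 100};  // 3 ints copied to the heap
    int sum = 0;
    for (int x : v)
        sum += x;
    return sum;
}

// Sums the same (assumed) values without any copy.
int sum_via_array()
{
    static const int init[] = {10, 50, 100};
    int sum = 0;
    for (int x : init)
        sum += x;
    return sum;
}
```

Both functions compute the same value; the missed optimization is that only the
second one is easy for the compiler to fold to a constant.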

I was a bit surprised to notice that when we see

__builtin_memcpy(b,a,42);
c=b[0];

and we do notice that this is equivalent to "c=a[0]", we only do the rewriting
if we can get to a constant value for c. I was expecting an unconditional
rewrite. But maybe that would somehow end up pessimizing the code in other
cases.
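
A small sketch of the equivalence in question, with hypothetical function
names. The copy size here is sizeof b rather than the 42 of the snippet above,
and `a` is assumed to point to at least 3 ints. When `a` is not a known
constant, the load from `b` could still be rewritten as a load from `a`, which
is the unconditional rewrite I was expecting.

```cpp
#include <cstring>

// Loads the first element through a local copy of the data.
int load_via_copy(const int *a)
{
    int b[3];
    std::memcpy(b, a, sizeof b);  // __builtin_memcpy (b, a, ...)
    return b[0];                  // c = b[0]
}

// What the load is equivalent to after looking through the memcpy.
int load_direct(const int *a)
{
    return a[0];                  // c = a[0]
}
```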
