https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58483
--- Comment #9 from Marc Glisse <glisse at gcc dot gnu.org> ---

  __builtin_memcpy (_30, &._82, 12);
  _31 = MEM[(const int &)_30];

looks like something we should be able to optimize, and there is indeed code in vn_reference_lookup_3 to that effect, but the code doesn't look that nice until very late in the optimization pipeline. At fre1, we haven't inlined the constructor of vector yet. And we only unroll the loop after all the pre/fre passes are done. The most relevant remaining pass is dom3, but it doesn't look like it handles this. If I add another FRE pass next to dom3, we are left with

  _30 = operator new (12);
  __builtin_memcpy (_30, &._41, 12);
  operator delete (_30);
  D.15905 ={v} {CLOBBER};
  return 160;

Removing memcpy before operator delete seems to be a work in progress
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00780.html
And then we would finally get to the part about removing new/delete pairs.

Adding that late FRE pass seems unlikely to happen (?), so we probably need to find some other way. We could notice that iterating on the copy _30 is the same as iterating on the original ._82, but that seems much harder than adding another pass after loop unrolling...

I was a bit surprised to notice that when we see

  __builtin_memcpy(b,a,42); c=b[0];

and we do notice that this is equivalent to "c=a[0]", we only do the rewriting if we can get to a constant value for c. I was expecting an unconditional rewrite. But maybe that would somehow end up pessimizing the code in other cases.
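For readers who want to see the pattern at source level, here is a minimal hand-written sketch of the memcpy-then-load sequence discussed above. It is not the testcase from this bug: the function names (read_through_copy, read_direct, check_equivalence) and the array contents are hypothetical, chosen only for illustration. The first function allocates a buffer, copies 12 bytes into it, loads the first element from the copy, and frees the buffer. The second is the form the load could be rewritten to. Whether a given GCC version actually reduces the first to the second depends on the pass ordering described above, so compare the dumps rather than assume it.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical analogue of the GIMPLE above: new, memcpy, load, delete.
int read_through_copy(const int (&src)[3]) {
    int *copy = static_cast<int *>(::operator new(sizeof src));
    std::memcpy(copy, src, sizeof src);  // __builtin_memcpy (_30, &src, 12)
    int first = copy[0];                 // _31 = MEM[(const int &)_30]
    ::operator delete(copy);
    return first;
}

// What the load is equivalent to once the copy is seen through: c = a[0].
int read_direct(const int (&src)[3]) {
    return src[0];
}

// Sanity check that both forms agree for one illustrative input.
void check_equivalence() {
    const int a[3] = {10, 50, 100};  // illustrative values only
    assert(read_through_copy(a) == read_direct(a));
}
```

Compiling this with -O2 -fdump-tree-all and comparing the dumps for the two functions is one way to check at which pass, if any, the load from the copy gets forwarded.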