https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64896
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 34685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34685&action=edit
gcc5-pr64896.patch

I think if !aggregate_value_p, we really should be using a temporary var
rather than RESULT_DECL. That said, if it doesn't generate optimal code, we
should optimize it, but that wouldn't be IPA's or expand_thunk's task; such a
problem would affect all similar user-written code.

But on this exact testcase, with the patch I get assembly identical to
-fno-ipa-icf. The

        movl    $0, -24(%rsp)
        movl    $0, -20(%rsp)
        xorl    %edx, %edx
        movq    -24(%rsp), %rax

is of course not optimal; xorl %eax, %eax; xorl %edx, %edx would do too, but
it is a matter of some RTL optimization or late GIMPLE to improve this.

But, related to this, I've noticed that:

1) pass_nrv doesn't seem to work very well on x86_64. Apparently the
   temporaries usually have DECL_ALIGN bumped by LOCAL_DECL_ALIGNMENT to
   128 bits, while RESULT_DECL typically does not get that "optimization",
   so the nrv pass gives up. I wonder why, at least for the case where the
   decl isn't addressable, we would care about DECL_ALIGN of the temporary
   (rather than just TYPE_ALIGN).

2) even on i386, where tree nrv usually works, I see on a testcase like:

   struct A { int m_x, m_y; };
   struct Q { struct A m_location; int m_size; long m_foo; };
   struct Q foo ();
   struct Q bar () { struct Q x = foo (); return x; }

   (in C, so that C++ nrv doesn't trigger) unnecessary stack adjustments