https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96562
Bug ID: 96562
Summary: Rather poor assembly generated for copy-list-initialization in return statement.
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: maxim.yegorushkin at gmail dot com
Target Milestone: ---

Rather poor assembly is generated for trivial code.

The following code:

    template<class P, class SizeT>
    struct Span {
        P begin_;
        SizeT size_;
    };

    Span<char*, unsigned> f(char* p, char* q) {
        return {p, static_cast<unsigned>(q - p)};
    }

when compiled with gcc-6.1 through gcc-10.2 with options "-O3 -march=skylake -mtune=skylake", produces unexpectedly long and sub-optimal assembly code:

    f(unsigned char*, unsigned char*):
        mov     QWORD PTR [rsp-16], 0
        mov     QWORD PTR [rsp-24], rdi
        sub     rsi, rdi
        vmovdqa xmm1, XMMWORD PTR [rsp-24]
        vpinsrd xmm0, xmm1, esi, 2
        vmovdqa XMMWORD PTR [rsp-24], xmm0
        mov     rax, QWORD PTR [rsp-24]
        mov     rdx, QWORD PTR [rsp-16]
        ret

clang with the same options produces the expected assembly:

    f(unsigned char*, unsigned char*):
        mov     rdx, rsi
        mov     rax, rdi
        sub     edx, eax
        ret

Is there a way to make gcc produce the expected assembly, please?

https://gcc.godbolt.org/z/bacGW8
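For anyone reproducing this locally, the test case can be sketched as a self-contained translation unit along the following lines. The `static_assert` and the helper `f_returns_expected_span` are assumptions added for illustration and are not part of the original report. They only check that the aggregate is small enough to plausibly be returned in two registers and that `f` returns the expected values. They say nothing about the quality of the generated code.

```cpp
#include <cassert>  // only needed for the assert in the usage line below

template<class P, class SizeT>
struct Span {
    P begin_;
    SizeT size_;
};

// Assumption for illustration: the aggregate is no larger than two pointers,
// which is what lets an ABI such as x86-64 System V return it in a register
// pair (rax:rdx in the listings above).
static_assert(sizeof(Span<char*, unsigned>) <= 2 * sizeof(void*),
              "Span<char*, unsigned> is expected to fit in two registers");

// The function from the report, unchanged.
Span<char*, unsigned> f(char* p, char* q) {
    return {p, static_cast<unsigned>(q - p)};
}

// Hypothetical helper (not in the report): checks the returned members
// against a small local buffer.
bool f_returns_expected_span() {
    char buf[8] = {};
    Span<char*, unsigned> s = f(buf, buf + 5);
    return s.begin_ == buf && s.size_ == 5u;
}
```

Calling `assert(f_returns_expected_span());` from a test driver confirms the semantics. Comparing the assembly still requires compiling with the options quoted above and inspecting the output, for example with `-S` or via the godbolt link.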