https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271
Bug ID: 90271
Summary: [missed-optimization] failure to keep variables in registers during "faux" memcpy
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz at technion dot ac.il
Target Milestone: ---

Example on GodBolt: https://godbolt.org/z/Q17L1u

Consider the following function:

    template<typename T1, typename T2>
    inline void replace_bytes (T1& v1, const T2& v2, std::size_t k) noexcept
    {
        if (k > sizeof(T1) - sizeof(T2)) { return; }
        std::memcpy(
            (void*) (((char*)&v1)+k),
            (const void*) &v2,
            sizeof(T2) );
    }

For plain-old-data types, this is nothing but the manipulation of v1's bytes (and there are no pointer aliasing issues). So, at least when k is known at compile-time, the compiler should IMHO keep the whole operation in registers. And yet GCC doesn't. With the extra code

    int foo1()
    {
        int x = 3;
        char c = 1;
        replace_bytes(x,c,1);
        return x;
    }

we get (at maximum optimization):

    foo1():
        mov DWORD PTR [rsp-4], 3
        mov BYTE PTR [rsp-3], 1
        mov eax, DWORD PTR [rsp-4]
        ret

clang, by contrast, _does_ optimize this fully and has foo1() simply return 259 (= 256+3).

Making k a template parameter does not help either.
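
For illustration only, here is a sketch of the register-only computation that the report argues the memcpy should reduce to. It is not part of the original report. The name replace_byte_shift is hypothetical, and the sketch assumes a little-endian target (as in the x86-64 output above), a 32-bit value, a single replacement byte, and C++14 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same helper as in the report, reproduced so the sketch is self-contained.
template <typename T1, typename T2>
inline void replace_bytes(T1& v1, const T2& v2, std::size_t k) noexcept
{
    if (k > sizeof(T1) - sizeof(T2)) { return; }
    std::memcpy(reinterpret_cast<char*>(&v1) + k, &v2, sizeof(T2));
}

// Hypothetical register-only equivalent for one byte of a 32-bit value.
// Assumption: little-endian byte order, so byte k occupies bits [8k, 8k+8).
constexpr std::uint32_t replace_byte_shift(std::uint32_t v, std::uint8_t b,
                                           unsigned k) noexcept
{
    if (k > sizeof(v) - sizeof(b)) { return v; }  // same bounds check as above
    const std::uint32_t mask = std::uint32_t{0xFF} << (8 * k);
    return (v & ~mask) | (std::uint32_t{b} << (8 * k));
}

// The value clang folds foo1() to: 0x00000103 == 256 + 3.
static_assert(replace_byte_shift(3, 1, 1) == 259, "expected 0x103");

// Runtime check that the memcpy form and the shift/mask form agree
// (only meaningful on little-endian targets).
inline void check_equivalence()
{
    std::uint32_t x = 3;
    const std::uint8_t c = 1;
    replace_bytes(x, c, 1);
    assert(x == replace_byte_shift(3, 1, 1));
}
```

Because the shift/mask form is a constant expression, the static_assert confirms the expected result of 259 at compile time. Whether GCC should derive this form from the memcpy automatically is exactly the question the report raises.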