https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271

            Bug ID: 90271
           Summary: [missed-optimization] failure to keep variables in
                    registers during "faux" memcpy
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Example on GodBolt: https://godbolt.org/z/Q17L1u

Consider the following functions:

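#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy

// Overwrite sizeof(T2) bytes of v1, starting at byte offset k, with the bytes of v2.
// Assumes sizeof(T2) <= sizeof(T1); otherwise the unsigned subtraction below wraps around.
template<typename T1, typename T2>
inline void replace_bytes (T1& v1 ,const T2& v2 ,std::size_t k) noexcept
{
   if (k > sizeof(T1) - sizeof(T2)) { return; }

   std::memcpy( (void*) (((char*)&v1)+k) , (const void*) &v2 , sizeof(T2) );
}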

For plain-old-data types, this is nothing but a manipulation of v1's bytes
(and there are no pointer aliasing issues). So, at least when k is known at
compile time, the compiler should IMHO keep the whole operation in registers.
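
To illustrate what "within registers" could mean here, below is a rough sketch of the same byte replacement written with shifts and masks only. It is not part of the original report: the name replace_byte_le is hypothetical, and the sketch assumes a 32-bit v1, a one-byte v2 and a little-endian target (such as x86-64), where byte k of an integer carries weight 256^k.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the report): computes what replace_bytes(v1, v2, k)
// would leave in v1 for a 32-bit v1 and a one-byte v2, ASSUMING a little-endian target.
// It uses only shifts and masks, so nothing has to be spilled to memory.
constexpr std::uint32_t replace_byte_le(std::uint32_t v1, std::uint8_t v2, std::size_t k) noexcept
{
    if (k > sizeof(v1) - sizeof(v2)) { return v1; }   // same bounds check as replace_bytes
    const unsigned shift = 8u * static_cast<unsigned>(k);
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;   // selects byte k
    return (v1 & ~mask) | (std::uint32_t{v2} << shift);        // clear byte k, then set it
}

// Replacing byte 2 of 0x11223344 with 0xAB changes only that byte.
static_assert(replace_byte_le(0x11223344u, 0xAB, 2) == 0x11AB3344u, "byte 2 replaced");
```

Since the function is constexpr, a call with constant arguments can be folded entirely at compile time, which is the result one would hope for from the memcpy version as well.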

And yet GCC doesn't. With the extra code

int foo1()
{
  int x = 3;
  char c = 1;
  replace_bytes(x,c,1);
  return x;
}

we get (at maximum optimization):

foo1():
        mov     DWORD PTR [rsp-4], 3
        mov     BYTE PTR [rsp-3], 1
        mov     eax, DWORD PTR [rsp-4]
        ret

By contrast, clang _does_ optimize fully and has foo1() simply return 259
(= 1*256 + 3, since on little-endian x86-64 the byte at offset 1 has weight
256).

Making k a template parameter doesn't help either.
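
The report does not show the template-parameter variant. One plausible form is sketched below; the names replace_bytes_at and foo2 are hypothetical, and the actual code behind the GodBolt link may differ. With the offset as a template parameter, the bounds check can also become a static_assert.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical variant (name and signature are assumptions, not from the report):
// the byte offset K is a template parameter, so it is always a compile-time constant.
template<std::size_t K, typename T1, typename T2>
inline void replace_bytes_at(T1& v1, const T2& v2) noexcept
{
    static_assert(sizeof(T2) <= sizeof(T1) && K <= sizeof(T1) - sizeof(T2),
                  "the replacement bytes must fit inside v1");
    std::memcpy(reinterpret_cast<char*>(&v1) + K, &v2, sizeof(T2));
}

// Counterpart of foo1() using the variant above.
int foo2()
{
    int x = 3;
    char c = 1;
    replace_bytes_at<1>(x, c);   // overwrite byte 1 of x with the value 1
    return x;                    // 259 on a little-endian target with 4-byte int
}
```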
