[Bug tree-optimization/50417] regression: memcpy with known alignment

npl at chello dot at Fri, 08 Jul 2016 14:12:13 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417


--- Comment #17 from npl at chello dot at ---
I got interrupted by a colleague at work; here is part 2 of my ramblings...

Whatever you could argue against memcpy being replaced by simpler
instructions, it doesn't change the fact that the same issue persists with the
__builtin_memcpy function, which explicitly says you want the
optimizations.
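For reference, the pattern under discussion is a fixed-size copy into or out of a local. The sketch below assumes GCC or Clang (both provide `__builtin_memcpy`); the helper names `load_u32` and `store_u32` are hypothetical. Whether each call becomes a single load or store depends on the target and on what the compiler knows about the alignment of the pointer, which is the subject of this report.

```c
#include <stdint.h>

/* Hypothetical helper: read 4 bytes from an arbitrary address.
   The size is a compile-time constant, so GCC/Clang may expand the
   builtin inline instead of calling the library memcpy. */
static inline uint32_t load_u32(const void *src)
{
    uint32_t v;
    __builtin_memcpy(&v, src, sizeof v);
    return v;
}

/* Hypothetical helper: write 4 bytes to an arbitrary address. */
static inline void store_u32(void *dst, uint32_t v)
{
    __builtin_memcpy(dst, &v, sizeof v);
}
```

Both helpers are well-defined even for a misaligned address such as `buf + 1`, because the copy works on bytes rather than on a `uint32_t` lvalue.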

A pointer to a uint32 can be assumed to be properly aligned; CREATING such a
pointer that's not aligned is already undefined behaviour according to the
standard (the compiler could zero out bits, for example). I don't think that
what happens afterwards with something that shouldn't exist in the first place
is an argument against optimizing correct code.
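To make the alignment requirement concrete, here is a minimal sketch of checking whether an address is suitable for a `uint32_t *` before converting it. The function name `is_aligned_for_u32` is hypothetical, and the `uintptr_t` remainder trick is implementation-defined: it works on common flat-address targets but is not guaranteed by the standard.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns nonzero if p meets the alignment requirement of uint32_t.
   Assumes a flat address space where the integer value of a pointer
   reflects its alignment (true on mainstream targets, not guaranteed). */
static int is_aligned_for_u32(const void *p)
{
    return (uintptr_t)p % alignof(uint32_t) == 0;
}
```

Only after such a check (or when alignment is otherwise known) is it valid to form a `uint32_t *` from a byte pointer; otherwise the data has to be copied with memcpy.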

Further, I lack a consistent way of dealing with potentially aliasing pointers.
Using memcpy seems the sanest way, simply because it's standards compliant,
supported everywhere, and your code won't mysteriously break once you use LTO
or higher optimization settings.
Compilers have been able to reliably detect this and replace memcpy for years
(ignoring this issue, which I would consider a bug), so there is no drawback.
It's a feature common pretty much everywhere, and a valid recommendation in
many discussions on the topic.
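The memcpy idiom referred to here is the usual standards-compliant way to reinterpret the bytes of one type as another. A minimal sketch, assuming `float` is a 32-bit IEEE 754 type (the name `float_bits` is hypothetical). The tempting alternative `*(uint32_t *)&f` reads a `float` object through a `uint32_t` lvalue and violates the aliasing rules; memcpy only copies bytes, so no such access happens.

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f as a 32-bit integer.
   Assumes sizeof(float) == 4 (checked at compile time). */
static uint32_t float_bits(float f)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
    uint32_t u;
    memcpy(&u, &f, sizeof u); /* byte copy: no aliasing violation */
    return u;
}
```

With IEEE 754 binary32, `float_bits(1.0f)` yields `0x3F800000`. Optimizing compilers commonly turn such a fixed-size memcpy into a plain register move.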

Consider the example below for illustration. FIXEDMEMCPY is how plain memcpy
should work, and it already does work that way on archs with unaligned access.
(I had planned to post the code for 32-bit x86, but the assembly is rather
ugly; on amd64 the example would work with "unsigned long" and
"unsigned long long".)

I have already run into such issues when different software components define
their own fixed-width types. It's a practical issue where pointing to
paragraphs of the standard doesn't help unless you provide a proper solution
along with it. The FIXEDMEMCPY hack is fine for gcc, but it is
compiler-specific.
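Why two same-width typedefs matter: `unsigned int` and `unsigned long` are distinct, incompatible types even where both are 32 bits wide, so under the aliasing rules an object written through one may not be read through the other. That is the assumption GCC exploits in breakme() below. A minimal C11 sketch that makes the distinction visible; the macro `TYPE_NAME` and the function `uint32_underlying` are hypothetical names for illustration.

```c
#include <stdint.h>

/* Maps an expression to the name of its type at compile time.
   unsigned int and unsigned long are always distinct types, even on
   targets where both are 32 bits wide. */
#define TYPE_NAME(x) _Generic((x),                  \
        unsigned int: "unsigned int",               \
        unsigned long: "unsigned long",             \
        unsigned long long: "unsigned long long",   \
        default: "other")

/* Reports which type this platform's <stdint.h> chose for uint32_t.
   The answer is implementation-defined. */
static const char *uint32_underlying(void)
{
    return TYPE_NAME((uint32_t)0);
}
```

Two components that each pick a different underlying type for their "uint32" therefore produce pointers the compiler may treat as non-aliasing.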

In short:
* Optimizing memcpy to simple instructions is a reality and is expected; the
behaviour (slow code) on arm (and other archs that require alignment) is an
unwelcome oddity.
* memcpy is one of the few ways to deal with aliasing, and the most
standards-compliant one. (There are unions too, but union punning is only
sanctioned by the C standard, not by C++.)
* I don't see a problem in replacing standard functions (and __builtin_memcpy
has the same issue).
* I don't see a problem in expecting a correctly aligned pointer, and in the
code having undefined behaviour if the pointer itself would already cause
undefined behaviour.



/* Two distinct types of the same width, as if typedef'd by two different
   software components. */
typedef unsigned uint32_t;
typedef unsigned long uint32_alt;
_Static_assert(sizeof(uint32_t) == sizeof(uint32_alt),
               "you picked a bad architecture or typedefs for this example");

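/* memcpy that tells GCC both pointers are aligned for their pointed-to
   types (GCC-specific). */
#define FIXEDMEMCPY(a, b, s) \
        __builtin_memcpy(__builtin_assume_aligned(a, __alignof__(*a)), \
                         __builtin_assume_aligned(b, __alignof__(*b)), s)
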
unsigned breakme(uint32_t *ptr, uint32_alt *ptr2, uint32_t a)
{
        /* normally in different compilation units, but LTO doesn't care */
        *ptr = 0;
        *ptr2 = a;
        return *ptr;
}

unsigned fixme(uint32_t *ptr, uint32_alt *ptr2, uint32_t a)
{
        /* fixes aliasing, but should be as fast as simple accesses */
        uint32_t val = 0;
        FIXEDMEMCPY(ptr, &val, 4);
        FIXEDMEMCPY(ptr2, &a, 4);
        uint32_t val2;
        FIXEDMEMCPY(&val2, ptr, 4);
        return val2;
}

00000000 <breakme>:
   0:   e3a03000        mov     r3, #0
   4:   e5803000        str     r3, [r0]
   8:   e1a00003        mov     r0, r3 // Oops: retval = 0
   c:   e5812000        str     r2, [r1]
  10:   e12fff1e        bx      lr

00000014 <fixme>:
  14:   e3a03000        mov     r3, #0
  18:   e5803000        str     r3, [r0]
  1c:   e5812000        str     r2, [r1]
  20:   e5900000        ldr     r0, [r0] // The load that's missing above
  24:   e24dd010        sub     sp, sp, #16 // Time for another
  28:   e28dd010        add     sp, sp, #16 // bug report?
  2c:   e12fff1e        bx      lr
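For comparison, a variant of fixme() that avoids the GCC builtins entirely can be sketched in standard C. The name `fixme_portable` and the `void *` parameters are assumptions for illustration. Because every access goes through plain memcpy, the result is correct even when both pointers refer to the same storage; but since the parameters carry no alignment information, strict-alignment targets may not get single ldr/str instructions, which is the slowdown this report is about.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable counterpart of fixme(): all accesses go through
   plain memcpy, so the compiler must assume ptr and ptr2 may overlap. */
static uint32_t fixme_portable(void *ptr, void *ptr2, uint32_t a)
{
    uint32_t zero = 0;
    uint32_t out;

    memcpy(ptr, &zero, sizeof zero); /* *ptr = 0 */
    memcpy(ptr2, &a, sizeof a);      /* *ptr2 = a */
    memcpy(&out, ptr, sizeof out);   /* re-read *ptr, honouring a possible overlap */
    return out;
}
```

Called with the same address for both pointers it returns `a`, whereas the code generated for breakme() above always returns 0.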
