http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684

--- Comment #12 from msharov at users dot sourceforge.net ---
I'd like to add that this is not some corner case; this is a very common issue.
In my own projects, the compiler's inability to combine stores is the single
largest reason for using inline assembly and raw casts. Pretty much every time
I have an object 8 or 16 bytes in size, I end up writing a zeroing ctor, copy
ctor, and operator= that use full-object memory access. That's a cast to uint64_t
for 8 bytes, and movups/movaps for 16 bytes. It also shows up when writing raw
protocol data, such as X calls, where it is very common to write several
constants in succession. The last time I checked, forcing whole-object moves in
these cases resulted in a project-wide code size reduction of ~10%. Unfortunately,
it also causes a variety of aliasing pessimizations, so I also have to test
with and without each of the above functions to get the smallest code size. It
would be a very big deal if the optimizer could do this.
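
A rough sketch of the pattern described above, for the 8-byte case only. The
struct Rect16 and the function names are hypothetical, made up for
illustration, and memcpy stands in for the raw uint64_t cast so that the example
stays within the aliasing rules; whether a given compiler turns either form into
a single 64-bit store is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte object, standing in for the kind of small
// struct the comment talks about.
struct Rect16 {
    std::uint16_t x, y, w, h;
};
static_assert(sizeof(Rect16) == 8, "example assumes an 8-byte layout");

// Field-by-field zeroing: four separate 16-bit stores at the source
// level. This is the form the optimizer would have to combine.
inline void zero_fields(Rect16& r) {
    r.x = 0;
    r.y = 0;
    r.w = 0;
    r.h = 0;
}

// Whole-object zeroing: one 64-bit value copied over the object.
// memcpy replaces the raw cast here (an assumption of this sketch);
// a fixed-size 8-byte memcpy is typically lowered to a single store.
inline void zero_whole(Rect16& r) {
    const std::uint64_t zero = 0;
    std::memcpy(&r, &zero, sizeof r);
}

// Both forms must leave the object in the same state.
inline bool same_result() {
    Rect16 a{1, 2, 3, 4};
    Rect16 b{1, 2, 3, 4};
    zero_fields(a);
    zero_whole(b);
    return std::memcmp(&a, &b, sizeof a) == 0;
}

// Aborts if the two forms ever disagree.
inline void self_check() {
    assert(same_result());
}
```

Comparing the generated assembly for zero_fields and zero_whole (for example
with g++ -O2 -S) shows whether the stores were merged for a particular compiler
version and target.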
