http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684
--- Comment #12 from msharov at users dot sourceforge.net --- I'd like to add that this is not some corner case; it is a very common issue. In my own projects, the compiler's inability to combine stores is the single largest reason for using inline assembly and raw casts. Pretty much every time I have an object 8 or 16 bytes in size, I end up writing a zeroing ctor, copy ctor, and operator= that use full-object memory access. That means a cast to uint64_t for 8 bytes, and movups/movaps for 16 bytes. It also shows up when writing raw protocol data, such as X calls, where it is very common to write several constants in succession. The last time I checked, forcing whole-object moves in these cases resulted in a project-wide code size reduction of about 10%. Unfortunately, it also causes a variety of aliasing pessimizations, so I also have to test with and without each of the above functions to find the smallest code size. It would be a very big deal if the optimizer could do this.
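
For readers who have not run into this, here is a minimal sketch of the pattern described above. The struct `Rect16` and the function names are hypothetical, made up for illustration; they are not taken from the commenter's projects. The sketch also uses `std::memcpy` rather than the raw `uint64_t` cast the comment mentions, since a fixed-size `memcpy` expresses the same full-object access without violating strict aliasing. Whether either form compiles to a single 64-bit store depends on the compiler version, target, and optimization flags.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte object, used only for illustration.
struct Rect16 {
    std::uint16_t x, y, w, h;
};
static_assert(sizeof(Rect16) == 8, "sketch assumes an 8-byte object");

// Member-wise zeroing: four separate 16-bit stores in the source.
// This is the form the report asks the optimizer to combine.
inline void zero_fieldwise(Rect16& r) {
    r.x = 0;
    r.y = 0;
    r.w = 0;
    r.h = 0;
}

// Whole-object zeroing: one 8-byte access written out by hand.
// memcpy stands in here for the raw uint64_t cast.
inline void zero_whole(Rect16& r) {
    const std::uint64_t zero = 0;
    std::memcpy(&r, &zero, sizeof r);
}

// Whole-object copy: load and store all 8 bytes at once.
inline void copy_whole(Rect16& dst, const Rect16& src) {
    std::uint64_t bits;
    std::memcpy(&bits, &src, sizeof bits);
    std::memcpy(&dst, &bits, sizeof bits);
}

// Usage example: both zeroing forms leave the object in the same state.
inline void check_equivalence() {
    Rect16 a{1, 2, 3, 4};
    Rect16 b{1, 2, 3, 4};
    zero_fieldwise(a);
    zero_whole(b);
    assert(std::memcmp(&a, &b, sizeof a) == 0);
}
```

Comparing the generated assembly of `zero_fieldwise` and `zero_whole` (for example with `g++ -O2 -S`) is one way to see whether a given compiler combines the member-wise stores on its own.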