https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102877
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|c++                         |middle-end
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2021-10-21

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  With memcpy we expand from

  MEM <vector(2) unsigned char> [(unsigned char *)&value] = { 0, 0 };
  MEM <unsigned char[6]> [(char * {ref-all})&value + 2B] = MEM <unsigned char[6]> [(char * {ref-all})&gc];
  value.0_1 = value;
  _2 = __builtin_bswap64 (value.0_1); [tail call]
  value ={v} {CLOBBER};
  return _2;

thus we expand 'value' on the stack.  Without memcpy we manage to do

  MEM <unsigned short> [(unsigned char *)&value] = 0;
  _19 = MEM <unsigned short> [(unsigned char *)&gc];
  MEM <unsigned short> [(unsigned char *)&value + 2B] = _19;
  _21 = MEM <unsigned int> [(unsigned char *)&gc + 2B];
  MEM <unsigned int> [(unsigned char *)&value + 4B] = _21;
  value.0_7 = value;
  _8 = __builtin_bswap64 (value.0_7); [tail call]

which also expands 'value' to the stack but is apparently nicer to later passes.  That means the way we expand the aggregate copy of type char[6] is highly sub-optimal (we do six single-byte loads & stores instead of a 2-byte and a 4-byte move).
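For comparing source against the two dumps, here is a hypothetical C++ reconstruction of the two shapes. The bug's original testcase is not quoted in this comment, so the zero-initialize-then-copy structure is an assumption inferred from the GIMPLE, and the names load48_memcpy, load48_bytes, load48_be_shifts, host_is_little_endian and check_load48 are invented for illustration. Both variants build an 8-byte object whose first two bytes are zero and whose last six bytes come from gc, then byte-swap it with the GCC/Clang builtin __builtin_bswap64. On a little-endian host that amounts to decoding gc as a 48-bit big-endian integer, which the shift-based reference function computes portably.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical "with memcpy" shape: the 6-byte tail is a single aggregate
// copy, which GIMPLE represents as a MEM <unsigned char[6]> assignment.
std::uint64_t load48_memcpy(const unsigned char gc[6]) {
    std::uint64_t value = 0;
    std::memcpy(reinterpret_cast<unsigned char *>(&value) + 2, gc, 6);
    return __builtin_bswap64(value);  // GCC/Clang builtin
}

// Hypothetical "without memcpy" shape: the same bytes stored one at a time.
// Whether the optimizer merges these into the 2-byte and 4-byte moves of the
// second dump depends on the compiler version and flags.
std::uint64_t load48_bytes(const unsigned char gc[6]) {
    std::uint64_t value = 0;
    unsigned char *p = reinterpret_cast<unsigned char *>(&value);
    for (int i = 0; i < 6; ++i)
        p[2 + i] = gc[i];
    return __builtin_bswap64(value);
}

// Portable reference: decode gc as a 48-bit big-endian integer with shifts.
std::uint64_t load48_be_shifts(const unsigned char gc[6]) {
    std::uint64_t v = 0;
    for (int i = 0; i < 6; ++i)
        v = (v << 8) | gc[i];
    return v;
}

// True if the lowest-addressed byte of a 16-bit 1 is the 1 byte.
bool host_is_little_endian() {
    const std::uint16_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Both variants must agree; on little-endian hosts they must also match the
// big-endian decode computed with shifts.
void check_load48(const unsigned char gc[6]) {
    assert(load48_memcpy(gc) == load48_bytes(gc));
    if (host_is_little_endian())
        assert(load48_memcpy(gc) == load48_be_shifts(gc));
}
```

Compiling this with `g++ -O2 -fdump-tree-optimized` lets you compare the optimized GIMPLE of each function against the dumps above. The exact output is likely to differ between GCC versions.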