https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65413
--- Comment #1 from Martin Sebor <msebor at gcc dot gnu.org> ---
Actually, similarly inefficient code is generated even for aggregates that do fit into a register. The trigger appears to be that the aggregate does not take up a whole multiple of a register. For example, returning a "struct { char a[7]; }" results in the code below. Another example is "struct { short a[3]; }".

foo:
        mr 10,3
        rlwinm 8,3,0,0xff
        li 9,0
        rldicl 7,10,48,56
        rldimi 9,8,0,56
        rldicl 8,10,56,56
        rldimi 9,8,8,48
        rldicl 10,10,40,56
        srdi 8,3,32
        rldimi 9,7,16,40
        rldimi 9,10,24,32
        rlwinm 10,8,0,0xff
        rldimi 9,10,32,24
        rldicl 8,8,56,56
        rldicl 3,3,16,56
        rldimi 9,8,40,16
        rldimi 9,3,48,8
        mr 3,9
        blr
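
For anyone trying to reproduce this, a test case of the kind described in this comment might look like the sketch below. This is a guess at the shape of the source, not the exact file used to produce the listing: the struct tags S7 and S6 and the function name bar are made up for illustration, and the pass-through function bodies are inferred from the fact that the generated code reads r3 and rebuilds the same value in r3. The size checks only confirm that neither struct is a whole multiple of an 8-byte register.

```c
/* 7 bytes: fits in a single 64-bit register but does not fill it. */
struct S7 { char a[7]; };

/* 6 bytes: the second example mentioned in the comment. */
struct S6 { short a[3]; };

/* Compile-time checks that neither size is a multiple of 8
   (assumes 1-byte char and 2-byte short, as on powerpc64). */
_Static_assert(sizeof(struct S7) == 7, "S7 should be 7 bytes");
_Static_assert(sizeof(struct S6) == 6, "S6 should be 6 bytes");

/* Hypothetical test functions: return the argument by value.
   Ideally each would compile to little more than a blr. */
struct S7 foo(struct S7 s) { return s; }
struct S6 bar(struct S6 s) { return s; }
```

Compiling something like this with optimization enabled for a 64-bit PowerPC target and inspecting the assembly output should show whether the byte-by-byte rldicl/rldimi sequence is still emitted.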