https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79252
--- Comment #2 from Zoltan Hidvegi <zoltan at hidvegi dot com> ---
Created attachment 41855
Possible vec_insert implementation

The attached code shows two implementations for inserting a byte at a variable position and one for inserting a halfword. The byte-insert functions can also be adapted for halfwords and words. The code is written for little-endian and could be modified for big-endian. I usually try to pack multiple values into one register and use a single mtvsrd instead of several mtvsrd instructions.

Note that vec_unpackl actually generates vupkh... I never understood the logic for this, but that's how it is.

Also, the Power Architecture 64-bit ELFv2 ABI Specification says that "vec_insert (v, 3, x) is equivalent to v[3] = x", which is probably not correct: the arguments are not in the right order (the scalar comes first, then the vector, then the index, i.e. r = vec_insert (x, v, 3)), and v is not modified; instead, the result is returned.

I haven't tested this code much and I'm not sure which version is faster; these are just ideas of what could be done. They should be much better than the store-based versions, which cannot even benefit from store forwarding, because a byte store is not forwarded to a subsequent vector load.
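For comparison, here is a minimal sketch of another register-only way to insert a byte at a variable position. This is not the code in attachment 41855 (which uses mtvsrd and vec_unpackl). It builds a byte mask by comparing a constant 0..15 lane-index vector with the splatted insert position, then merges the splatted value under that mask. It uses GCC's generic vector extensions instead of altivec.h so it compiles on any target; the names v16u8, splat_u8 and insert_byte are hypothetical. Whether GCC lowers this to vcmpequb plus a vector select on POWER is not verified here. Like the real vec_insert, it returns a new vector and leaves v unchanged, and it assumes the index wraps modulo 16.

```c
/* Generic 16 x unsigned byte vector (GCC/Clang vector extension). */
typedef unsigned char v16u8 __attribute__((vector_size(16)));

/* Hypothetical helper: replicate one byte into all 16 lanes. */
static v16u8 splat_u8(unsigned char b)
{
    v16u8 r = { b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b };
    return r;
}

/* Hypothetical sketch: return a copy of v with byte lane (idx % 16)
   replaced by x. The source contains no byte store followed by a
   vector load; the insert is a lane compare plus a bitwise merge. */
static v16u8 insert_byte(v16u8 v, unsigned char x, unsigned int idx)
{
    const v16u8 lanes = { 0, 1, 2, 3, 4, 5, 6, 7,
                          8, 9, 10, 11, 12, 13, 14, 15 };
    /* Lane-wise compare yields 0xff where lanes == idx, 0 elsewhere. */
    v16u8 mask = (v16u8)(lanes == splat_u8((unsigned char)(idx & 15u)));
    /* Keep v outside the mask, take x inside it. */
    return (v & ~mask) | (splat_u8(x) & mask);
}
```

For example, insert_byte (v, 0xab, 3) returns a copy of v whose lane 3 is 0xab. Lane numbering here follows GCC's generic vector model (lane 0 at the lowest address), so the sketch does not depend on endianness; a halfword or word variant would use a lane-index vector with 8 or 4 elements of the matching width.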