https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79252

--- Comment #2 from Zoltan Hidvegi <zoltan at hidvegi dot com> ---
Created attachment 41855
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41855&action=edit
Possible vec_insert implementation

The attached code shows two implementations for inserting a byte at a variable
position, and one for inserting a halfword. The byte insert functions can also
be adapted for halfword and word inserts. The code is written for little-endian
but could be modified for big-endian. Where possible, I pack multiple values
into one register and use a single mtvsrd instead of several mtvsrd
instructions.
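
As a rough illustration of the packing idea only (this is not the attached
code): the sketch below combines a byte value and a lane index into one 64-bit
scalar, so that a single GPR-to-VSR move could carry both. The helper names and
the bit layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a byte value (bits 0..7) and a lane index
   (bits 8..15) into one 64-bit scalar. The layout is an assumption. */
static inline uint64_t pack_value_and_index(uint8_t value, unsigned index)
{
    assert(index < 16);  /* a 16-byte vector has lanes 0..15 */
    return ((uint64_t)index << 8) | value;
}

/* Hypothetical helpers: recover the two fields from the packed scalar. */
static inline uint8_t unpack_value(uint64_t packed)
{
    return (uint8_t)(packed & 0xff);
}

static inline unsigned unpack_index(uint64_t packed)
{
    return (unsigned)((packed >> 8) & 0xff);
}
```

On the vector side the two fields would be separated again with vector
operations; the unpack helpers above only show that nothing is lost.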

Note that vec_unpackl actually generates vupkh... I never understood the logic
for this, but that's how it is. Also, the Power Architecture 64-bit ELFv2 ABI
Spec document says that "vec_insert (v, 3, x) is equivalent to v[3] = x", which
is probably not correct: the arguments are not in the right order, and v is not
modified; instead, the result is returned.
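
To make the point about vec_insert concrete, here is a portable C model of the
semantics as I understand them: the argument order is (scalar, vector, index),
the input vector is left untouched, and the updated vector is the return value.
It does not use altivec.h; bytes16 and model_vec_insert are hypothetical names,
and reducing the index modulo the element count is an assumption based on my
reading of the intrinsic's description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "vector unsigned char". */
typedef struct { uint8_t lane[16]; } bytes16;

/* Model of vec_insert(x, v, i): v is passed by value, so the caller's
   copy is never modified; the updated copy is returned instead. */
static bytes16 model_vec_insert(uint8_t x, bytes16 v, unsigned i)
{
    v.lane[i % 16] = x;  /* assumption: index is taken modulo the lane count */
    return v;
}

/* Usage: the result must be assigned back to get the effect of v[3] = x. */
static void model_vec_insert_usage(void)
{
    bytes16 v = {{0}};
    bytes16 r = model_vec_insert(0x7f, v, 3);
    assert(r.lane[3] == 0x7f);  /* the returned copy holds the new byte */
    assert(v.lane[3] == 0);     /* the original is unchanged */
    v = model_vec_insert(0x7f, v, 3);
    assert(v.lane[3] == 0x7f);
}
```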

I haven't tested this code much and am not sure which version is faster. These
are just ideas for what could be done, and they should be much better than the
store-based versions, which cannot even use store forwarding because a byte
store is not forwarded to a vector load.
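
For comparison, a portable C sketch of the two shapes being contrasted. The
first function mimics the store-based pattern (spill the vector, store one
byte, reload all 16 bytes). The second keeps everything in values and selects
per lane with a mask, which is one possible register-only approach and not
necessarily what the attachment does. All names are hypothetical, and a
compiler is free to turn either function into something else entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "vector unsigned char". */
typedef struct { uint8_t lane[16]; } bytes16;

/* Store-based shape: wide store, narrow byte store, wide reload.
   On hardware the reload has to wait for the byte store to complete. */
static bytes16 insert_via_memory(bytes16 v, uint8_t x, unsigned i)
{
    uint8_t buf[16];
    memcpy(buf, v.lane, sizeof buf);  /* models the vector store */
    buf[i & 15] = x;                  /* models the byte store */
    memcpy(v.lane, buf, sizeof buf);  /* models the vector reload */
    return v;
}

/* Register-only shape: build a per-lane mask and select between the
   new byte and the old lane contents, with no trip through memory. */
static bytes16 insert_via_select(bytes16 v, uint8_t x, unsigned i)
{
    bytes16 r;
    for (unsigned k = 0; k < 16; k++) {
        uint8_t mask = (uint8_t)-(k == (i & 15));  /* 0xff in the target lane, else 0 */
        r.lane[k] = (uint8_t)((x & mask) | (v.lane[k] & (uint8_t)~mask));
    }
    return r;
}

/* Check that both shapes produce the same vector for every lane index. */
static void check_inserts_agree(void)
{
    bytes16 v;
    for (unsigned k = 0; k < 16; k++)
        v.lane[k] = (uint8_t)k;
    for (unsigned i = 0; i < 16; i++) {
        bytes16 a = insert_via_memory(v, 0xEE, i);
        bytes16 b = insert_via_select(v, 0xEE, i);
        assert(memcmp(a.lane, b.lane, sizeof a.lane) == 0);
    }
}
```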
