https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449
--- Comment #4 from Jens Seifert <jens.seifert at de dot ibm.com> ---
Power8 has bcdadd, which can only be combined with _Decimal128 if there is some kind of conversion between BCDs stored in a vector register and _Decimal128.

On Power9, vec_load_len/vec_store_len can be used to load variable-length BCDs. On Power7/8 I can load variable-length BCDs as well (with more instructions), but overall it is desirable to be able to convert a vector to _Decimal128 and vice versa.

I suppose I can survive with inline assembly like below. The assembly works for p7-p9 with optimal speed.

The inline memcpy between vector and _Decimal128 is not optimal for -mcpu=power7 through -mcpu=power9: it always results in a store/load (lacking XNOP), ending up in a load-hit-store issue.
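For reference, the memcpy-based conversion criticized above (not the inline assembly) might look roughly like the sketch below. The helper names dec128_to_vec and vec_to_dec128 and the typedef v16u8 are hypothetical and do not come from the report. The sketch uses GCC's generic vector_size attribute so that it also builds on non-Power targets; on Power one would presumably use vector unsigned char from <altivec.h> instead. It only shows the bit-for-bit copy and does not reproduce the code generation problem.

```c
#include <string.h>

/* Hypothetical 16-byte vector type (GCC generic vector extension).
   Assumption: on Power this would be `vector unsigned char`. */
typedef unsigned char v16u8 __attribute__((vector_size(16)));

_Static_assert(sizeof(v16u8) == sizeof(_Decimal128),
               "vector and _Decimal128 must both be 16 bytes");

/* Reinterpret the bits of a _Decimal128 as a 16-byte vector.
   No value conversion takes place; the bytes are copied unchanged. */
static inline v16u8 dec128_to_vec(_Decimal128 d)
{
    v16u8 v;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* Reinterpret the bits of a 16-byte vector as a _Decimal128. */
static inline _Decimal128 vec_to_dec128(v16u8 v)
{
    _Decimal128 d;
    memcpy(&d, &v, sizeof d);
    return d;
}
```

According to the comment, on Power7 to Power9 GCC turns each such memcpy into a store followed by a load of the same address rather than a register-to-register move, which is where the load-hit-store penalty comes from.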