https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102652
ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-10-08

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed on the GCC 11 release.
There is an active effort to improve the code generation for these intrinsics
and current trunk produces:

bug:
        ldr     q5, [x1]
        sshr    v4.16b, v5.16b, 7
        mov     v0.16b, v5.16b
        mov     v1.16b, v4.16b
        mov     v2.16b, v4.16b
        mov     v3.16b, v4.16b
        st4     {v0.16b - v3.16b}, [x0], 64
        ldr     q4, [x1, 16]
        mov     v0.16b, v4.16b
        sshr    v4.16b, v4.16b, 7
        mov     v1.16b, v4.16b
        mov     v2.16b, v4.16b
        mov     v3.16b, v4.16b
        st4     {v0.16b - v3.16b}, [x0]
        ret

Not optimal yet, but moving in the right direction
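
For readers following the listing, here is a rough scalar model in portable C of what each half of the sequence appears to compute. It is only a sketch of the instruction semantics (sshr by 7 yields a per-byte sign mask, st4 stores four registers interleaved), not the test case from this bug, and the function names sshr7_16b, st4_16b and model_block are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of "sshr vD.16b, vN.16b, 7": an arithmetic shift right by 7
   turns each signed byte into 0 (non-negative input) or -1 (negative input). */
void sshr7_16b(const int8_t in[16], int8_t out[16])
{
    for (size_t i = 0; i < 16; i++)
        out[i] = (in[i] < 0) ? -1 : 0;
}

/* Scalar model of "st4 {v0.16b - v3.16b}, [dst]": lane i of the four source
   registers is written to four consecutive bytes starting at dst[4 * i]. */
void st4_16b(int8_t dst[64], const int8_t v0[16], const int8_t v1[16],
             const int8_t v2[16], const int8_t v3[16])
{
    for (size_t i = 0; i < 16; i++) {
        dst[4 * i + 0] = v0[i];
        dst[4 * i + 1] = v1[i];
        dst[4 * i + 2] = v2[i];
        dst[4 * i + 3] = v3[i];
    }
}

/* Hypothetical model of one ldr/sshr/st4 group from the listing:
   v0 holds the loaded bytes, v1..v3 all hold the same sign mask. */
void model_block(int8_t dst[64], const int8_t src[16])
{
    int8_t mask[16];
    sshr7_16b(src, mask);
    st4_16b(dst, src, mask, mask, mask);
}
```

Under this model every input byte is followed in memory by three copies of its sign mask. The mov instructions in the listing only copy the two live values into the consecutive registers v0-v3 that st4 requires.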