https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108030
--- Comment #6 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
This improves the situation. But according to tests of fixed_size_simd in a GNU Radio prototype, it still isn't enough. It seems like I need to always_inline all non-cmath operations on fixed_size.

It's a shame to see GCC choose "4 (or more) stores, vzeroupper, function call, 4 loads, 4 arithmetic SIMD ops, 4 stores, ret, 4 (or more) loads" over "4 arithmetic SIMD ops inlined".

For simd, the semantics should be similar to those of builtin types, so always_inline would be the right choice anyway (i.e. inline at -O0 as well). But there seems to be room for improvement in GCC. I'll try to find time to produce reasonable testcases for a new PR.
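
To illustrate the always_inline approach described above, here is a minimal sketch. It does not use the libstdc++ simd implementation: `fixed32`, `broadcast`, `add3` and `check_add3` are hypothetical names made up for this example, and the element-wise loop merely stands in for the "4 arithmetic SIMD ops" an AVX target might emit for 32 floats. The only real feature shown is the GCC/Clang attribute `[[gnu::always_inline]]`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-size SIMD type (assumption for this
// sketch; not the libstdc++ fixed_size_simd implementation).
struct fixed32 {
    static constexpr std::size_t size = 32;
    alignas(64) float data[size];
};

// [[gnu::always_inline]] asks GCC/Clang to inline this operator at every
// call site, including at -O0, so the caller does not pay for spilling the
// operands to memory, the call, and reloading the result.
[[gnu::always_inline]] inline fixed32 operator+(const fixed32& a,
                                                const fixed32& b) {
    fixed32 r;
    for (std::size_t i = 0; i < fixed32::size; ++i)
        r.data[i] = a.data[i] + b.data[i];
    return r;
}

// Hypothetical helper: fill every lane with the same value.
inline fixed32 broadcast(float v) {
    fixed32 r;
    for (std::size_t i = 0; i < fixed32::size; ++i)
        r.data[i] = v;
    return r;
}

// Example caller: with the attribute above, both additions are expanded
// here instead of becoming two out-of-line calls.
inline fixed32 add3(const fixed32& a, const fixed32& b, const fixed32& c) {
    return a + b + c;
}

// Usage: every lane of 1 + 2 + 3 should be 6.
inline void check_add3() {
    const fixed32 r = add3(broadcast(1.0f), broadcast(2.0f), broadcast(3.0f));
    for (std::size_t i = 0; i < fixed32::size; ++i)
        assert(r.data[i] == 6.0f);
}
```

Whether the inlined body is then vectorized as well as hand-written SIMD code depends on the target flags and GCC version; the sketch only shows how the call overhead listed above can be avoided.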