https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118276
--- Comment #6 from Ben FrantzDale <benfrantzdale at gmail dot com> ---
I think I understand. You are saying that gcc wants to (or must?) zero out the entire struct in the trivial case, which includes `S() = default;`, but with `S() noexcept {}` it winds up on a code path where it knows it doesn't have to zero the padding. That leads to the xmm version, which turned out faster on the quick-bench.com CPU?

This is backed up by the fact that if I do
```
S() noexcept { std::memset(this, 0, sizeof(*this)); }
```
I get the `stosq` version, and if I leave c and x uninitialized but `memset` just them to zero, I still get the xmm version.
https://godbolt.org/z/bbnc8a66E

So, is this inconsistency a bug, or is it just that the optimizer honestly thinks it's faster to use `stosq` (which may be true on some CPUs)?

FWIW Clang generates the XMM version for all cases:
https://godbolt.org/z/EjGEcd4rv
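
For reference, here is a minimal sketch of the four constructor variants discussed above. The member layout (a `char c` followed by `double x[8]`) and all type and function names are assumptions made for illustration; only the member names `c` and `x` come from this comment, and the real struct is in the godbolt links. The sketch only checks that every variant leaves the members zero. It says nothing about which instructions a given compiler emits.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical layout: on typical ABIs there is padding between c and x.

// Variant 1: defaulted constructor. It is not user-provided, so
// value-initialization (Defaulted()) zero-initializes the whole object,
// padding included, before the member initializers run.
struct Defaulted {
    char c = 0;
    double x[8] = {};
    Defaulted() = default;
};

// Variant 2: user-provided empty constructor. Value-initialization just
// calls it, so only the members are required to be written.
struct UserProvided {
    char c = 0;
    double x[8] = {};
    UserProvided() noexcept {}
};

// Variant 3: user-provided constructor that clears every byte, padding too.
struct MemsetAll {
    char c;
    double x[8];
    // The cast to void* is an addition here; it is meant to keep gcc's
    // -Wclass-memaccess quiet and does not change what gets cleared.
    MemsetAll() noexcept { std::memset(static_cast<void*>(this), 0, sizeof(*this)); }
};

// Variant 4: members left uninitialized, then cleared one by one.
// The padding between c and x is never written.
struct MemsetMembers {
    char c;
    double x[8];
    MemsetMembers() noexcept {
        std::memset(&c, 0, sizeof c);
        std::memset(x, 0, sizeof x);
    }
};

// Returns true if every member (not the padding) compares equal to zero.
template <class S>
bool is_zeroed(const S& s) {
    if (s.c != 0) return false;
    for (double v : s.x) {
        if (v != 0.0) return false;
    }
    return true;
}

// Value-initializes each variant with () and checks the observable result.
void check_all_variants() {
    assert(is_zeroed(Defaulted()));
    assert(is_zeroed(UserProvided()));
    assert(is_zeroed(MemsetAll()));
    assert(is_zeroed(MemsetMembers()));
}
```

All four variants are indistinguishable to code that reads only `c` and `x`; they differ only in whether the padding bytes have to be written, which is the distinction the question above turns on.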