https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424
--- Comment #2 from Matthias Kretz <kretz at kde dot org> ---
FWIW, I agree that "bit-inserting into a default-def" isn't a good idea. My
code, in the meantime, looks more like this (https://godbolt.org/z/D-yfZJ):

template <class T>
using V [[gnu::vector_size(16)]] = T;

template <class T, unsigned M>
V<T> load(const void *p) {
  V<T> r = {};
  __builtin_memcpy(&r, p, M);
  return r;
}

I can't read the SSA code with certainty, but bit-inserting sounds like what I
want to have.

Alternatively, the partial vector load could be implemented like this - and
looks even worse (https://godbolt.org/z/nJuTn-):

template <class T>
using V [[gnu::vector_size(16)]] = T;

template <class T, unsigned... I>
V<T> load(const void *p) {
  const T* q = static_cast<const T*>(p);
  V<T> r = {q[I]...};
  return r;
}

// movq or movsd
template V<char  > load<char  , 0,1,2,3,4,5,6,7>(const void *);
template V<short > load<short , 0,1,2,3>(const void *);
template V<int   > load<int   , 0,1>(const void *);
template V<long  > load<long  , 0>(const void *);
template V<float > load<float , 0,1>(const void *);
template V<double> load<double, 0>(const void *);

// movd or movss
template V<char > load<char , 0,1,2,3>(const void *);
template V<short> load<short, 0,1>(const void *);
template V<int  > load<int  , 0>(const void *);
template V<float> load<float, 0>(const void *);
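
For reference, a minimal self-contained sketch of the memcpy-based variant above, with a check that the first M bytes are copied and the remaining elements stay zero. The static_assert, the helper low_matches_rest_zero and the function self_check are illustrative additions that are not part of the original comment. The sketch assumes GCC (or a compatible compiler) for the vector_size attribute and for subscripting vector types, and it checks only the values, not which instructions get emitted.

```cpp
#include <cassert>

// 16-byte GNU vector of T (GCC extension), as in the comment above.
template <class T>
using V [[gnu::vector_size(16)]] = T;

// Partial load: copy the first M bytes from p into a zeroed vector.
template <class T, unsigned M>
V<T> load(const void *p) {
  static_assert(M <= 16, "cannot load more bytes than the vector holds");
  V<T> r = {};
  __builtin_memcpy(&r, p, M);
  return r;
}

// Hypothetical helper (not from the original comment): returns true if
// elements [0, n) of v equal src[0..n) and all remaining elements are zero.
template <class Vec, class T>
bool low_matches_rest_zero(Vec v, const T *src, unsigned n) {
  constexpr unsigned N = sizeof(Vec) / sizeof(T);
  for (unsigned i = 0; i < N; ++i) {
    const T expected = i < n ? src[i] : T(0);
    if (v[i] != expected)
      return false;
  }
  return true;
}

// Hypothetical self-check exercising an 8-byte and a 4-byte partial load.
inline void self_check() {
  const int data[4] = {1, 2, 3, 4};
  assert(low_matches_rest_zero(load<int, 8>(data), data, 2));  // 8 bytes = 2 ints

  const char text[16] = "abcdefgh";
  assert(low_matches_rest_zero(load<char, 4>(text), text, 4));  // 4 bytes = 4 chars
}
```

Calling self_check() from main is enough to exercise both sizes. Whether the 8-byte and 4-byte cases actually compile down to a single movq/movsd or movd/movss still has to be verified in the generated assembly.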