[Bug target/90424] memcpy into vector builtin not optimized

kretz at kde dot org Mon, 13 May 2019 07:54:47 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424


--- Comment #2 from Matthias Kretz <kretz at kde dot org> ---
FWIW, I agree that "bit-inserting into a default-def" isn't a good idea. My
code, in the meantime, looks more like this (https://godbolt.org/z/D-yfZJ):

template <class T>
using V [[gnu::vector_size(16)]] = T;

template <class T, unsigned M>
V<T> load(const void *p) {
  V<T> r = {};
  __builtin_memcpy(&r, p, M);
  return r;
}
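
To make the intended semantics concrete, here is a minimal sketch of how this version behaves. It assumes GCC's vector extensions; the static_assert and the check function (check_partial_load, a made-up name) are additions for illustration and are not part of the code linked above.

```cpp
#include <cassert>

template <class T>
using V [[gnu::vector_size(16)]] = T;

// Same shape as load() above: zero the whole vector, then copy the
// first M bytes from p over it.
template <class T, unsigned M>
V<T> load(const void *p) {
  // Added for illustration: reject copies larger than the vector.
  static_assert(M <= sizeof(V<T>), "M must not exceed the vector size");
  V<T> r = {};
  __builtin_memcpy(&r, p, M);
  return r;
}

// Hypothetical check: load the low 8 bytes (two ints) and verify that
// the remaining lanes keep the zero from the initializer.
inline void check_partial_load() {
  const int src[4] = {1, 2, 3, 4};
  const V<int> v = load<int, 8>(src);
  assert(v[0] == 1 && v[1] == 2);
  assert(v[2] == 0 && v[3] == 0);
}
```

Because r is zero-initialized first, the result matches what a zero-extending 8-byte load (movq/movsd) would produce, which is the code one would hope to see generated.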

I can't read the SSA code with certainty, but bit-inserting sounds like what I
want. Alternatively, the partial vector load could be implemented like this,
but the generated code looks even worse (https://godbolt.org/z/nJuTn-):

template <class T>
using V [[gnu::vector_size(16)]] = T;

template <class T, unsigned... I>
V<T> load(const void *p) {
  const T* q = static_cast<const T*>(p);
  V<T> r = {q[I]...};
  return r;
}

// 8-byte loads: should become a single movq or movsd
template V<char  > load<char  , 0,1,2,3,4,5,6,7>(const void *);
template V<short > load<short , 0,1,2,3>(const void *);
template V<int   > load<int   , 0,1>(const void *);
template V<long  > load<long  , 0>(const void *);
template V<float > load<float , 0,1>(const void *);
template V<double> load<double, 0>(const void *);

// 4-byte loads: should become a single movd or movss
template V<char > load<char , 0,1,2,3>(const void *);
template V<short> load<short, 0,1>(const void *);
template V<int  > load<int  , 0>(const void *);
template V<float> load<float, 0>(const void *);
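
As a side note on the instantiations above, the index lists do not have to be spelled out by hand. The following is only a sketch, assuming GCC's vector extensions and C++14; load_n, load_impl and check_load_n are hypothetical names that do not appear in the linked code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

template <class T>
using V [[gnu::vector_size(16)]] = T;

// Same element-wise initialization as load() above, with the indices
// deduced from an index_sequence argument.
template <class T, std::size_t... I>
V<T> load_impl(const void *p, std::index_sequence<I...>) {
  const T *q = static_cast<const T *>(p);
  V<T> r = {q[I]...};  // lanes without an initializer are zero
  return r;
}

// Hypothetical wrapper: load the first N elements, zero the rest.
template <class T, std::size_t N>
V<T> load_n(const void *p) {
  static_assert(N * sizeof(T) <= sizeof(V<T>), "too many elements");
  return load_impl<T>(p, std::make_index_sequence<N>{});
}

// Hypothetical check: two ints loaded, upper two lanes zero.
inline void check_load_n() {
  const int src[4] = {7, 8, 9, 10};
  const V<int> v = load_n<int, 2>(src);
  assert(v[0] == 7 && v[1] == 8);
  assert(v[2] == 0 && v[3] == 0);
}
```

With this wrapper, load_n<int, 2> corresponds to load<int, 0,1> and load_n<char, 4> to load<char, 0,1,2,3>. It changes only how the pack is produced, not the initialization itself, so the code generation issue reported here is presumably unaffected.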
