[Bug target/91103] AVX512 vector element extract uses more than 1 shuffle instruction; VALIGND can grab any element

rguenth at gcc dot gnu.org Mon, 08 Jul 2019 02:31:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So when the vectorizer needs to use strided stores, it would be cheapest
to spill the vector and do N element loads and stores?  I guess we can easily
get bottlenecked by the load/store op bandwidth here?  That is, the
vectorizer needs

  for (lane)
    dest[stride * lane] = vector[lane];

thus storing a specific (constant) lane of a vector to memory, for each
vector lane.  (We could use a scatter store here, but only AVX512 has that,
building the index vector could be tricky, and scatters are not supported
for all element types.)
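
To make the spill-based variant concrete, here is a rough sketch in C using
GCC's generic vector extension.  It only illustrates the semantics of the loop
above and is not what the vectorizer actually emits; the 4-lane type `v4si`
and the helper name `strided_store_via_spill` are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for illustration: a 4-lane vector of int (16 bytes),
   declared with the GCC generic vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));
enum { LANES = 4 };

/* Hypothetical helper: store lane i of VEC to dest[stride * i].
   The vector is first spilled to a stack temporary, then each lane is
   reloaded as a scalar and stored, i.e. N element loads and N stores.  */
static void
strided_store_via_spill (int *dest, size_t stride, v4si vec)
{
  int spill[LANES];

  /* Spill: one full-width store of the vector to memory.  */
  memcpy (spill, &vec, sizeof spill);

  /* One scalar load and one scalar store per lane.  */
  for (size_t lane = 0; lane < LANES; lane++)
    dest[stride * lane] = spill[lane];
}
```

The alternative discussed in this bug would replace the spill and the scalar
reloads with one in-register lane extract per element, which trades load/store
bandwidth for shuffle bandwidth.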
