[Bug c++/88531] New: Index data types when targeting AVX-512 vectorization with gather/scatter

2018-12-17 Thread florian.schornbaum at siemens dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88531

            Bug ID: 88531
           Summary: Index data types when targeting AVX-512 vectorization with gather/scatter
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: florian.schornbaum at siemens dot com
  Target Milestone: ---

Hi,

I noticed that GCC fails to vectorize simple loops containing indirect loads
(or stores) unless the index used for the indirect access has one of a very
small subset of the possible integer data types. I'm targeting AVX-512. This is
the MWE (an indirect load, but a direct store):

==
#include <cstdint>

using loop_t = uint32_t;
using idx_t = uint32_t;

void loop(double * const __restrict__ dst,
          double const * const __restrict__ src,
          idx_t const * const __restrict__ idx,
          loop_t const begin,
          loop_t const end)
{
    for (loop_t i = begin; i < end; ++i)
    {
        dst[i] = 42.0 * src[idx[i]];
    }
}
==
See: https://godbolt.org/z/Ps-sOv

This only vectorizes if idx_t is int32_t, int64_t, or uint64_t.
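
To check each index type without editing the alias by hand, the MWE can be
turned into a template and instantiated once per type. This is only a sketch:
the name loop_with and the INSTANTIATE macro are hypothetical, and the flags
named in the comment are an assumption, since the report only links to Compiler
Explorer and does not list its command line.

```cpp
#include <cstdint>

// Same loop as the MWE, with the index type as a template parameter.
// "loop_with" is a hypothetical name, not part of the original report.
template <typename idx_t>
void loop_with(double * const __restrict__ dst,
               double const * const __restrict__ src,
               idx_t const * const __restrict__ idx,
               std::uint32_t const begin,
               std::uint32_t const end)
{
    for (std::uint32_t i = begin; i < end; ++i)
    {
        dst[i] = 42.0 * src[idx[i]];
    }
}

// One explicit instantiation per index type, so that each variant shows up
// separately in the vectorizer report. Assumed command line, for example:
//   g++ -O3 -march=skylake-avx512 -fopt-info-vec-optimized -c file.cpp
#define INSTANTIATE(T)                                          \
    template void loop_with<T>(double * const __restrict__,     \
                               double const * const __restrict__, \
                               T const * const __restrict__,    \
                               std::uint32_t const,             \
                               std::uint32_t const)

INSTANTIATE(std::int16_t);
INSTANTIATE(std::uint16_t);
INSTANTIATE(std::int32_t);
INSTANTIATE(std::uint32_t);
INSTANTIATE(std::int64_t);
INSTANTIATE(std::uint64_t);
```

Which instantiations get reported as vectorized will depend on the GCC version
in use.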

My suspicion is that this goes back to the gather/scatter instructions of
AVX-512, which come in two flavors: with 32 bit and with 64 bit signed integers
for the indices. Unsigned 64 bit probably works (on a 64 bit architecture)
because it appears to be treated just like a signed 64 bit value, which is
probably due to (from the documentation):
"... The scaled index may require more bits to represent than the address bits
used by the processor (e.g., in 32-bit mode, if the scale is greater than one).
In this case, the most significant bits beyond the number of address bits are
ignored. ..."

Unfortunately, this does not vectorize for int16_t, uint16_t, and uint32_t,
although the 32 bit version of gather/scatter could be used for int16_t and
uint16_t if the indices were widened first (sign extension for int16_t, zero
extension for uint16_t). Likewise, the 64 bit version could be used with
zero-extended indices of type uint32_t.
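
To make the uint32_t suggestion concrete, here is a hand-written sketch with
AVX-512F intrinsics: eight 32 bit indices are zero-extended to 64 bit and then
fed to the 64 bit gather. The function name gather_scale_u32 is hypothetical,
and this is not a claim about what GCC emits or should emit. The vector path
is only compiled when __AVX512F__ is defined (e.g. with -mavx512f); otherwise
the scalar loop handles everything.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: dst[i] = 42.0 * src[idx[i]] for i in [0, n).
void gather_scale_u32(double *dst, double const *src,
                      std::uint32_t const *idx, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX512F__)
    __m512d const factor = _mm512_set1_pd(42.0);
    for (; i + 8 <= n; i += 8)
    {
        // Load eight 32 bit indices and zero-extend them to 64 bit.
        __m256i const idx32 =
            _mm256_loadu_si256(reinterpret_cast<__m256i const *>(idx + i));
        __m512i const idx64 = _mm512_cvtepu32_epi64(idx32);
        // Gather eight doubles; the scale of 8 is sizeof(double).
        __m512d const vals = _mm512_i64gather_pd(idx64, src, 8);
        _mm512_storeu_pd(dst + i, _mm512_mul_pd(factor, vals));
    }
#endif
    // Scalar remainder (or the whole range when AVX-512F is unavailable).
    for (; i < n; ++i)
    {
        dst[i] = 42.0 * src[idx[i]];
    }
}
```

After zero extension, every uint32_t value is a non-negative signed 64 bit
value, so the signed 64 bit index form of the gather addresses the same
elements as the scalar loop.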

Although the code example only uses idx[i] for loading, it appears to be the
exact same issue when using idx[i] for storing (meaning: when scatter would be
required).
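
For reference, a scatter counterpart of the MWE might look as follows. This is
a sketch and not taken from the report: the function name scatter_loop is
hypothetical, and the load is direct while the store is indirect.

```cpp
#include <cstdint>

using loop_t = std::uint32_t;
using idx_t = std::uint32_t;

// Hypothetical scatter variant of the MWE: direct load, indirect store.
void scatter_loop(double * const __restrict__ dst,
                  double const * const __restrict__ src,
                  idx_t const * const __restrict__ idx,
                  loop_t const begin,
                  loop_t const end)
{
    for (loop_t i = begin; i < end; ++i)
    {
        dst[idx[i]] = 42.0 * src[i];
    }
}
```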

Are there any plans to get this working?
Or did I maybe miss something and this should already work?

Many thanks in advance

Florian

[Bug tree-optimization/88531] Index data types when targeting AVX-512 vectorization with gather/scatter

2018-12-17 Thread florian.schornbaum at siemens dot com

--- Comment #3 from Florian Schornbaum ---
Thank you for your very quick replies!

I'm aware of 88464 (I think this is the recent work you are referring to?), but
this had no effect on the index data type issue I was describing.

Even if the gathers/scatters do widening themselves, my code example is not
vectorized when using int16_t. Probably because no gather/scatter is created by
GCC in the first place?

As for uint32_t with -m64 (= unsigned int on x86-64, and sadly the case that
we are facing): I'm aware that manually transforming the index array from
uint32_t to int64_t is a solution, but one that comes at a cost for us.
Looking at clang, they use "vpmovzxdq" when loading the index data. That is the
only difference from the int64_t/uint64_t version, which uses a different load
instruction.
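
The manual transformation mentioned above could look roughly like this. It is a
sketch under assumptions: the names widen_indices and loop_i64 are
hypothetical, and the cost shown here (an extra pass over the indices and
8 instead of 4 bytes per index) is presumably only part of what such a change
means in a real code base.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy uint32_t indices into an int64_t array.
// The conversion is value-preserving, but the result needs twice the memory.
std::vector<std::int64_t> widen_indices(std::uint32_t const *idx, std::size_t n)
{
    return std::vector<std::int64_t>(idx, idx + n);
}

// Same loop as the MWE, but with int64_t indices, one of the types the
// report lists as vectorizing.
void loop_i64(double * const __restrict__ dst,
              double const * const __restrict__ src,
              std::int64_t const * const __restrict__ idx,
              std::uint32_t const begin,
              std::uint32_t const end)
{
    for (std::uint32_t i = begin; i < end; ++i)
    {
        dst[i] = 42.0 * src[idx[i]];
    }
}
```

A caller would widen the index array once, keep the widened copy around, and
pass its data() pointer to loop_i64.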

Are there any plans for GCC to make these "unfitting" index data types work
with AVX-512 gathers/scatters?

[Bug tree-optimization/88531] Index data types when targeting AVX-512 vectorization with gather/scatter

2019-01-21 Thread florian.schornbaum at siemens dot com

--- Comment #4 from Florian Schornbaum ---
Hi Jakub, Richard,

I hope you both had a good start to 2019.

I'm still wondering if there are any plans to make arbitrary index data types
work with gather/scatter?

If there are no such plans at the moment, we will work around this issue on our
side.

[Bug tree-optimization/88531] Index data types when targeting AVX-512 vectorization with gather/scatter

2019-01-23 Thread florian.schornbaum at siemens dot com

--- Comment #6 from Florian Schornbaum ---
Thanks Jakub. That's good information to have.

We would certainly be willing to help since this is something that we would
really like GCC to be able to handle.

Does it make sense for us, as developers who have never been involved in GCC
development, to take a look if you give us some pointers on where to start?

[Bug tree-optimization/88531] Index data types when targeting AVX-512 vectorization with gather/scatter

2019-01-23 Thread florian.schornbaum at siemens dot com

--- Comment #8 from Florian Schornbaum ---
They are definitely a good source to ask.
We'll try to get in contact with them and see if we can get help/insight.

Thanks for all your input so far!