https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88464

            Bug ID: 88464
           Summary: AVX-512 vectorization of masked scatter failing with
                    "not suitable for scatter store"
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mokreutzer at gmail dot com
  Target Milestone: ---

Hi,

I have the following simple loop which I want to compile for Skylake (AVX-512):

================================
#pragma GCC ivdep
for (int i = 0; i < n; ++i)
{
    if (b[off1[i]] < b[off2[i]])
        a[off1[i]] = b[off1[i]];
    else
        a[off2[i]] = b[off2[i]];
}
================================

Given AVX-512 masked scatter instructions and the absence of data conflicts
("ivdep"), vectorization should be possible along the lines of:
1. gather b[off1[i]] into zmm1
2. gather b[off2[i]] into zmm2
3. compare zmm1 and zmm2 with "<" and store result in mask1
4. compare zmm1 and zmm2 with ">=" and store result in mask2
5. scatter zmm1 to a[off1[i]] with mask1
6. scatter zmm2 to a[off2[i]] with mask2
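
To make the six steps above concrete, here is a rough hand-written sketch with AVX-512F intrinsics. It is not GCC output, only an illustration of the code shape I would expect. The function name select_scatter is made up, and I am assuming 32-bit int elements and non-negative 32-bit offsets that never collide (which is what "ivdep" promises). mask2 is computed as the complement of mask1, which for integers is the same as ">=". When the file is built without AVX-512 support, a lane-by-lane scalar model of the same steps is used instead.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper: processes 16 loop iterations per step.
   Assumes int32_t data and collision-free, non-negative offsets. */
void select_scatter(int32_t *a, const int32_t *b,
                    const int32_t *off1, const int32_t *off2, int n)
{
    assert(n >= 0);
#ifdef __AVX512F__
    for (int i = 0; i < n; i += 16) {
        int rem = n - i;
        /* Lanes that correspond to real iterations (handles the tail). */
        __mmask16 live = rem >= 16 ? (__mmask16)0xFFFF
                                   : (__mmask16)((1u << rem) - 1u);
        __m512i idx1 = _mm512_maskz_loadu_epi32(live, off1 + i);
        __m512i idx2 = _mm512_maskz_loadu_epi32(live, off2 + i);
        __m512i zero = _mm512_setzero_si512();
        /* Steps 1 and 2: gather b[off1[i]] and b[off2[i]]. */
        __m512i v1 = _mm512_mask_i32gather_epi32(zero, live, idx1, b, 4);
        __m512i v2 = _mm512_mask_i32gather_epi32(zero, live, idx2, b, 4);
        /* Steps 3 and 4: mask1 = (v1 < v2), mask2 = the remaining live lanes. */
        __mmask16 m1 = _mm512_mask_cmplt_epi32_mask(live, v1, v2);
        __mmask16 m2 = (__mmask16)(live & ~m1);
        /* Steps 5 and 6: masked scatters into a. */
        _mm512_mask_i32scatter_epi32(a, m1, idx1, v1, 4);
        _mm512_mask_i32scatter_epi32(a, m2, idx2, v2, 4);
    }
#else
    /* Portable model of the same steps, 16 lanes at a time. */
    for (int i = 0; i < n; i += 16) {
        int lanes = n - i < 16 ? n - i : 16;
        int32_t v1[16], v2[16];
        uint16_t m1 = 0;
        for (int l = 0; l < lanes; ++l) {
            v1[l] = b[off1[i + l]];              /* gather 1 */
            v2[l] = b[off2[i + l]];              /* gather 2 */
            if (v1[l] < v2[l])
                m1 |= (uint16_t)(1u << l);       /* mask1 */
        }
        for (int l = 0; l < lanes; ++l)          /* scatter with mask1 */
            if ((m1 >> l) & 1u)
                a[off1[i + l]] = v1[l];
        for (int l = 0; l < lanes; ++l)          /* scatter with mask2 */
            if (!((m1 >> l) & 1u))
                a[off2[i + l]] = v2[l];
    }
#endif
}
```

Note that this only matches the scalar loop when no two iterations in a block of 16 write to the same element of a, since the two scatters reorder the stores relative to the original loop.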

However, GCC is not able to vectorize this loop (failing with "not vectorized:
not suitable for scatter store"). I have tested this with the latest GCC trunk
but the issue also occurs with all previous versions. If you want to have a
look, here's a Godbolt example: https://godbolt.org/z/Is7Zml

I understand that this loop is not a trivial case for vectorization and AVX-512
hasn't been around for too long, so it's likely that it isn't fully supported
yet. But still, I'm wondering:
1. Am I missing some flags or hints to GCC in order to vectorize this loop? (I
can imagine something related to the cost model, etc.)
2. Or is GCC currently just not capable of vectorizing it?

If the answer is "2.":
3. Can we estimate the amount of work needed to support this?
4. Is there any plan on when this kind of pattern will be supported? 
5. If it's realistic for a non-GCC developer to look into this, is there
anything I can do to help?


Many thanks in advance,
Moritz
