https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122573

            Bug ID: 122573
           Summary: C++ missed invariant motion vs. vectorization
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

In the following C++ testcase the m_column?[?] loads are not hoisted out of
the loop.  This causes a high VF and, with AVX512 masked epilogues, high
pressure on the compare unit of Zen4/5, leading to a slowdown when numPixels
is low.

I'm not sure invariant motion would be valid here (the stores through out
could alias this->m_column), but the vectorizer does runtime alias checking
against the this->m_column accesses and detects the loads as invariant.
Possibly SLP discovery could treat them as such, ignoring that they are not
"grouped accesses".

struct S {
    void apply(const void * inImg, void * outImg, long numPixels) const;
    float m_column1[4];
    float m_column2[4];
    float m_column3[4];
    float m_column4[4];
};

void S::apply(const void * inImg, void * outImg, long numPixels) const
{
    const float * in = (const float *)inImg;
    float * out = (float *)outImg;

    for (long idx = 0; idx < numPixels; ++idx)
    {
        const float r = in[0];
        const float g = in[1];
        const float b = in[2];
        const float a = in[3];

        out[0] = r*m_column1[0]
               + g*m_column2[0]
               + b*m_column3[0]
               + a*m_column4[0];
        out[1] = r*m_column1[1]
               + g*m_column2[1]
               + b*m_column3[1]
               + a*m_column4[1];
        out[2] = r*m_column1[2]
               + g*m_column2[2]
               + b*m_column3[2]
               + a*m_column4[2];
        out[3] = r*m_column1[3]
               + g*m_column2[3]
               + b*m_column3[3]
               + a*m_column4[3];

        in  += 4;
        out += 4;
    }
}
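
For comparison, here is a minimal sketch of what the loop looks like with the
coefficient loads hoisted by hand.  It is not part of the original report.
The name SHoisted and the locals c1..c4 are made up for illustration, and the
transformation is only valid under the assumption that outImg never overlaps
the object itself.  Whether this changes the VF the vectorizer picks has not
been checked here.

```cpp
// Hypothetical hand-hoisted variant of S; same layout as the testcase.
struct SHoisted {
    void apply(const void * inImg, void * outImg, long numPixels) const;
    float m_column1[4];
    float m_column2[4];
    float m_column3[4];
    float m_column4[4];
};

void SHoisted::apply(const void * inImg, void * outImg, long numPixels) const
{
    const float * in = (const float *)inImg;
    float * out = (float *)outImg;

    // ASSUMPTION: out does not alias *this.  Under that assumption the
    // coefficients are loop invariant and can be copied to locals once.
    float c1[4], c2[4], c3[4], c4[4];
    for (int i = 0; i < 4; ++i)
    {
        c1[i] = m_column1[i];
        c2[i] = m_column2[i];
        c3[i] = m_column3[i];
        c4[i] = m_column4[i];
    }

    for (long idx = 0; idx < numPixels; ++idx)
    {
        // Read the whole pixel before writing, so in == out still works.
        const float r = in[0];
        const float g = in[1];
        const float b = in[2];
        const float a = in[3];

        for (int i = 0; i < 4; ++i)
            out[i] = r*c1[i] + g*c2[i] + b*c3[i] + a*c4[i];

        in  += 4;
        out += 4;
    }
}
```

If out did point into the object, the original loop would observe the updated
coefficients on later iterations while this variant would not, which is the
reason the compiler cannot do this hoisting without an alias check.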
