https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82905

            Bug ID: 82905
           Summary: vector shift forced to 32 bytes
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bruno.uy at gmail dot com
  Target Milestone: ---

#include <cstdint>
using namespace std;

int const count = 1024;
uint8_t p[count];

void mul(uint16_t m)
{
        for (int i = 0; i < count; ++i)
        {
                p[i] = uint16_t(p[i] * m) >> 8;
        }
}

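When compiled for x86-64 with -O3, the loop above is vectorized using psrad
(32-bit arithmetic shift) instructions instead of psrlw (16-bit logical shift)
instructions, even though the shifted value is a uint16_t.  The pand
instructions are also unnecessary: after a 16-bit logical shift right by 8,
every lane already fits in a byte.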
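
For illustration, here is a hand-written SSE2 sketch of the kind of code I
would expect for the loop body.  This is my assumption about a reasonable
sequence, not actual compiler output.  The name mul_sse2 and its pointer/length
parameters are hypothetical and not part of the test case above; the sketch
also assumes the length is a multiple of 16.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper, not part of the original test case.
// Assumes n is a multiple of 16.
void mul_sse2(uint8_t *data, std::size_t n, uint16_t m)
{
        __m128i const zero = _mm_setzero_si128();
        __m128i const vm = _mm_set1_epi16(static_cast<short>(m));

        for (std::size_t i = 0; i < n; i += 16)
        {
                __m128i v = _mm_loadu_si128(
                        reinterpret_cast<__m128i const *>(data + i));

                // Zero-extend 16 bytes into two vectors of eight 16-bit lanes.
                __m128i lo = _mm_unpacklo_epi8(v, zero);
                __m128i hi = _mm_unpackhi_epi8(v, zero);

                // pmullw keeps the low 16 bits of each product,
                // which matches uint16_t(p[i] * m).
                lo = _mm_mullo_epi16(lo, vm);
                hi = _mm_mullo_epi16(hi, vm);

                // psrlw: 16-bit logical shift, leaving each lane in 0..255.
                lo = _mm_srli_epi16(lo, 8);
                hi = _mm_srli_epi16(hi, 8);

                // packuswb: lanes already fit in a byte, so no pand is needed.
                _mm_storeu_si128(reinterpret_cast<__m128i *>(data + i),
                                 _mm_packus_epi16(lo, hi));
        }
}
```

Calling mul_sse2(p, count, m) should give the same contents of p as mul(m),
using only 16-bit multiplies and shifts and no masking.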
