https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82905
            Bug ID: 82905
           Summary: vector shift forced to 32 bytes
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bruno.uy at gmail dot com
  Target Milestone: ---

#include <cstdint>

using namespace std;

int const count = 1024;
uint8_t p[count];

void mul(uint16_t m)
{
    for (int i = 0; i < count; ++i)
    {
        p[i] = uint16_t(p[i] * m) >> 8;
    }
}

compiled for x86-64 with -O3 generates psrad instructions instead of psrlw
instructions. Also, the pand instructions are not needed.
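
A scalar model of a single vector lane may help show why a 16-bit shift would be enough and why no mask would be required. This is only a sketch of the reasoning, not GCC's actual output or a proposed patch; the helper names scale_promoted, scale_lane16 and lanes_agree are hypothetical. It assumes the 32-bit shift comes from the usual integer promotion of the uint16_t operand to int before ">> 8", and it models pmullw as "keep the low 16 bits of the product" and psrlw as a logical 16-bit right shift.

```cpp
#include <cassert>
#include <cstdint>

// The expression as written in the report. The uint16_t value is promoted
// to int before the shift, so the shift is nominally a 32-bit operation.
inline uint8_t scale_promoted(uint8_t p, uint16_t m) {
    return uint16_t(p * m) >> 8;
}

// Hypothetical model of one 16-bit lane: low 16 bits of the product
// (what pmullw keeps), then a logical shift right by 8 (what psrlw does).
inline uint16_t scale_lane16(uint8_t p, uint16_t m) {
    uint16_t product = static_cast<uint16_t>(static_cast<uint32_t>(p) * m);
    uint16_t shifted = static_cast<uint16_t>(product >> 8);
    // After shifting a 16-bit value right by 8, the high byte is zero,
    // so masking with 0x00ff before narrowing would change nothing.
    assert(shifted <= 0xFF);
    return shifted;
}

// Exhaustively compare both forms over every (p, m) pair.
inline bool lanes_agree() {
    for (unsigned p = 0; p < 256; ++p) {
        for (unsigned m = 0; m < 65536; ++m) {
            uint16_t lane = scale_lane16(static_cast<uint8_t>(p),
                                         static_cast<uint16_t>(m));
            if (lane > 0xFF) return false;
            if (static_cast<uint8_t>(lane) !=
                scale_promoted(static_cast<uint8_t>(p),
                               static_cast<uint16_t>(m))) {
                return false;
            }
        }
    }
    return true;
}
```

Calling lanes_agree() walks all 256 * 65536 input combinations. If it returns true, the 16-bit lane model matches the promoted expression everywhere and every result already fits in a byte, which is the property that would make the extra masking redundant under the assumptions above.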