https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103967
Bug ID: 103967
Summary: x86-64: bitfields make inefficient indexing for array with 16 byte+ objects
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nekotekina at gmail dot com
Target Milestone: ---

Hello, this problem is seemingly not specific to GCC and is probably well known.
Loading or storing a 16-byte (or larger) vector from an array, using a bitfield
as the index, generates code that could in theory be noticeably smaller.

    shr  esi, 12                              ; shift bitfield
    and  esi, 31                              ; mask bitfield
    sal  rsi, 4                               ; unnecessary, also could drop REX prefix for size
    pxor xmm0, XMMWORD PTR [rsi+1024+rdi]     ; index + offset addressing

1) The second shift can be fused with the bitfield load.
2) The bitfield load can then be adjusted for scaled indexing (rsi*8).
3) Optionally, the array offset can be precomputed if it is used twice or more,
   which can result in smaller and potentially faster code.

    shr  esi, 12 - 1                          ; adjusted shift
    and  esi, 31 << 1                         ; adjusted mask which fits in 8-bit immediate
    pxor xmm0, [rdi + rsi * 8]                ; precomputed array offset

https://godbolt.org/z/7aa7oaMhn

    #include <emmintrin.h>

    struct bitfields
    {
        unsigned dummy : 7;
        unsigned a : 5;
        unsigned b : 5;
        unsigned c : 5;
    };

    struct context
    {
        unsigned dummy[256];
        __m128i data[32];
    };

    void xor_data(context& ctx, bitfields op)
    {
        ctx.data[op.c] = _mm_xor_si128(ctx.data[op.a], ctx.data[op.b]);
    }
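
For reference, the arithmetic behind the adjusted shift and mask proposed above can be checked with a small C++ sketch. It is an illustration only, not part of the testcase: the names kShift, kMask, offset_current, offset_proposed and check_all_indices are hypothetical, and it assumes field `b` occupies bits 12..16 of the raw 32-bit value, as the `shr esi, 12` / `and esi, 31` sequence in the listing suggests. Both functions return the byte offset relative to the start of `data` (the 1024-byte `dummy` array is left out).

```cpp
#include <cassert>
#include <cstdint>

// Assumed position and width of bitfield `b`: 7 dummy bits, 5 bits of `a`,
// then 5 bits of `b`, allocated from bit 0 upward.
constexpr unsigned kShift = 12;
constexpr std::uint32_t kMask = 31;

// Current sequence: extract the index, then scale it by 16 with a separate
// shift (shr 12; and 31; sal 4).
constexpr std::uint64_t offset_current(std::uint32_t raw) {
    std::uint64_t index = (raw >> kShift) & kMask;
    return index << 4;
}

// Proposed sequence: shift by one bit less and mask with 31 << 1, which
// yields 2 * index; the *8 scale of the addressing mode supplies the rest.
constexpr std::uint64_t offset_proposed(std::uint32_t raw) {
    std::uint64_t twice_index = (raw >> (kShift - 1)) & (kMask << 1);
    return twice_index * 8;
}

// Compile-time spot check: all bits set selects index 31, offset 31 * 16.
static_assert(offset_current(0xFFFFFFFFu) == 496, "current form");
static_assert(offset_proposed(0xFFFFFFFFu) == 496, "proposed form");

// Runtime check over every possible 5-bit index value.
inline void check_all_indices() {
    for (std::uint32_t i = 0; i < 32; ++i) {
        std::uint32_t raw = i << kShift;
        assert(offset_current(raw) == offset_proposed(raw));
        assert(offset_current(raw) == i * 16u);
    }
}
```

The two forms agree because shifting by 11 instead of 12 leaves the five index bits at positions 1..5, so the masked value is exactly twice the index, and 2 * 8 equals the element size of 16 bytes.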