https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103967
Bug ID: 103967
Summary: x86-64: bitfields make inefficient indexing for arrays of 16-byte+ objects
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nekotekina at gmail dot com
Target Milestone: ---
Hello, this problem does not seem to be specific to GCC and is probably well
known. Accessing an array of 16-byte (or larger) vectors with a bitfield as the
index, for loads or stores, generates code that could in theory be noticeably
smaller. For the ctx.data[op.b] operand of the test case below, the current
output is:
shr esi, 12 ; shift bitfield
and esi, 31 ; mask bitfield
sal rsi, 4 ; unnecessary (can be folded into the shift); a 32-bit sal esi, 4 would also drop the REX prefix
pxor xmm0, XMMWORD PTR [rsi+1024+rdi] ; base + index + displacement (offset of ctx.data)
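For reference, what the listing above computes can be modelled in C++. This is a sketch under two assumptions: the bitfield layout GCC/Clang use on x86-64 (least significant bit first, so `b` occupies bits 12-16), and a plain 16-byte struct `vec128` standing in for `__m128i` so it builds without SSE headers. `current_offset_of_b` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same fields as the test case. Assumption: GCC/Clang on x86-64 allocate
// bitfields from the least significant bit, so b occupies bits 12-16.
struct bitfields
{
    unsigned dummy : 7;
    unsigned a : 5;
    unsigned b : 5;
    unsigned c : 5;
};
static_assert(sizeof(bitfields) == 4, "22 bits packed into one 32-bit unit");

// Portable 16-byte, 16-aligned stand-in for __m128i (assumption).
struct alignas(16) vec128
{
    std::uint64_t lo, hi;
};

struct context
{
    unsigned dummy[256];
    vec128 data[32];
};

// The +1024 displacement in [rsi+1024+rdi] is the offset of data.
static_assert(offsetof(context, data) == 1024, "256 * sizeof(unsigned)");

// Models shr esi, 12 / and esi, 31 / sal rsi, 4 plus the displacement:
// the byte offset of ctx.data[op.b] relative to &ctx.
inline std::size_t current_offset_of_b(bitfields op)
{
    std::uint32_t raw;
    std::memcpy(&raw, &op, sizeof raw); // the 32-bit value passed in esi
    std::size_t idx = (raw >> 12) & 31; // shr + and: extract field b
    return 1024 + (idx << 4);           // sal: x16, beyond the maximum x86 scale of 8
}
```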
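1) The second shift (sal) can be fused into the bitfield extraction.
2) The extraction can then be adjusted to leave the index pre-multiplied by 2,
so that a scaled index (rsi*8) supplies the remaining factor. x86 addressing
only scales by 1, 2, 4 or 8, which is why a separate shift is emitted today
for 16-byte elements.
3) Optionally, the array address (rdi + 1024) can be precomputed when it is
used two or more times, which can result in smaller and potentially faster code.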
Proposed sequence:
shr esi, 12 - 1 ; adjusted shift: leaves field b multiplied by 2
and esi, 31 << 1 ; adjusted mask (62), still fits in an 8-bit immediate
pxor xmm0, [rdi + rsi * 8] ; rdi = precomputed address of ctx.data
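A quick way to check that the adjusted shift and mask are equivalent to the original sequence is to compare both offset computations in C++. `offset_current` and `offset_proposed` are hypothetical helpers modelling the two sequences; the extra bit 11 exposed by the smaller shift is always cleared by the wider mask.

```cpp
#include <cstdint>

// Current sequence: shr esi, 12 / and esi, 31 / sal rsi, 4.
// The final shift is needed because x86 addressing scales an index by
// at most 8, while the elements are 16 bytes.
constexpr std::uint32_t offset_current(std::uint32_t raw)
{
    return ((raw >> 12) & 31u) << 4;
}

// Proposed sequence: shr esi, 11 / and esi, 62, then [base + rsi * 8].
// Shifting one bit less leaves the field multiplied by 2; the wider mask
// clears bit 11, which the smaller shift brings into position 0.
constexpr std::uint32_t offset_proposed(std::uint32_t raw)
{
    return ((raw >> (12 - 1)) & (31u << 1)) * 8;
}

// The adjusted mask still fits in a sign-extended 8-bit immediate.
static_assert((31u << 1) <= 127u, "mask must fit in imm8");

// Spot checks: all five field bits set, and all 32 bits set.
static_assert(offset_current(0x0001F000u) == offset_proposed(0x0001F000u), "");
static_assert(offset_proposed(0xFFFFFFFFu) == 31u * 16u, "");
```

Only bits 12-16 reach the result in either form, so checking every value below 2^22, with and without the upper bits set, covers all cases that matter.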
https://godbolt.org/z/7aa7oaMhn
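#include <emmintrin.h>

struct bitfields
{
    // With GCC on x86-64, bitfields are allocated from the least significant bit:
    unsigned dummy : 7; // bits 0-6
    unsigned a : 5;     // bits 7-11
    unsigned b : 5;     // bits 12-16 (the "shr esi, 12" above)
    unsigned c : 5;     // bits 17-21
};

struct context
{
    unsigned dummy[256]; // 1024 bytes, hence the +1024 displacement
    __m128i data[32];    // 16-byte elements indexed by 5-bit fields
};

void xor_data(context& ctx, bitfields op)
{
    ctx.data[op.c] = _mm_xor_si128(ctx.data[op.a], ctx.data[op.b]);
}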
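Points 1-3 can also be written out by hand at the source level to confirm that the arithmetic matches xor_data. This sketch is illustrative only: `xor_data_manual` is a hypothetical name, `vec128`/`xor128` are portable stand-ins for `__m128i`/`_mm_xor_si128`, and the field positions assume the GCC/Clang x86-64 bitfield layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable 16-byte stand-in for __m128i and _mm_xor_si128 (assumption).
struct alignas(16) vec128
{
    std::uint64_t lo, hi;
};

inline vec128 xor128(vec128 x, vec128 y)
{
    return vec128{x.lo ^ y.lo, x.hi ^ y.hi};
}

// Same layout as the test case; field positions a = 7, b = 12, c = 17
// assume GCC/Clang on x86-64 (least significant bit first).
struct bitfields
{
    unsigned dummy : 7;
    unsigned a : 5;
    unsigned b : 5;
    unsigned c : 5;
};

struct context
{
    unsigned dummy[256];
    vec128 data[32];
};

// Reference version, same shape as xor_data in the report.
void xor_data(context& ctx, bitfields op)
{
    ctx.data[op.c] = xor128(ctx.data[op.a], ctx.data[op.b]);
}

// Hypothetical hand-written variant of points 1-3.
void xor_data_manual(context& ctx, bitfields op)
{
    std::uint32_t raw;
    std::memcpy(&raw, &op, sizeof raw);

    // Point 3: compute the array base once (rdi + 1024 in the listing).
    unsigned char* base = reinterpret_cast<unsigned char*>(ctx.data);

    // Points 1 and 2: shift by (pos - 1) and mask with 62, giving field * 2,
    // then scale by 8 to reach the 16-byte element.
    auto elem = [&](unsigned pos) -> vec128* {
        std::size_t twice = (raw >> (pos - 1)) & 62u;
        return reinterpret_cast<vec128*>(base + twice * 8);
    };

    *elem(17) = xor128(*elem(7), *elem(12));
}
```

Whether a compiler turns `base + twice * 8` into a single `[rdi + rsi * 8]` operand is up to its instruction selection; the sketch only shows that the index arithmetic produces the same elements as xor_data.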