https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103967

            Bug ID: 103967
           Summary: x86-64: bitfields make inefficient indexing for array
                    with 16 byte+ objects
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nekotekina at gmail dot com
  Target Milestone: ---

Hello, this problem is seemingly not specific to GCC and is probably well
known. Loading or storing a 16-byte (or larger) vector from an array using a
bitfield as the index generates code that could, in theory, be noticeably
smaller:

        shr     esi, 12 ; shift bitfield
        and     esi, 31 ; mask bitfield
        sal     rsi, 4 ; unnecessary, also could drop REX prefix for size
        pxor    xmm0, XMMWORD PTR [rsi+1024+rdi] ; index + offset addressing

1) The second shift can be fused with the bitfield load.
2) The bitfield load can then be adjusted for scaled indexing (rsi*8).
3) Optionally, the array offset can be precomputed if it is used twice or
more, which can result in smaller and potentially faster code.

        shr     esi, 12 - 1 ; adjusted shift
        and     esi, 31 << 1 ; adjusted mask which fits in 8-bit immediate
        pxor    xmm0, [rdi + rsi * 8] ; precomputed array offset
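
As a sanity check of the arithmetic only (not of what any compiler emits),
the two sequences can be modelled in C++. This is a minimal sketch: the names
offset_current, offset_proposed and offsets_agree are hypothetical, and the
field is assumed to sit at bits 12..16 of a 32-bit value, matching the
shr/and constants above.

```cpp
#include <cstdint>

// Byte offset as computed by the current sequence: shr 12, and 31, sal 4.
constexpr std::uint64_t offset_current(std::uint32_t op)
{
    return static_cast<std::uint64_t>((op >> 12) & 31u) << 4;
}

// Byte offset as computed by the proposed sequence: shr 11, and 62, then
// the addressing mode multiplies by 8 (the largest scale x86-64 offers).
constexpr std::uint64_t offset_proposed(std::uint32_t op)
{
    return static_cast<std::uint64_t>((op >> 11) & (31u << 1)) * 8;
}

// Compare both forms for every value of the 5-bit field, with a few
// different patterns in the surrounding bits.
bool offsets_agree()
{
    const std::uint32_t surroundings[] = {0u, 0xFFFFFFFFu, 0xA5A5A5A5u};
    for (std::uint32_t field = 0; field < 32; ++field) {
        for (std::uint32_t other : surroundings) {
            // Place the field at bits 12..16 and fill the remaining bits.
            const std::uint32_t op = (other & ~(31u << 12)) | (field << 12);
            if (offset_current(op) != offset_proposed(op))
                return false;
        }
    }
    return true;
}
```

Both forms yield index * 16. The proposed one drops the separate sal because
the extra factor of 2 is folded into the shift count and the mask.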

https://godbolt.org/z/7aa7oaMhn

#include <emmintrin.h>
struct bitfields
{
    unsigned dummy : 7;
    unsigned a : 5;
    unsigned b : 5;
    unsigned c : 5;
};
struct context
{
    unsigned dummy[256];
    __m128i data[32];
};

void xor_data(context& ctx, bitfields op)
{
    ctx.data[op.c] = _mm_xor_si128(ctx.data[op.a], ctx.data[op.b]);
}
