https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117395

            Bug ID: 117395
           Summary: missed SRA opportunity with extracting subpart of type
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following example:

#include <arm_neon.h>
#include <string.h>

int16x4_t foo(const int16_t *src, int16x8_t c) {
    int16x8_t s[8];
    memcpy (&s[0], src, sizeof(s));

    return vget_low_s16(s[2]) + vget_high_s16(s[2]);
}

compiled with -march=armv9-a -O3

produces:

foo(short const*, __Int16x8_t):
        ldr     q31, [x0, 32]
        sub     sp, sp, #128
        add     sp, sp, 128
        umov    x0, v31.d[0]
        umov    x1, v31.d[1]
        fmov    d30, x0
        fmov    d31, x1
        add     v0.4h, v31.4h, v30.4h
        ret

which is a bit silly, but it happens because reload has to spill the subreg
that takes half of the register holding s[2] and changes its mode.

The stack accesses were cleaned up, but not the stack allocation itself: the
sub sp / add sp pair is left behind.

In GIMPLE we have:

  memcpy (&s[0], src_4(D), 128);
  _1 = BIT_FIELD_REF <s[2], 64, 0>;
  _2 = BIT_FIELD_REF <s[2], 64, 64>;
  _6 = _1 + _2;
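To make the GIMPLE above concrete, here is a portable C model of what it computes, with int16x8_t modeled as int16_t[8]. BIT_FIELD_REF <obj, size, pos> reads size bits starting at bit pos of obj, so the two references are the low and high 64-bit halves of s[2]. The helper bit_field_ref is hypothetical (not a GCC API). It only handles the byte-aligned case and assumes little-endian layout, which is the aarch64 default.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modeling BIT_FIELD_REF <obj, size, pos>:
   copy SIZE bits starting at bit POS of OBJ into DST.
   Only byte-aligned fields are handled; little-endian layout assumed. */
static void bit_field_ref(void *dst, const void *obj, unsigned size, unsigned pos)
{
    assert(size % 8 == 0 && pos % 8 == 0);
    memcpy(dst, (const unsigned char *)obj + pos / 8, size / 8);
}

/* Lane-by-lane model of foo's GIMPLE body; out receives 4 lanes. */
static void foo_gimple_model(const int16_t *src, int16_t out[4])
{
    int16_t s[8][8];                    /* stands in for int16x8_t s[8] */
    memcpy(s, src, sizeof(s));          /* memcpy (&s[0], src_4(D), 128); */

    int16_t lo[4], hi[4];
    bit_field_ref(lo, &s[2], 64, 0);    /* _1 = BIT_FIELD_REF <s[2], 64, 0>;  */
    bit_field_ref(hi, &s[2], 64, 64);   /* _2 = BIT_FIELD_REF <s[2], 64, 64>; */

    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(lo[i] + hi[i]);  /* _6 = _1 + _2; lane-wise */
}
```

Only bytes 32..47 of the 128-byte copy are ever read, which is why a single 16-byte scalar replacement for s[2] would suffice.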

but SRA rejects it, as the dump shows:

Candidate (26131): s
Allowed ADDR_EXPR of s because of memcpy (&s[0], src_4(D), 128);

! Disqualifying s - No scalar replacements to be created.

The manually scalarized version:

int16x4_t bar(const int16_t *src, int16x8_t c) {
    int16x8_t s;
    memcpy (&s, src + 16, sizeof(s)); /* s[2] in foo starts 16 elements (32 bytes) into src */

    return vget_low_s16(s) + vget_high_s16(s);
}

does, however, generate the right code.
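As a sketch of the rewrite one would hope SRA performs, the following replaces the 128-byte array with a single 16-byte scalar for s[2], loaded from byte offset 32 of src (the offset in the ldr q31, [x0, 32] above), and checks that both shapes agree. It uses GCC/Clang generic vectors (vector_size) in place of int16x8_t/int16x4_t so it builds without <arm_neon.h>. The helpers low_half/high_half only stand in for vget_low_s16/vget_high_s16 and assume little-endian lane order. Whether this portable variant shows the same missed SRA on other targets is not established here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Generic vectors standing in for int16x8_t and int16x4_t. */
typedef int16_t v8hi __attribute__((vector_size(16)));
typedef int16_t v4hi __attribute__((vector_size(8)));

/* Stand-ins for vget_low_s16 / vget_high_s16 (little-endian assumed). */
static v4hi low_half(v8hi v)
{
    v4hi r;
    memcpy(&r, &v, sizeof r);
    return r;
}

static v4hi high_half(v8hi v)
{
    v4hi r;
    memcpy(&r, (const char *)&v + sizeof r, sizeof r);
    return r;
}

/* Same shape as foo: copy all 128 bytes, then use only s[2]. */
static v4hi foo_generic(const int16_t *src)
{
    v8hi s[8];
    memcpy(&s[0], src, sizeof(s));
    return low_half(s[2]) + high_half(s[2]);
}

/* Hand-scalarized: one 16-byte replacement for s[2] (element 16 = byte 32). */
static v4hi foo_scalarized(const int16_t *src)
{
    v8hi s2;
    memcpy(&s2, src + 16, sizeof(s2));
    return low_half(s2) + high_half(s2);
}

/* Assert that both versions agree on every lane for one input. */
static void check_equivalent(const int16_t *src)
{
    v4hi a = foo_generic(src);
    v4hi b = foo_scalarized(src);
    for (int i = 0; i < 4; i++)
        assert(a[i] == b[i]);
}
```

Since the rest of s is dead, dropping it is semantics-preserving; the only live data is the 16 bytes that become s2.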

Does SRA support BIT_FIELD_REFs today? A comment in analyze_access_subtree
seems to indicate that it may punt on them.

I also vaguely remember that SRA has a limit on the size of the objects it
scalarizes. Could that be what is blocking it here?
