https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117395
Bug ID: 117395
Summary: missed SRA opportunity with extracting subpart of type
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---

The following example:

    #include <arm_neon.h>
    #include <string.h>

    int16x4_t foo(const int16_t *src, int16x8_t c)
    {
        int16x8_t s[8];
        memcpy (&s[0], src, sizeof(s));
        return vget_low_s16(s[2]) + vget_high_s16(s[2]);
    }

compiled with -march=armv9-a -O3 produces:

    foo(short const*, __Int16x8_t):
            ldr     q31, [x0, 32]
            sub     sp, sp, #128
            add     sp, sp, 128
            umov    x0, v31.d[0]
            umov    x1, v31.d[1]
            fmov    d30, x0
            fmov    d31, x1
            add     v0.4h, v31.4h, v30.4h
            ret

This is a bit silly, but it happens because reload has to spill the subreg that takes half of the register holding s[2] and changes its mode. We cleaned up the stack usage, but not the stack allocation itself.

In GIMPLE we have:

      memcpy (&s[0], src_4(D), 128);
      _1 = BIT_FIELD_REF <s[2], 64, 0>;
      _2 = BIT_FIELD_REF <s[2], 64, 64>;
      _6 = _1 + _2;

but SRA has rejected it:

    Candidate (26131): s
    Allowed ADDR_EXPR of s because of memcpy (&s[0], src_4(D), 128);
    ! Disqualifying s - No scalar replacements to be created.

The manually scalarized version:

    int16x4_t bar(const int16_t *src, int16x8_t c)
    {
        int16x8_t s;
        memcpy (&s, src, sizeof(s));
        return vget_low_s16(s) + vget_high_s16(s);
    }

does, however, generate the right code.

Does SRA support BIT_FIELD_REFs today? The comment in analyze_access_subtree seems to indicate that it may punt on them. I also vaguely remember that SRA has a limit on the size of the object it scalarizes?
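For experimenting with this outside AArch64, a portable analogue of the two functions can be written with GCC's generic vector extensions instead of arm_neon.h. This is only a sketch under assumptions: the type names v8hi/v4hi and the helpers get_low, get_high, foo_generic and bar_generic are hypothetical, the memcpy-based half extraction stands in for vget_low_s16/vget_high_s16 (elements 0-3 and 4-7), and the unused parameter c is dropped. Whether GCC lowers these copies to the same BIT_FIELD_REFs, and whether SRA then disqualifies s in the same way, has not been verified; the SRA dumps (e.g. -fdump-tree-esra-details) would show that.

```c
/* Hypothetical portable analogue of foo()/bar() using GCC generic vectors.
   Assumption: int16x8_t/int16x4_t correspond to 16-byte and 8-byte vectors
   of short; v8hi and v4hi are illustrative names, not arm_neon.h types. */
#include <string.h>

typedef short v8hi __attribute__((vector_size(16)));
typedef short v4hi __attribute__((vector_size(8)));

/* Elements 0-3 of v (stand-in for vget_low_s16). */
static inline v4hi get_low(v8hi v)
{
    v4hi r;
    memcpy(&r, &v, sizeof r);
    return r;
}

/* Elements 4-7 of v (stand-in for vget_high_s16). */
static inline v4hi get_high(v8hi v)
{
    v4hi r;
    memcpy(&r, (const char *)&v + sizeof r, sizeof r);
    return r;
}

/* Same shape as foo(): copy 8 vectors (128 bytes) into a local array,
   then add the two halves of s[2]. */
v4hi foo_generic(const short *src)
{
    v8hi s[8];
    memcpy(&s[0], src, sizeof(s));
    return get_low(s[2]) + get_high(s[2]);
}

/* Same shape as bar(): a single local vector instead of an array. */
v4hi bar_generic(const short *src)
{
    v8hi s;
    memcpy(&s, src, sizeof(s));
    return get_low(s) + get_high(s);
}
```

With src[i] = i, foo_generic adds elements 16-19 and 20-23 of src (so lane 0 is 36), while bar_generic adds elements 0-3 and 4-7 (so lane 0 is 4); comparing the generated code of the two for a given target is the interesting part.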