https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117395
Bug ID: 117395
Summary: missed SRA opportunity with extracting subpart of type
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
The following example:
#include <arm_neon.h>
#include <string.h>

int16x4_t foo(const int16_t *src, int16x8_t c) {
  int16x8_t s[8];
  memcpy (&s[0], src, sizeof(s));
  return vget_low_s16(s[2]) + vget_high_s16(s[2]);
}
compiled with -march=armv9-a -O3
produces:
foo(short const*, __Int16x8_t):
        ldr     q31, [x0, 32]
        sub     sp, sp, #128
        add     sp, sp, 128
        umov    x0, v31.d[0]
        umov    x1, v31.d[1]
        fmov    d30, x0
        fmov    d31, x1
        add     v0.4h, v31.4h, v30.4h
        ret
which is a bit silly, but happens because reload has to spill the subreg that
takes half the register of s[2] and changes modes.
We cleaned up the stack usage, but not the stack allocation itself (the
sub sp/add sp pair is still there).
In GIMPLE we have:
memcpy (&s[0], src_4(D), 128);
_1 = BIT_FIELD_REF <s[2], 64, 0>;
_2 = BIT_FIELD_REF <s[2], 64, 64>;
_6 = _1 + _2;
but SRA has rejected it, reporting:
Candidate (26131): s
Allowed ADDR_EXPR of s because of memcpy (&s[0], src_4(D), 128);
! Disqualifying s - No scalar replacements to be created.
The manually scalarized version:
int16x4_t bar(const int16_t *src, int16x8_t c) {
  int16x8_t s;
  memcpy (&s, src, sizeof(s));
  return vget_low_s16(s) + vget_high_s16(s);
}
does however generate the right code.
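To make the two shapes easier to compare without an AArch64 toolchain, here is a minimal portable sketch. It is not GCC code and uses no NEON intrinsics: vec128, vec64, bit_field_ref64, add64, foo_model and foo_narrowed are hypothetical stand-ins, and bit_field_ref64 only models how BIT_FIELD_REF <s[2], 64, N> reads 64 bits at bit offset N. It assumes the struct types have no padding, so vec128 is 16 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-ins for the NEON vector types. */
typedef struct { int16_t lane[8]; } vec128;  /* models int16x8_t */
typedef struct { int16_t lane[4]; } vec64;   /* models int16x4_t */

/* Models BIT_FIELD_REF <v, 64, bit_offset>: read 64 bits of *v
   starting at the given bit offset (0 = low half, 64 = high half). */
static vec64 bit_field_ref64(const vec128 *v, unsigned bit_offset)
{
    vec64 r;
    memcpy(&r, (const unsigned char *)v + bit_offset / 8, sizeof r);
    return r;
}

/* Lane-wise 16-bit add, done in unsigned arithmetic so it wraps. */
static vec64 add64(vec64 a, vec64 b)
{
    vec64 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (int16_t)(uint16_t)((uint16_t)a.lane[i] + (uint16_t)b.lane[i]);
    return r;
}

/* Shape of foo(): copy all 128 bytes, then read only s[2]. */
static vec64 foo_model(const int16_t *src)
{
    vec128 s[8];
    memcpy(&s[0], src, sizeof s);
    return add64(bit_field_ref64(&s[2], 0), bit_field_ref64(&s[2], 64));
}

/* Shape of bar(), adjusted to the same data: copy only the 16 bytes
   that back s[2] (byte offset 32), so no 128-byte array is needed. */
static vec64 foo_narrowed(const int16_t *src)
{
    vec128 s2;
    memcpy(&s2, (const unsigned char *)src + 2 * sizeof(vec128), sizeof s2);
    return add64(bit_field_ref64(&s2, 0), bit_field_ref64(&s2, 64));
}
```

Both functions return the same value for any 64-element input. The narrowed form is what one would hope the optimizers reduce foo to, since only bytes 32..47 of the copy are ever read.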
Does SRA support BIT_FIELD_REFs today? A comment in analyze_access_subtree
seems to indicate it may punt on them.
I also vaguely remember that SRA has a limit on the size of the object it
scalarizes. Is that right?