https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96965
Bug ID: 96965
Summary: combine RMW and flags
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: segher at gcc dot gnu.org
Reporter: aoliva at gcc dot gnu.org
Target Milestone: ---
Consider:
  typedef unsigned char T;
  T i[2];
  int f() {
    T *p = &i[0], *q = &i[1];
    T c = __builtin_add_overflow(*p, 1, p);
    *q += c;
  }
The desired code sequence on x86_64 is:
  addb $1, i(%rip)
  adcb $0, i+1(%rip)
Instead of the desired memory-destination addb, we get separate load, addb,
and store instructions. There are two reasons why combine does not merge them
into the addb:
- When we try_combine all three of them (R = load, M = add, W = store), the
flag-store insn is still present between M and W, so can_combine_p fails.
After we combine the flag-store into the adcb, we do not retry.
- If I manually force the retry, we break up the M parallel insn into a naked
add in i2 and a flag-setting, non-canonical compare in i0. We substitute R and
M into W, yielding an add without flag-setting. Finally, we deal with
added_sets, building a new parallel to hold the RMW add and appending the
flag-setter as its second item, after the combined add. Alas, recog won't
match them in this order: *add<mode>3_cc_overflow_1 requires the flag-setter
before the reg-setter.
Here is a discussion, with combine dumps, of a slightly different testcase that
triggers the same combine behavior:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553242.html