We are working on improving codegen for the following test cases (for all integer types T):
T foo (T x, T y)
{
  T diff = x - y;
  return x > y ? diff : -diff;
}

T bar (T x, T y)
{
  T diff1 = x - y;
  T diff2 = y - x;
  return x > y ? diff1 : diff2;
}
For signed integers, we already proposed a patch (attached to [1]) that amends
existing match.pd patterns in order to produce an ABS_EXPR (x - y).
Now, we want to implement the optimization for unsigned integers for AArch64.
For example, with -O3 -fwrapv, GCC compiles the function bar for uint8_t to:
bar_u8:
        and     w3, w0, 255
        and     w1, w1, 255
        sub     w2, w3, w1
        sub     w0, w1, w3
        and     w2, w2, 255
        cmp     w3, w1
        and     w0, w0, 255
        csel    w0, w0, w2, ls
        ret
whereas clang produces the desired sequence:
bar_u8:
        and     w8, w0, #0xff
        sub     w8, w8, w1, uxtb
        cmp     w8, #0
        cneg    w0, w8, mi
        ret
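
To make the intended semantics concrete, here is a minimal C sketch that
compares bar instantiated for uint8_t against a model of the shorter sequence.
The names bar_u8_model and check_all_pairs are hypothetical, and the model is
only our reading of the and/sub/cmp/cneg instructions (widen both operands,
subtract, negate if negative), not compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* bar from the test case, instantiated for T = uint8_t.  */
uint8_t bar_u8(uint8_t x, uint8_t y)
{
    uint8_t diff1 = x - y;
    uint8_t diff2 = y - x;
    return x > y ? diff1 : diff2;
}

/* Hypothetical model of the shorter sequence: the difference is formed
   in a 32-bit register from the zero-extended inputs and conditionally
   negated.  */
uint8_t bar_u8_model(uint8_t x, uint8_t y)
{
    int32_t d = (int32_t)x - (int32_t)y;   /* and ... ; sub ..., uxtb */
    return (uint8_t)(d < 0 ? -d : d);      /* cmp ..., #0 ; cneg ..., mi */
}

/* Exhaustively compare both versions; returns the number of pairs checked.  */
int check_all_pairs(void)
{
    int checked = 0;
    for (int x = 0; x < 256; x++)
        for (int y = 0; y < 256; y++) {
            assert(bar_u8((uint8_t)x, (uint8_t)y)
                   == bar_u8_model((uint8_t)x, (uint8_t)y));
            checked++;
        }
    return checked;
}
```

Since the difference of two zero-extended 8-bit values always fits in 32 bits,
the widened subtraction cannot wrap, which is why a single conditional negate
is enough in this model.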
We would like to ask for guidance on where to best implement this optimization
for unsigned integers. We have considered the following approaches:
- Also in match.pd, as for the signed integers. However, the existing rule
  ABS (A) -> A for unsigned integers would fold ABS_EXPR (x - y) to (x - y),
  which is incorrect if x < y. Are there other ways to express absolute
  differences for unsigned types at the GIMPLE level?
- In the aarch64 backend at the RTL level: create an if_then_else RTX with
  (zero-extended) minus expressions in both arms. However, the combine-pass
  dump shows that we are also lacking an instruction pattern matching each arm:
Failed to match this instruction:
(set (reg:SI 108)
    (zero_extend:SI (minus:QI (subreg:QI (reg/v:SI 103 [ x ]) 0)
            (subreg:QI (reg/v:SI 104 [ y ]) 0))))
which we would want to map to the following split of 3 instructions:
and r103, r103, 255
sub r108, r103, r104, uxtb
and r108, r108, 255
Do we want to add both those patterns?
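
The two concerns in the list above can be illustrated with a small C sketch.
All function names are hypothetical. folded_abs_u8 models what would remain of
ABS_EXPR (x - y) after the ABS (A) -> A fold, zext_minus_qi is our reading of
the unmatched RTL arm, and split_model mirrors the three instructions listed
above under the assumption that r103, r104 and r108 are 32-bit registers.

```c
#include <assert.h>
#include <stdint.h>

/* What remains after folding ABS (A) -> A on an unsigned type:
   the plain wrapping difference.  */
uint8_t folded_abs_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)(x - y);
}

/* The absolute difference we actually want.  */
uint8_t absdiff_u8(uint8_t x, uint8_t y)
{
    return x > y ? (uint8_t)(x - y) : (uint8_t)(y - x);
}

/* Assumed meaning of the unmatched arm: subtract the low bytes in QImode,
   then zero-extend the 8-bit result to SImode.  */
uint32_t zext_minus_qi(uint32_t x, uint32_t y)
{
    return (uint32_t)(uint8_t)((uint8_t)x - (uint8_t)y);
}

/* Model of the proposed three-instruction split.  */
uint32_t split_model(uint32_t x, uint32_t y)
{
    uint32_t r103 = x & 255;            /* and r103, r103, 255        */
    uint32_t r108 = r103 - (y & 255);   /* sub r108, r103, r104, uxtb */
    return r108 & 255;                  /* and r108, r108, 255        */
}
```

For x = 3 and y = 5 the folded form yields 254 while the absolute difference
is 2, which is the x < y case mentioned in the first bullet. The split model
agrees with the zero-extended QImode subtraction on the low byte regardless of
what the upper register bits hold.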
Any advice would be appreciated.
Thanks,
Jennifer
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114999
