Optimizing codegen for absolute differences in AArch64

Jennifer Schmitz via Gcc Tue, 14 Jan 2025 07:33:04 -0800

We are working on improving codegen for the following test cases (for all
integer types T):


T foo (T x, T y)
{
  T diff = x - y;
  return x > y ? diff : -diff;
}

T bar (T x, T y)
{
  T diff1 = x - y;
  T diff2 = y - x;
  return x > y ? diff1 : diff2;
}
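
As a sanity check of what the two test cases compute for unsigned T, here is a
minimal sketch that instantiates them for uint8_t and uint32_t and compares the
uint8_t versions against a wrap-free reference. The macro DEFINE_ABSDIFF, the
suffixed function names, absdiff_ref_u8 and count_mismatches_u8 are
hypothetical names introduced only for this illustration; they are not part of
the patch.

```c
#include <stdint.h>

/* Instantiate the two source patterns from the post for a concrete type T.
   The macro and the suffixed names are illustrative only.  */
#define DEFINE_ABSDIFF(T, SUFFIX)                 \
  T foo_##SUFFIX (T x, T y)                       \
  {                                               \
    T diff = x - y;                               \
    return x > y ? diff : -diff;                  \
  }                                               \
  T bar_##SUFFIX (T x, T y)                       \
  {                                               \
    T diff1 = x - y;                              \
    T diff2 = y - x;                              \
    return x > y ? diff1 : diff2;                 \
  }

DEFINE_ABSDIFF (uint8_t, u8)
DEFINE_ABSDIFF (uint32_t, u32)

/* Hypothetical reference: always subtract the smaller operand from the
   larger one, so the subtraction itself never wraps.  */
static uint8_t absdiff_ref_u8 (uint8_t x, uint8_t y)
{
  return x > y ? (uint8_t) (x - y) : (uint8_t) (y - x);
}

/* Count the uint8_t input pairs for which foo_u8 or bar_u8 disagrees with
   the reference.  */
int count_mismatches_u8 (void)
{
  int bad = 0;
  for (int x = 0; x < 256; x++)
    for (int y = 0; y < 256; y++)
      {
        uint8_t r = absdiff_ref_u8 ((uint8_t) x, (uint8_t) y);
        if (foo_u8 ((uint8_t) x, (uint8_t) y) != r)
          bad++;
        if (bar_u8 ((uint8_t) x, (uint8_t) y) != r)
          bad++;
      }
  return bad;
}
```

For unsigned T both functions rely on wrapping arithmetic in the arm that is
not the "natural" one: in foo, diff wraps when x < y and the negation wraps it
back to y - x.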

For signed integers, we already proposed a patch (attached to [1]) that amends
existing match.pd patterns in order to produce an ABS_EXPR (x - y).

Now, we want to implement the optimization for unsigned integers for AArch64.
For example, GCC compiles the function bar for uint8_t (-O3 -fwrapv) to:
bar_u8:
        and     w3, w0, 255
        and     w1, w1, 255
        sub     w2, w3, w1
        sub     w0, w1, w3
        and     w2, w2, 255
        cmp     w3, w1
        and     w0, w0, 255
        csel    w0, w0, w2, ls
        ret

whereas clang produces the desired sequence
bar_u8:
        and     w8, w0, #0xff
        sub     w8, w8, w1, uxtb
        cmp     w8, #0
        cneg    w0, w8, mi
        ret
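
To make the shorter sequence easier to follow, it can be modelled step by step
in C. This is only a sketch: absdiff_u8_model is a hypothetical name, and the
function takes full 32-bit register values to show that the upper bits of the
inputs do not influence the result.

```c
#include <stdint.h>

/* Step-by-step C model of the four-instruction sequence (illustrative).
   w0 and w1 stand for the incoming 32-bit argument registers.  */
uint8_t absdiff_u8_model (uint32_t w0, uint32_t w1)
{
  uint32_t w8 = w0 & 0xff;     /* and  w8, w0, #0xff     zero-extend x       */
  w8 = w8 - (w1 & 0xff);       /* sub  w8, w8, w1, uxtb  x - zero-extended y */
  if (w8 & 0x80000000u)        /* cmp  w8, #0            N flag = sign bit   */
    w8 = 0u - w8;              /* cneg w0, w8, mi        negate if negative  */
  return (uint8_t) w8;         /* result is already in the range 0..255      */
}
```

Because both operands are zero-extended to 8 bits before the 32-bit subtract,
the difference lies in [-255, 255], so a single conditional negate yields the
absolute difference without a final mask.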

We would like to ask for guidance on where best to implement this
optimization for unsigned integers. We have considered the following
approaches:
- also in match.pd, as for the signed integers. However, the existing rule
  ABS (A) -> A for unsigned integers would fold ABS_EXPR (x - y) to (x - y),
  which is incorrect if x < y. Are there other ways to express absolute
  differences for unsigned types on the GIMPLE level?
- in the aarch64 backend on RTL level: create an if_then_else RTX with
  (zero-extended) minus expressions in both arms. However, the combine-pass
  dump shows that we are also lacking an instruction pattern matching each
  arm:
    Failed to match this instruction:
    (set (reg:SI 108)
        (zero_extend:SI (minus:QI (subreg:QI (reg/v:SI 103 [ x ]) 0)
                (subreg:QI (reg/v:SI 104 [ y ]) 0))))
  which we would want to map to the following split of 3 instructions:
    and r103, r103, 255
    sub r108, r103, r104, uxtb
    and r108, r108, 255
  Do we want to add both those patterns?
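
Both approaches can be illustrated with a small C sketch for uint8_t. The
first function shows what folding ABS_EXPR (x - y) to (x - y) would compute,
the second shows one wrap-free formulation (larger operand minus smaller
operand) purely as an example of the kind of expression the question is asking
about, and the third models the three-instruction split for the
(zero_extend:SI (minus:QI ...)) pattern. All function names are hypothetical,
and whether a MAX/MIN form is a suitable canonicalization in match.pd is an
open question, not something established here.

```c
#include <stdint.h>

/* What the unsigned fold ABS (A) -> A would compute for A = x - y:
   the wrapped difference, which is wrong whenever x < y.  */
uint8_t folded_abs_u8 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x - y);
}

/* One wrap-free formulation (an assumption for illustration):
   larger operand minus smaller operand.  */
uint8_t absdiff_minmax_u8 (uint8_t x, uint8_t y)
{
  uint8_t hi = x > y ? x : y;
  uint8_t lo = x > y ? y : x;
  return (uint8_t) (hi - lo);
}

/* Model of (zero_extend:SI (minus:QI x y)) as the three-instruction split.
   r103 and r104 stand for the 32-bit pseudo registers holding x and y.  */
uint32_t zext_minus_qi (uint32_t r103, uint32_t r104)
{
  r103 = r103 & 255;                     /* and r103, r103, 255        */
  uint32_t r108 = r103 - (r104 & 255);   /* sub r108, r103, r104, uxtb */
  return r108 & 255;                     /* and r108, r108, 255        */
}
```

For x = 3 and y = 5, folded_abs_u8 returns 254 while the absolute difference
is 2, which is the miscompilation the first bullet describes.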

Any advice would be appreciated.
Thanks,
Jennifer

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114999
