Re: [PATCH][simplify-rtx] (GTU (PLUS a C) (C - 1)) --> (LTU a -C)

Kyrill Tkachov Fri, 16 Sep 2016 02:21:13 -0700


On 16/09/16 10:02, Richard Biener wrote:

On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
<[email protected]> wrote:

Hi all,


Currently the functions:
int f1(int x, int t)
{
   if (x == -1 || x == -2)
     t = 1;
   return t;
}

int f2(int x, int t)
{
   if (x == -1 || x == -2)
     return 1;
   return t;
}

generate different code on AArch64 even though they have identical
functionality:
f1:
         add     w0, w0, 2
         cmp     w0, 1
         csinc   w0, w1, wzr, hi
         ret

f2:
         cmn     w0, #2
         csinc   w0, w1, wzr, cc
         ret
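
To make the two sequences above easier to compare, here is a rough C model of
what each one computes. It is only a sketch: the function names are made up
for illustration, and the flag handling assumes the usual AArch64 semantics
(CMP sets C when there is no borrow, CMN sets C on unsigned carry-out, CSINC
selects the first source when the condition holds and the second source plus
one otherwise).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: add w0, w0, 2; cmp w0, 1; csinc w0, w1, wzr, hi */
uint32_t model_f1(uint32_t w0, uint32_t w1)
{
    const uint32_t wzr = 0;
    w0 += 2;                  /* add: wraps modulo 2^32 */
    int c = w0 >= 1;          /* cmp w0, 1: C set when there is no borrow */
    int z = w0 == 1;          /*            Z set when the operands are equal */
    int hi = c && !z;         /* HI: unsigned higher */
    return hi ? w1 : wzr + 1; /* csinc: w1 if HI holds, else wzr + 1 */
}

/* Hypothetical model of: cmn w0, #2; csinc w0, w1, wzr, cc */
uint32_t model_f2(uint32_t w0, uint32_t w1)
{
    const uint32_t wzr = 0;
    uint64_t wide = (uint64_t)w0 + 2; /* cmn: flags are set for w0 + 2 */
    int c = (wide >> 32) != 0;        /* C set on unsigned carry-out */
    int cc = !c;                      /* CC: carry clear */
    return cc ? w1 : wzr + 1;         /* csinc: w1 if CC holds, else wzr + 1 */
}

/* The C source that both sequences were compiled from. */
uint32_t reference(int32_t x, int32_t t)
{
    if (x == -1 || x == -2)
        t = 1;
    return (uint32_t)t;
}
```

Under this model both sequences return 1 exactly when w0 is 0xFFFFFFFE or
0xFFFFFFFF and return w1 otherwise, which matches the source functions.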

The problem is that f2 performs the comparison (LTU w0 -2)
whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to simplify
the f1 form to the f2 form with the simplify-rtx.c rule added in this patch.
With this patch the codegen for both f1 and f2 on aarch64 at -O2 is
identical (CMN, CSINC).
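
As a sanity check on the rule itself, the equivalence can be brute-forced in a
narrow mode. The sketch below is not part of the patch: it uses 8-bit unsigned
arithmetic as a stand-in for a fixed-width RTL mode, the helper names are
invented for illustration, and it does not model any extra conditions the
simplify-rtx.c code may place on the operands.

```c
#include <assert.h>
#include <stdint.h>

/* Left-hand side of the rule, (GTU (PLUS a C) (C - 1)), in 8-bit arithmetic. */
int gtu_form(uint8_t a, uint8_t c)
{
    uint8_t sum = (uint8_t)(a + c);   /* PLUS wraps modulo 2^8 */
    uint8_t bound = (uint8_t)(c - 1); /* C - 1 wraps to 0xFF when C == 0 */
    return sum > bound;
}

/* Right-hand side of the rule, (LTU a -C). */
int ltu_form(uint8_t a, uint8_t c)
{
    uint8_t neg = (uint8_t)(0u - c);  /* -C modulo 2^8 */
    return a < neg;
}

/* Returns the number of (a, C) pairs on which the two forms disagree. */
unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned c = 0; c < 256; c++)
            bad += gtu_form((uint8_t)a, (uint8_t)c)
                   != ltu_form((uint8_t)a, (uint8_t)c);
    return bad;
}
```

The intuition is that a + C wraps around exactly when a >= -C (unsigned), and
a wrapped sum is always at most C - 1, while an unwrapped sum is at least C.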

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64.
What do you think? Is this a correct generalisation of this issue?
If so, ok for trunk?

Do you see a difference on the GIMPLE level?  If so, this kind of
transform looks appropriate there, too.


The GIMPLE for the two functions looks almost identical:
f1 (intD.7 xD.3078, intD.7 tD.3079)
{
  intD.7 x_4(D) = xD.3078;
  intD.7 t_5(D) = tD.3079;
  unsigned int x.0_1;
  unsigned int _2;

  x.0_1 = (unsigned int) x_4(D);

  _2 = x.0_1 + 2;
  if (_2 <= 1)
    goto <bb 3>;
  else
    goto <bb 4>;
;;   basic block 3, loop depth 0, count 0, freq 3977, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 10000, maybe hot

  # t_3 = PHI <t_5(D)(2), 1(3)>
  return t_3;
}

f2 (intD.7 xD.3082, intD.7 tD.3083)
{
  intD.7 x_4(D) = xD.3082;
  intD.7 t_5(D) = tD.3083;
  unsigned int x.1_1;
  unsigned int _2;
  intD.7 _3;

  x.1_1 = (unsigned int) x_4(D);

  _2 = x.1_1 + 2;
  if (_2 <= 1)
    goto <bb 4>;
  else
    goto <bb 3>;

;;   basic block 3, loop depth 0, count 0, freq 6761, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 10000, maybe hot
  # _3 = PHI <1(2), t_5(D)(3)>
  return _3;

}

So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with slightly
different CFG.  RTL-level transformations (ce1) bring it to the pre-combine RTL
where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).
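
For reference, the (x + 2 <=u 1) test in the dumps is the usual range-check
form of the original condition. A small C sketch of the correspondence
follows; the function names are only illustrative and do not come from GCC.

```c
#include <assert.h>
#include <limits.h>

/* The condition as written in f1 and f2. */
int source_cond(int x)
{
    return x == -1 || x == -2;
}

/* The condition as it appears in the GIMPLE dumps. */
int gimple_cond(int x)
{
    unsigned int ux = (unsigned int)x; /* x.0_1 = (unsigned int) x */
    unsigned int sum = ux + 2;         /* _2 = x.0_1 + 2, wraps modulo 2^N */
    return sum <= 1;                   /* if (_2 <= 1) */
}

/* Nonzero when both forms give the same answer for x. */
int conds_agree(int x)
{
    return source_cond(x) == gimple_cond(x);
}
```

Adding 2 maps -2 and -1 to 0 and 1, and every other value lands above 1, so a
single unsigned comparison covers both equality tests.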

The differences start at RTL level, so I think we need this transformation
there.
However, for the testcase:
unsigned int
foo (unsigned int a, unsigned int b)
{
  return (a + 2) > 1;
}

The differences do appear at GIMPLE level, so I think a match.pd pattern would
help here. I'll look into adding one there as well, but that would be
independent of this patch.

Thanks,
Kyrill

Richard.

Thanks,
Kyrill

2016-09-16  Kyrylo Tkachov  <[email protected]>

     * simplify-rtx.c (simplify_relational_operation_1): Add transformation
     (GTU (PLUS a C) (C - 1)) --> (LTU a -C).

2016-09-16  Kyrylo Tkachov  <[email protected]>

     * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: New test.
