On 16/09/16 10:02, Richard Biener wrote:
On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
Hi all,
Currently the functions:
int f1(int x, int t)
{
if (x == -1 || x == -2)
t = 1;
return t;
}
int f2(int x, int t)
{
if (x == -1 || x == -2)
return 1;
return t;
}
generate different code on AArch64 even though they have identical
functionality:
f1:
add w0, w0, 2
cmp w0, 1
csinc w0, w1, wzr, hi
ret
f2:
cmn w0, #2
csinc w0, w1, wzr, cc
ret
The problem is that f2 performs the comparison (LTU w0 -2) whereas f1 performs
(GTU (PLUS w0 2) 1). I think it is possible to simplify the f1 form to the f2
form with the simplify-rtx.c rule added in this patch. With this patch the
codegen for both f1 and f2 on aarch64 at -O2 is identical (CMN, CSINC).
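As a sanity check on the arithmetic behind the rule, here is a minimal sketch
that compares (GTU (PLUS a C) (C - 1)) against (LTU a -C) exhaustively in
8-bit unsigned arithmetic. The function names are hypothetical and the 8-bit
width is only an assumption to keep the search small; this checks the
identity itself, not whatever conditions the simplify-rtx.c code in the patch
actually applies.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: (GTU (PLUS a C) (C - 1)), evaluated in 8-bit unsigned
   arithmetic.  The casts undo C's promotion to int so that wraparound
   behaves as it would in a fixed-width machine mode.  */
int gtu_plus_form (uint8_t a, uint8_t c)
{
  return (uint8_t) (a + c) > (uint8_t) (c - 1);
}

/* Rewritten form: (LTU a -C).  */
int ltu_neg_form (uint8_t a, uint8_t c)
{
  return a < (uint8_t) (0 - c);
}

/* Count the (a, C) pairs on which the two forms disagree.  */
unsigned count_mismatches_8bit (void)
{
  unsigned mismatches = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned c = 0; c < 256; c++)
      if (gtu_plus_form ((uint8_t) a, (uint8_t) c)
          != ltu_neg_form ((uint8_t) a, (uint8_t) c))
        mismatches++;
  return mismatches;
}

/* Spot-check the C = 2 instance from f1/f2 around the wraparound point.  */
void check_c2_boundary (void)
{
  assert (!gtu_plus_form (0xFF, 2) && !ltu_neg_form (0xFF, 2)); /* a == -1 */
  assert (!gtu_plus_form (0xFE, 2) && !ltu_neg_form (0xFE, 2)); /* a == -2 */
  assert (gtu_plus_form (0xFD, 2) && ltu_neg_form (0xFD, 2));   /* a == -3 */
}
```

Calling count_mismatches_8bit () and check_c2_boundary () from a small main
is enough to exercise it. The same wraparound argument should carry over to
wider modes, though this sketch does not test them.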
Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64.
What do you think? Is this a correct generalisation of this issue?
If so, ok for trunk?
Do you see a difference on the GIMPLE level? If so, this kind of transform
looks appropriate there, too.
The GIMPLE for the two functions looks almost identical:
f1 (intD.7 xD.3078, intD.7 tD.3079)
{
intD.7 x_4(D) = xD.3078;
intD.7 t_5(D) = tD.3079;
unsigned int x.0_1;
unsigned int _2;
x.0_1 = (unsigned int) x_4(D);
_2 = x.0_1 + 2;
if (_2 <= 1)
goto <bb 3>;
else
goto <bb 4>;
;; basic block 3, loop depth 0, count 0, freq 3977, maybe hot
;; basic block 4, loop depth 0, count 0, freq 10000, maybe hot
# t_3 = PHI <t_5(D)(2), 1(3)>
return t_3;
}
f2 (intD.7 xD.3082, intD.7 tD.3083)
{
intD.7 x_4(D) = xD.3082;
intD.7 t_5(D) = tD.3083;
unsigned int x.1_1;
unsigned int _2;
intD.7 _3;
x.1_1 = (unsigned int) x_4(D);
_2 = x.1_1 + 2;
if (_2 <= 1)
goto <bb 4>;
else
goto <bb 3>;
;; basic block 3, loop depth 0, count 0, freq 6761, maybe hot
;; basic block 4, loop depth 0, count 0, freq 10000, maybe hot
# _3 = PHI <1(2), t_5(D)(3)>
return _3;
}
So at the GIMPLE level we see (x + 2 <=u 1) in both cases, but with a slightly
different CFG. RTL-level transformations (ce1) bring it to the pre-combine RTL,
where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).
The differences therefore start at the RTL level, so I think we need this
transformation there.
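To illustrate why both dumps contain the same (x + 2 <=u 1) test, here is a
small sketch comparing the source-level condition with the unsigned range-test
form. The function names are hypothetical, and it assumes a 32-bit int as on
the targets discussed here.

```c
#include <assert.h>

/* Source-level condition used in f1 and f2.  */
int is_minus1_or_minus2 (int x)
{
  return x == -1 || x == -2;
}

/* Range-test form from the GIMPLE dumps:
   _2 = (unsigned int) x + 2;  if (_2 <= 1) ...  */
int range_test_form (int x)
{
  unsigned int biased = (unsigned int) x + 2u;
  return biased <= 1u;
}

/* Return 1 if both forms agree on every value in [lo, hi].
   The loop counter is wider than int so that hi == INT_MAX is safe.  */
int forms_agree_on_range (int lo, int hi)
{
  for (long long x = lo; x <= hi; x++)
    if (is_minus1_or_minus2 ((int) x) != range_test_form ((int) x))
      return 0;
  return 1;
}

/* Spot-check the values on either side of the matched range.  */
void check_boundaries (void)
{
  assert (!range_test_form (-3));
  assert (range_test_form (-2));
  assert (range_test_form (-1));
  assert (!range_test_form (0));
}
```

Adding 2 maps -2 and -1 onto 0 and 1, so a single unsigned comparison covers
both equality tests.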
However, for the testcase:
unsigned int
foo (unsigned int a, unsigned int b)
{
return (a + 2) > 1;
}
Here the differences do appear at the GIMPLE level, so I think a match.pd
pattern would help.
I'll look into adding one there as well, but that would be independent of this
patch.
Thanks,
Kyrill
Richard.
Thanks,
Kyrill
2016-09-16 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* simplify-rtx.c (simplify_relational_operation_1): Add transformation
(GTU (PLUS a C) (C - 1)) --> (LTU a -C).
2016-09-16 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: New test.