On 04/16/2015 05:04 AM, Jiong Wang wrote:
This is a rework of
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01998.html
After second thinking, I feel it's better to fix this in earlier stage
during RTL expand which is more generic, and we also avoid making the
already complex combine pass complexer.
Currently gcc expand wide mode left shift to some generic complex
instruction sequences, while if we have known the high part of wide mode
all comes from sign extension, the expand logic could be simplifed.
Given the following example,
T A = (T) B << const_imm_shift
We know the high part of A are all comes from sign extension, if
* T is the next wider type of word_mode.
For example, for aarch64, if type T is 128int (TImode), and B is with
type SImode or DImode, then tree analyzer know that the high part of
TImode result all comes from sign extension, and kept them in range info.
|< T >|
| high | low |
|<- sizel ->|
For above example, we could simplify the expand logic into
1. low = low << const_imm_shift;
2. high = low >> (sizel - const_imm_shift) */
We can utilize the arithmetic right shift to do the sign
extension. Those reduntant instructions will be optimized out later.
For actual .s improvement,
AArch64
=======
__int128_t
foo (int data)
{
return (__int128_t) data << 50;
}
old:
sxtw x2, w0
asr x1, x2, 63
lsl x0, x2, 50
lsl x1, x1, 50
orr x1, x1, x2, lsr 14
new:
sxtw x1, w0
lsl x0, x1, 50
asr x1, x1, 14
ARM (.fpu softvfp)
===========
long long
shift (int data)
{
return (long long) data << 20;
}
old:
stmfd sp!, {r4, r5}
mov r5, r0, asr #31
mov r3, r0
mov r0, r0, asl #20
mov r1, r5, asl #20
orr r1, r1, r3, lsr #12
ldmfd sp!, {r4, r5}
bx lr
new:
mov r1, r0
mov r0, r0, asl #20
mov r1, r1, asr #12
bx lr
Test
====
x86 bootstrap OK, regression test OK.
AArch64 bootstrap OK, regression test on board OK.
Regards,
Jiong
2015-04-116 Jiong.Wang <jiong.w...@arm.com>
gcc/
* expr.c (expand_expr_real_2): Take tree range info into account when
expanding LSHIFT_EXPR.
gcc/testsuite
* gcc.dg/wide_shift_64_1.c: New testcase.
* gcc.dg/wide_shift_128_1.c: Ditto.
* gcc.target/aarch64/ashlti3_1.c: Ditto.
* gcc.target/arm/ashldisi_1.c: Ditto.
Funny, I find myself wanting this transformation in both places :-)
Expansion time so that we generate efficient code from the start and
combine to catch those cases which are too complex to see at expansion,
but due to other optimizations become visible in the combiner.
Sadly, it's been fairly common practice for targets to define
double-word shift patterns which catch a variety of special cases.
Ports will have to choose between using those patterns and exploiting
your work. I'd be tempted to generate a double-word shift by the given
constant and check its cost vs the single word shifts.
What happens if there's an overlap between t_low and low? Won't the
lshift clobber low and thus we get the right value for the rshift in
that case?
Note that expand_variable_shift may not honor your request for putting
the result in the TARGET target parameter you pass in.
Thus:
temp = expand_variable_shift (...)
temp = expand_variable_shift (...)
Can't be right. You probably need something like
temp = expand_variable_shift (...)
if (temp != t_low)
emit_move_insn (t_low, temp);
temp = expand_variable_shift (...)
if (temp != t_high)
emit_move_insn (t_high, temp);
return target;
So I generally like where you're going with this, but I have concerns
about its correctness, particularly in cases where there's an overlap or
when expand_variable_shift returns its value in something other than the
passed in target.
Jeff