Jeff Law writes: > On 04/16/2015 05:04 AM, Jiong Wang wrote: >> >> This is a rework of >> >> https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01998.html >> >> After second thinking, I feel it's better to fix this in earlier stage >> during RTL expand which is more generic, and we also avoid making the >> already complex combine pass complexer. >> >> Currently gcc expand wide mode left shift to some generic complex >> instruction sequences, while if we have known the high part of wide mode >> all comes from sign extension, the expand logic could be simplifed. >> >> Given the following example, >> >> T A = (T) B << const_imm_shift >> >> We know the high part of A are all comes from sign extension, if >> >> * T is the next wider type of word_mode. >> >> For example, for aarch64, if type T is 128int (TImode), and B is with >> type SImode or DImode, then tree analyzer know that the high part of >> TImode result all comes from sign extension, and kept them in range info. >> >> |< T >| >> | high | low | >> |<- sizel ->| >> >> For above example, we could simplify the expand logic into >> 1. low = low << const_imm_shift; >> 2. high = low >> (sizel - const_imm_shift) */ >> >> We can utilize the arithmetic right shift to do the sign >> extension. Those reduntant instructions will be optimized out later. >> >> For actual .s improvement, >> >> AArch64 >> ======= >> >> __int128_t >> foo (int data) >> { >> return (__int128_t) data << 50; >> } >> >> old: >> sxtw x2, w0 >> asr x1, x2, 63 >> lsl x0, x2, 50 >> lsl x1, x1, 50 >> orr x1, x1, x2, lsr 14 >> >> new: >> sxtw x1, w0 >> lsl x0, x1, 50 >> asr x1, x1, 14 >> >> >> ARM (.fpu softvfp) >> =========== >> >> long long >> shift (int data) >> { >> return (long long) data << 20; >> } >> >> old: >> stmfd sp!, {r4, r5} >> mov r5, r0, asr #31 >> mov r3, r0 >> mov r0, r0, asl #20 >> mov r1, r5, asl #20 >> orr r1, r1, r3, lsr #12 >> ldmfd sp!, {r4, r5} >> bx lr >> >> new: >> mov r1, r0 >> mov r0, r0, asl #20 >> mov r1, r1, asr #12 >> bx lr >> >> Test >> ==== >> >> x86 bootstrap OK, regression test OK. >> AArch64 bootstrap OK, regression test on board OK. >> >> Regards, >> Jiong >> >> 2015-04-116 Jiong.Wang <jiong.w...@arm.com> >> >> gcc/ >> * expr.c (expand_expr_real_2): Take tree range info into account when >> expanding LSHIFT_EXPR. >> >> gcc/testsuite >> * gcc.dg/wide_shift_64_1.c: New testcase. >> * gcc.dg/wide_shift_128_1.c: Ditto. >> * gcc.target/aarch64/ashlti3_1.c: Ditto. >> * gcc.target/arm/ashldisi_1.c: Ditto. > Funny, I find myself wanting this transformation in both places :-) > Expansion time so that we generate efficient code from the start and > combine to catch those cases which are too complex to see at expansion, > but due to other optimizations become visible in the combiner. > > Sadly, it's been fairly common practice for targets to define > double-word shift patterns which catch a variety of special cases. > Ports will have to choose between using those patterns and exploiting > your work. I'd be tempted to generate a double-word shift by the given > constant and check its cost vs the single word shifts. > > What happens if there's an overlap between t_low and low? Won't the > lshift clobber low and thus we get the right value for the rshift in > that case?
Jeff, Sorry, I can't understand the meaning of "overlap between t_low and low", assume "right" in "right value" means the opposite of "left" not "correct". So what you mean is t_low and low share the same pseudo regiser? or you mean if we are shifting a value across the word boundary? like the following. |< double word >| | t_high | t_low | |<- low ->| for above situation, the simplified two instruction sequence do works. "t_low = low << const_imm_shift ; t_high = low >> (sizel - const_imm_shift)" I attached the expand result for a simple testcase below. I appreicate if you could comment on the RTL. Thanks. __int128_t foo (int data) { return (__int128_t) data << 50; } foo.c.188t.optimized === foo (int data) { __int128 _2; __int128 _3; <bb 2>: _2 = (__int128) data_1(D); _3 = _2 << 50; return _3; } foo.c.189r.expand === (insn 2 4 3 2 (set (reg/v:SI 76 [ data ]) (reg:SI 0 x0 [ data ])) foo.c:3 -1 (nil)) (insn 6 3 7 2 (set (reg:DI 79) (sign_extend:DI (reg/v:SI 76 [ data ]))) foo.c:4 -1 (nil)) (insn 7 6 8 2 (set (subreg:DI (reg:TI 78 [ D.2677 ]) 0) (reg:DI 79)) foo.c:4 -1 (nil)) (insn 8 7 9 2 (set (reg:DI 80) (ashiftrt:DI (reg:DI 79) (const_int 63 [0x3f]))) foo.c:4 -1 (nil)) (insn 9 8 10 2 (set (subreg:DI (reg:TI 78 [ D.2677 ]) 8) (reg:DI 80)) foo.c:4 -1 (nil)) ^ ~~~~~~~~~ sign extend SImode "data" into TImode "_2" (r78) (insn 10 9 11 2 (set (subreg:DI (reg:TI 77 [D.2677 ]) 0) (ashift:DI (subreg:DI (reg:TI 78 [ D.2677 ]) 0) (const_int 50 [0x32]))) foo.c:4 -1 (nil)) ^ ~~~~~~~~~~ t_low = low << const_imm_shift, target be r77 (insn 11 10 12 2 (set (subreg:DI (reg:TI 77 [ D.2677 ]) 8) (ashiftrt:DI (subreg:DI (reg:TI 78 [ D.2677 ]) 0) (const_int 14 [0xe]))) foo.c:4 -1 (nil)) ^ ~~~~~~~~~~ t_high = low >> (sizel - const_imm_shift) (insn 12 11 16 2 (set (reg:TI 75 [ <retval> ]) (reg:TI 77 [ D.2677 ])) foo.c:4 -1 (nil)) (insn 16 12 17 2 (set (reg/i:TI 0 x0) (reg:TI 75 [ <retval> ])) foo.c:5 -1 (nil)) > Note that expand_variable_shift may not honor your request for putting > the result in the TARGET target parameter you pass in. Thanks, agree, it's better to add those extra move. I noticed the comments at the start of the function: "Store the result in the rtx TARGET, if that is convenient." Although I still don't understand in which case it's inconveninent. -- Regards, Jiong