Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding

Jiong Wang Mon, 27 Apr 2015 13:58:56 -0700

Jeff Law writes:

> On 04/16/2015 05:04 AM, Jiong Wang wrote:
>>
>> This is a rework of
>>
>>    https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01998.html
>>
>> After second thinking, I feel it's better to fix this in earlier stage
>> during RTL expand which is more generic, and we also avoid making the
>> already complex combine pass complexer.
>>
>> Currently gcc expand wide mode left shift to some generic complex
>> instruction sequences, while if we have known the high part of wide mode
>> all comes from sign extension, the expand logic could be simplifed.
>>
>> Given the following example,
>>
>> T A = (T) B  << const_imm_shift
>>
>> We know the high part of A are all comes from sign extension, if
>>
>> * T is the next wider type of word_mode.
>>
>> For example, for aarch64, if type T is 128int (TImode), and B is with
>> type SImode or DImode, then tree analyzer know that the high part of
>> TImode result all comes from sign extension, and kept them in range info.
>>
>>   |<           T          >|
>>   |   high     |   low     |
>>                |<- sizel ->|
>>
>> For above example, we could simplify the expand logic into
>>   1. low = low << const_imm_shift;
>>   2. high = low >> (sizel - const_imm_shift)  */
>>
>> We can utilize the arithmetic right shift to do the sign
>> extension. Those reduntant instructions will be optimized out later.
>>
>> For actual .s improvement,
>>
>> AArch64
>> =======
>>
>>    __int128_t
>>    foo (int data)
>>    {
>>      return (__int128_t) data << 50;
>>    }
>>
>>    old:
>>      sxtw    x2, w0
>>      asr     x1, x2, 63
>>      lsl     x0, x2, 50
>>      lsl     x1, x1, 50
>>      orr     x1, x1, x2, lsr 14
>>
>>    new:
>>      sxtw    x1, w0
>>      lsl     x0, x1, 50
>>      asr     x1, x1, 14
>>
>>
>> ARM (.fpu softvfp)
>> ===========
>>
>>    long long
>>    shift (int data)
>>    {
>>      return (long long) data << 20;
>>    }
>>
>>    old:
>>      stmfd   sp!, {r4, r5}
>>      mov     r5, r0, asr #31
>>      mov     r3, r0
>>      mov     r0, r0, asl #20
>>      mov     r1, r5, asl #20
>>      orr     r1, r1, r3, lsr #12
>>      ldmfd   sp!, {r4, r5}
>>      bx      lr
>>
>>    new:
>>      mov     r1, r0
>>      mov     r0, r0, asl #20
>>      mov     r1, r1, asr #12
>>      bx      lr
>>
>> Test
>> ====
>>
>>    x86 bootstrap OK, regression test OK.
>>    AArch64 bootstrap OK, regression test on board OK.
>>
>> Regards,
>> Jiong
>>
>> 2015-04-116  Jiong.Wang  <jiong.w...@arm.com>
>>
>> gcc/
>>    * expr.c (expand_expr_real_2): Take tree range info into account when
>>    expanding LSHIFT_EXPR.
>>
>> gcc/testsuite
>>    * gcc.dg/wide_shift_64_1.c: New testcase.
>>    * gcc.dg/wide_shift_128_1.c: Ditto.
>>    * gcc.target/aarch64/ashlti3_1.c: Ditto.
>>    * gcc.target/arm/ashldisi_1.c: Ditto.
> Funny, I find myself wanting this transformation in both places :-) 
> Expansion time so that we generate efficient code from the start and 
> combine to catch those cases which are too complex to see at expansion, 
> but due to other optimizations become visible in the combiner.
>
> Sadly, it's been fairly common practice for targets to define 
> double-word shift patterns which catch a variety of special cases. 
> Ports will have to choose between using those patterns and exploiting 
> your work.   I'd be tempted to generate a double-word shift by the given 
> constant and check its cost vs the single word shifts.
>
> What happens if there's an overlap between t_low and low?  Won't the 
> lshift clobber low and thus we get the right value for the rshift in 
> that case?


Jeff,

  Sorry, I can't understand the meaning of "overlap between t_low and low",
  assume "right" in "right value" means the opposite of "left" not
  "correct".

  So what you mean is t_low and low share the same pseudo regiser?

  or you mean if we are shifting a value across the word boundary? like the 
following.
 
   |<      double word      >|
   |   t_high  |   t_low     |
          |<- low ->|
         
  for above situation, the simplified two instruction sequence do works.
  "t_low = low << const_imm_shift ; t_high = low >> (sizel - const_imm_shift)"

  I attached the expand result for a simple testcase below. I appreicate
  if you could comment on the RTL. 

  Thanks.
  
  __int128_t
  foo (int data)
  {
    return (__int128_t) data << 50;
  }

  foo.c.188t.optimized
  ===
  foo (int data)
  {
    __int128 _2;
    __int128 _3;

  <bb 2>:
    _2 = (__int128) data_1(D);
    _3 = _2 << 50;
    return _3;
  }

  foo.c.189r.expand
  ===
  
  (insn 2 4 3 2 (set (reg/v:SI 76 [ data ])
    (reg:SI 0 x0 [ data ])) foo.c:3 -1
      (nil))
  (insn 6 3 7 2 (set (reg:DI 79)
    (sign_extend:DI (reg/v:SI 76 [ data ]))) foo.c:4 -1
      (nil))
  (insn 7 6 8 2 (set (subreg:DI (reg:TI 78 [ D.2677 ]) 0)
    (reg:DI 79)) foo.c:4 -1
      (nil))
  (insn 8 7 9 2 (set (reg:DI 80)
    (ashiftrt:DI (reg:DI 79) (const_int 63 [0x3f]))) foo.c:4 -1
      (nil))
  (insn 9 8 10 2 (set (subreg:DI (reg:TI 78 [ D.2677 ]) 8)
    (reg:DI 80)) foo.c:4 -1
      (nil))
      
  ^
   ~~~~~~~~~ sign extend SImode "data" into TImode "_2" (r78)
                    
  (insn 10 9 11 2 (set (subreg:DI (reg:TI 77 [D.2677 ]) 0)
    (ashift:DI (subreg:DI (reg:TI 78 [ D.2677 ]) 0)
               (const_int 50 [0x32]))) foo.c:4 -1
      (nil))

  ^
   ~~~~~~~~~~ t_low = low << const_imm_shift, target be r77
  
  (insn 11 10 12 2 (set (subreg:DI (reg:TI 77 [ D.2677 ]) 8)
    (ashiftrt:DI (subreg:DI (reg:TI 78 [ D.2677 ]) 0)
                 (const_int 14 [0xe]))) foo.c:4 -1
      (nil))

  ^
   ~~~~~~~~~~ t_high = low >> (sizel - const_imm_shift)
  
  (insn 12 11 16 2 (set (reg:TI 75 [ <retval> ])
    (reg:TI 77 [ D.2677 ])) foo.c:4 -1
      (nil))
  (insn 16 12 17 2 (set (reg/i:TI 0 x0)
    (reg:TI 75 [ <retval> ])) foo.c:5 -1
      (nil))

> Note that expand_variable_shift may not honor your request for putting 
> the result in the TARGET target parameter you pass in.

Thanks, agree, it's better to add those extra move.

I noticed the comments at the start of the function:

  "Store the result in the rtx TARGET, if that is convenient."

Although I still don't understand in which case it's inconveninent.

-- 
Regards,
Jiong

Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding

Reply via email to