On Thu, 2012-07-26 at 10:51 -0700, Ian Lance Taylor wrote:
> On Thu, Jul 26, 2012 at 8:57 AM, Jon Beniston <jon.benis...@ensilica.com> wrote:
> >
> > I'd like to try to optimise double word left shifts of sign/zero extended
> > operands if a widening multiply instruction is available. For the following
> > code:
> >
> > long long f(long a, long b)
> > {
> >   return (long long)a << b;
> > }
> >
> > ARM, MIPS, etc. expand this to a fairly long sequence, e.g. on MIPS:
> >
> >         nor     $3,$0,$5
> >         sra     $2,$4,31
> >         srl     $7,$4,1
> >         srl     $7,$7,$3
> >         sll     $2,$2,$5
> >         andi    $6,$5,0x20
> >         sll     $3,$4,$5
> >         or      $2,$7,$2
> >         movn    $2,$3,$6
> >         movn    $3,$0,$6
> >
> > I'd like to optimise this to something like:
> >
> >  (long long) a * (1 << b)
> >
> > which should take just 3 or so instructions. I don't think this can be
> > sensibly done in the target backend, as the generated pattern is too
> > complicated to match, and I am not familiar with the middle end. Any
> > suggestions as to where and how this would best be implemented?
> 
> It seems to me that you could just add an ashldi3 pattern.
> 

This is interesting.  I've quickly tried it out on the SH port.  It can
be accomplished with the combine pass, although there are a few things
that should be taken care of:
- an "extendsidi2" pattern is required (so that the extension is not
  lowered into word-sized operations at expand time)
- an "ashldi3" pattern that accepts "reg:DI << reg:DI"
- possibly some adjustments to the cost calculations
  (not required in my case)

With those in place, combine will try to match the following pattern

(define_insn_and_split "*"
  [(set (match_operand:DI 0 "arith_reg_dest" "=r")
        (ashift:DI (sign_extend:DI (match_operand:SI 1 "arith_reg_operand" "r"))
                   (sign_extend:DI (match_operand:SI 2 "arith_reg_operand" "r"))))]
  "TARGET_SH2"
  "#"
  "&& can_create_pseudo_p ()"
  [(const_int 0)]
{
  /* Rewrite (sext a) << (sext b) as (sext a) * (sext (1 << b)).  */
  rtx tmp = gen_reg_rtx (SImode);
  emit_move_insn (tmp, const1_rtx);                          /* tmp = 1  */
  emit_insn (gen_ashlsi3 (tmp, tmp, operands[2]));           /* tmp <<= b  */
  emit_insn (gen_mulsidi3 (operands[0], tmp, operands[1]));  /* signed 32x32->64  */
  DONE;
})

which eventually results in the expected output

        mov     #1,r1   ! 24    movsi_i/3       [length = 2]
        shld    r5,r1   ! 25    ashlsi3_d       [length = 2]
        dmuls.l r4,r1   ! 27    mulsidi3_i      [length = 2]
        sts     macl,r0 ! 28    movsi_i/5       [length = 2]
        rts             ! 35    *return_i       [length = 2]
        sts     mach,r1 ! 29    movsi_i/5       [length = 2]
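As a quick sanity check of the rewrite, here is a small C model. It is a sketch only: the function names are made up for illustration, and it assumes a 32-bit "long" as on SH. It compares (long long)a << b against the split sequence above (load 1, shift it left in a 32-bit register, then do a signed 32x32->64 multiply as mulsidi3 does). Because the multiply is signed, 1 << 31 sign-extends to a negative value, so for arbitrary a the two only agree for shift counts 0..30; counts of 32 and above cannot be represented by a 32-bit 1 << b at all.

```c
#include <assert.h>
#include <stdint.h>

/* What the C source computes: (long long)a << b, with a 32-bit long.
   The shift is done on the unsigned 64-bit bit pattern to avoid the
   undefined behaviour of left-shifting a negative signed value. */
static uint64_t shift_ref(int32_t a, unsigned b)
{
    assert(b < 64);
    return (uint64_t)(int64_t)a << b;
}

/* Sign-extend a 32-bit register value to 64 bits without relying on
   implementation-defined unsigned-to-signed conversion. */
static int64_t sext32(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* Model of the split: tmp = 1; tmp <<= b in a 32-bit register; then a
   signed 32x32->64 multiply of sext(a) and sext(tmp).
   Only meaningful for b < 32, since tmp is a 32-bit register. */
static uint64_t shift_via_mul(int32_t a, unsigned b)
{
    assert(b < 32);
    uint32_t tmp = 1u << b;
    /* |product| <= 2^62, so the int64_t multiply cannot overflow. */
    return (uint64_t)((int64_t)a * sext32(tmp));
}

/* Hypothetical helper: return the first count in 0..31 for which the two
   models disagree on some sample input, or -1 if they always agree. */
static int first_mismatch(void)
{
    static const int32_t samples[] = { 0, 1, -1, 12345, -77, INT32_MAX, INT32_MIN };
    for (unsigned b = 0; b < 32; b++)
        for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
            if (shift_ref(samples[i], b) != shift_via_mul(samples[i], b))
                return (int)b;
    return -1;
}
```

Calling first_mismatch() should return 31: all sample inputs agree for counts 0..30, and count 31 differs for a = 1 (0x80000000 versus 0xFFFFFFFF80000000).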

One potential pitfall might be the handling of a real "reg:DI << reg:DI"
if there are no patterns already there that handle it (as is the case
for the SH port).  If I observed correctly, the "ashldi3" expander must
not FAIL for "reg:DI << reg:DI" (which would make the middle end fall
back to a libcall), or else combine would not arrive at the pattern above.
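For reference, a real "reg:DI << reg:DI" on a 32-bit target still needs the full double-word shift. The C sketch below models the MIPS sequence quoted at the top of the thread, roughly one 32-bit operation per step. The names dword, dword_shl, dword_join and f_model are made up for illustration, and the model assumes the register-count shifts use only the low 5 bits of the count, as the MIPS variable shifts do.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held in a pair of 32-bit registers. */
struct dword { uint32_t hi, lo; };

/* Shift the pair (hi:lo) left by n, 0 <= n < 64, using only 32-bit
   shifts whose counts are taken modulo 32, mirroring the MIPS code. */
static struct dword dword_shl(uint32_t hi, uint32_t lo, unsigned n)
{
    struct dword r;
    assert(n < 64);
    /* srl lo,1 then srl by ~n: gives lo >> (32 - n) for n in 1..31 and
       0 for n == 0, without a separate test for a zero count. */
    uint32_t carry = (lo >> 1) >> (~n & 31u);
    r.hi = (hi << (n & 31u)) | carry;   /* sll + or */
    r.lo = lo << (n & 31u);             /* sll */
    if (n & 32u) {                      /* andi 0x20 + the two movn */
        r.hi = r.lo;
        r.lo = 0;
    }
    return r;
}

/* Join the pair back into a 64-bit value for checking. */
static uint64_t dword_join(struct dword d)
{
    return ((uint64_t)d.hi << 32) | d.lo;
}

/* Model of f(): sign-extend a into the high word (sra 31), then shift. */
static uint64_t f_model(int32_t a, unsigned b)
{
    uint32_t lo = (uint32_t)a;
    uint32_t hi = (a < 0) ? 0xFFFFFFFFu : 0u;
    return dword_join(dword_shl(hi, lo, b));
}
```

The n & 32 branch corresponds to the andi/movn pair in the quoted MIPS code; the multiply-based split avoids it entirely, at the cost of the restricted count range noted earlier.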

Hope this helps.

Cheers,
Oleg

