On Thu, 2012-07-26 at 10:51 -0700, Ian Lance Taylor wrote:
> On Thu, Jul 26, 2012 at 8:57 AM, Jon Beniston <[email protected]>
> wrote:
> >
> > I'd like to try to optimise double word left shifts of sign/zero extended
> > operands if a widening multiply instruction is available. For the following
> > code:
> >
> > long long f(long a, long b)
> > {
> >   return (long long)a << b;
> > }
> >
> > ARM, MIPS etc expand to a fairly long sequence like:
> >
> > nor $3,$0,$5
> > sra $2,$4,31
> > srl $7,$4,1
> > srl $7,$7,$3
> > sll $2,$2,$5
> > andi $6,$5,0x20
> > sll $3,$4,$5
> > or $2,$7,$2
> > movn $2,$3,$6
> > movn $3,$0,$6
> >
> > I'd like to optimise this to something like:
> >
> > (long long) a * (1 << b)
> >
> > Which should just be 3 or so instructions. I don't think this can be
> > sensibly done in the target backend as the generated pattern is too
> > complicated to match and am not familiar with the middle end. Any
> > suggestions as to where and how this should be best implemented?
>
> It seems to me that you could just add an ashldi3 pattern.
>
This is interesting.  I've quickly tried it out on the SH port.  It can
be accomplished with the combine pass, although there are a few things
that need to be taken care of:

- an "extendsidi2" pattern is required (so that the extension is not
  performed before expand)
- an "ashldi3" pattern that accepts "reg:DI << reg:DI"
- maybe some adjustments to the cost calculations
  (these weren't required in my case)

With those in place, combine will try to match the following pattern:

(define_insn_and_split "*"
  [(set (match_operand:DI 0 "arith_reg_dest" "=r")
        (ashift:DI (sign_extend:DI (match_operand:SI 1 "arith_reg_operand" "r"))
                   (sign_extend:DI (match_operand:SI 2 "arith_reg_operand" "r"))))]
  "TARGET_SH2"
  "#"
  "&& can_create_pseudo_p ()"
  [(const_int 0)]
{
  /* tmp = 1 << operands[2], done as an SImode shift.  */
  rtx tmp = gen_reg_rtx (SImode);
  emit_move_insn (tmp, const1_rtx);
  emit_insn (gen_ashlsi3 (tmp, tmp, operands[2]));
  /* operands[0] = tmp * operands[1], as a signed SI x SI -> DI multiply.  */
  emit_insn (gen_mulsidi3 (operands[0], tmp, operands[1]));
  DONE;
})

which eventually results in the expected output:

        mov     #1,r1       ! 24 movsi_i/3    [length = 2]
        shld    r5,r1       ! 25 ashlsi3_d    [length = 2]
        dmuls.l r4,r1       ! 27 mulsidi3_i   [length = 2]
        sts     macl,r0     ! 28 movsi_i/5    [length = 2]
        rts                 ! 35 *return_i    [length = 2]
        sts     mach,r1     ! 29 movsi_i/5    [length = 2]
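
To make it easier to see what the shift + widening multiply sequence
computes, here is a minimal C sketch that compares it against the plain
64-bit shift.  The helper names (shift_ref, shift_via_mul, shift_matches)
are made up for illustration and are not part of GCC or the SH port.  As
far as I can tell, the two only agree for shift counts 0..30: at b == 31
the 32-bit value 1 << b is negative when fed into a signed multiply, and
counts of 32 and above do not fit a 32-bit shift at all.

```c
#include <assert.h>
#include <stdint.h>

/* Reference result: sign-extend a to 64 bits, then shift left by b.
   The shift is done on the unsigned representation so that negative
   values of a do not trigger undefined behaviour in C.  */
uint64_t shift_ref(int32_t a, unsigned b)
{
    assert(b <= 63);
    return (uint64_t)(int64_t)a << b;
}

/* Candidate: shift the constant 1 in 32 bits, then do a signed
   32 x 32 -> 64 multiply (the moral equivalent of ashlsi3 + mulsidi3).
   b <= 31 keeps the 32-bit shift itself well defined.  */
uint64_t shift_via_mul(int32_t a, unsigned b)
{
    assert(b <= 31);
    /* Assumption: converting 0x80000000u to int32_t wraps to INT32_MIN,
       as it does with GCC and Clang.  */
    int32_t factor = (int32_t)(1u << b);
    return (uint64_t)((int64_t)a * (int64_t)factor);
}

/* Returns 1 if both computations agree for the given inputs, else 0.  */
int shift_matches(int32_t a, unsigned b)
{
    return shift_ref(a, b) == shift_via_mul(a, b);
}
```

For example, shift_matches(-12345, 30) holds while shift_matches(1, 31)
does not, so a real pattern would presumably need either a known range
for the shift count or some extra handling of the larger counts.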

One potential pitfall might be the handling of a real "reg:DI << reg:DI"
if there are no patterns already there that handle it (as is the case
for the SH port).  If I observed correctly, the "ashldi3" expander must
not FAIL for a "reg:DI << reg:DI" (in order to fall back to a lib call),
or else combine would not arrive at the pattern above.
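
For comparison, the work that a genuine double-word shift has to do can
be sketched in C using two 32-bit halves.  This is only an illustration
modelled on the structure of the MIPS sequence quoted earlier (sign word,
carry from the low word, swap when bit 5 of the count is set); the type
dword_t and the functions dword_shl and dword_to_u64 are hypothetical
names, not anything from libgcc.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit words.  */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} dword_t;

/* Compute (long long)a << b, for 0 <= b <= 63, using only 32-bit
   operations.  */
dword_t dword_shl(int32_t a, unsigned b)
{
    assert(b <= 63);

    uint32_t lo = (uint32_t)a;
    uint32_t hi = a < 0 ? 0xFFFFFFFFu : 0u;   /* sign word, like sra by 31 */
    unsigned n = b & 31;                      /* shift count within a word */

    /* Bits that move from the low word into the high word.  Shifting by
       1 and then by 31 - n avoids a 32-bit shift by 32 when n == 0.  */
    uint32_t carry = (lo >> 1) >> (31 - n);

    dword_t r;
    r.hi = (hi << n) | carry;
    r.lo = lo << n;

    /* For counts of 32 and above the low word becomes the high word and
       the low word is cleared (the two conditional moves).  */
    if (b & 32) {
        r.hi = r.lo;
        r.lo = 0;
    }
    return r;
}

/* Join the two halves back into one 64-bit value.  */
uint64_t dword_to_u64(dword_t d)
{
    return ((uint64_t)d.hi << 32) | d.lo;
}
```

Unlike the multiply-based split, this handles the full range of shift
counts, which is what a real "reg:DI << reg:DI" needs.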

Hope this helps.

Cheers,
Oleg