http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423
--- Comment #21 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-07-13 09:36:36 UTC --- (In reply to comment #20) > I like your mem,plus,plus combine that look better than my folding. just there > are 2 little details that I tried while playing with it: there was a > lost of generality with the hardcoding of the shift constant, Yeah, sure. The hardcoded '4' was just to see if/how it could work at all. > and I'm not sure > if there was a risk of clobbering a live operands[1]. For those reasons I > modified it a bit as followed. > > (define_insn_and_split "*movsi_disp" > [(set (match_operand:SI 0 "arith_reg_dest" "=r") > (mem:SI (plus:SI > (plus:SI (mult:SI (match_operand:SI 1 "arith_reg_operand" "r") > (match_operand:SI 2 "const_int_operand" "i")) > (match_operand:SI 3 "arith_reg_operand" "K06")) > (match_operand:SI 4 "const_int_operand" "i"))))] > "TARGET_SH1 && satisfies_constraint_K06 (operands[4]) && exact_log2 (INTVAL > (operands[2])) > 0" > { > gcc_unreachable (); > return "#"; > } > "&& 1" > [(set (match_dup 1) (ashift:SI (match_dup 1) (match_dup 2))) > (set (match_dup 1) (plus:SI (match_dup 1) (match_dup 3))) > (set (match_dup 0) (mem:SI (plus:SI (match_dup 1) (match_dup 4))))] > "{ > int n = exact_log2 (INTVAL (operands[2])); > rtx res = gen_reg_rtx (SImode); > emit_move_insn (res, operands[1]); > > operands[1] = res; > operands[2] = GEN_INT (n); > }" > ) > Uhm, I think it would be enough to do the following in the split preparation... (define_insn_and_split "*movsi_disp" [(set (match_operand:SI 0 "arith_reg_dest" "=r") (mem:SI (plus:SI (plus:SI (mult:SI (match_operand:SI 1 "arith_reg_operand" "r") (match_operand:SI 2 "const_int_operand" "i")) (match_operand:SI 3 "arith_reg_operand" "K06")) (match_operand:SI 4 "const_int_operand" "i"))))] "TARGET_SH1 && satisfies_constraint_K06 (operands[4]) && exact_log2 (INTVAL (operands[2])) > 0" { gcc_unreachable (); return "#"; } "&& can_create_pseudo_p ()" [(set (match_dup 5) (ashift:SI (match_dup 1) (match_dup 2))) (set (match_dup 6) (plus:SI (match_dup 5) (match_dup 3))) (set (match_dup 0) (mem:SI (plus:SI (match_dup 6) (match_dup 4))))] { operands[5] = gen_reg_rtx (SImode); operands[6] = gen_reg_rtx (SImode); operands[2] = GEN_INT (exact_log2 (INTVAL (operands[2]))); }) > (define_insn "ashlsi3_k" > [(set (match_operand:SI 0 "arith_reg_dest" "=r,r,r") > (ashift:SI (match_operand:SI 1 "arith_reg_operand" "0,0,0") > (match_operand:SI 2 "const_int_operand" "M,P27,r")))] > "TARGET_SH1" > "@ > add %0,%0 > shll%O2 %0 > shld %2,%0" > [(set_attr "type" "arith")]) > The shld insn is only available on SH3 and above or SH2A. Allowing arbitrary shifts definitely makes sense here, but then maybe we should use the existing shift-expansion, although it sometimes causes problems (like it ends up generating functions calls for shll8 on SH2..). Another option might be to do these combine patterns on < SH3 only when the resulting shift costs do not pass a certain threshold. > Using those changes, a snipet like > > int > foo4 (long long tab[], int index) > { > return (int)tab [index+4]; > } > > not compiles as: > > mov #3,r1 > shld r1,r5 > add r4,r5 > rts > mov.l @(32,r5),r0 > If I recall correctly, a sequence such as "mov #3,r1; shld r1,r5" will be slower than "shll r5; shll2 r5" due to pipelining issues. But that's another story. > 2)plus_mulsi > > I see some interactions with the movsi_disp pattern, due to different ordering > on the matching in the combiner. I'll need to play more this them activated > together. > > So I think you can go ahead with your combiner movsi_disp pattern and propose > to Kaz when ready to close this defect. Feel free to take or not my > suggestions. > > I'll then go with my plus_mulsi combiner in a second time, making clear that > it's provide additional optimization opportunities without mixing the impacts. > OK, I'll brush up the 'movsi_disp' stuff a little, taking our input into account. It might take a couple of days though. > On a related thread, for further work, I'm thinking on adding support for the > MAC instruction, now that was have the multiply and add. But this requires > exposing the MACLH registers to reload. Had anyone had a thought on this ? I'd > like to give this a try pretty soon. > I have created PR 53949 for this.