Hi! [ Please CC: me on rs6000 patches. Thanks! ]
On Tue, Sep 29, 2020 at 12:26:28PM +0200, Eric Botcazou wrote: > and the masking is present all the way down to the assembly at -O2: > > rlwinm 4,4,0,27,31 > rotlw 3,3,4 > > Now this masking is redundant since it's done by the hardware so it would be > nice to get rid of it. I have attached a couple of patches to that effect: > the first one adds new instructions while the second one only adds splitters. > * config/rs6000/rs6000.md (*rotl<mode>3_mask): New. > (*rotlsi3_mask_64): Likewise. > (*rotl<mode>3_dot): Change to use P mode iterator. > (*rotl<mode>3_mask_dot): New. > (*rotl<mode>3_dot2): Change to use P mode iterator. > (*rotl<mode>3_mask_dot2): New. Don't call it "mask" please: *all* of our basic rotate instructions already have something called "mask" (that is the "m" in "rlwnm" for example; and "rotlw d,a,b" is just an extended mnemonic for "rlwnm d,a,b,0,31"). The hardware also does not mask the shift amount at all (instead, only the low 5 bits of RB *are* the shift amount). > +;; Avoid useless masking of count operand (Sentences end in a full stop.) > +(define_insn "*rotl<mode>3_mask" > + [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > + (rotate:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") > + (and:GPR (match_operand:GPR 2 "gpc_reg_operand" "r") > + (match_operand:GPR 3 "const_int_operand" "n"))))] > + "(UINTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode) - 1)) > + == (unsigned HOST_WIDE_INT) (GET_MODE_BITSIZE (<MODE>mode) - 1)" (Useless casts are useless.) Don't mask operands[3] please (in the UINTVAL): RTL with the number outside of that range is *undefined*. So just check that it is equal? > (define_insn_and_split "*rotl<mode>3_dot" > [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y") > - (compare:CC (rotate:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") > - (match_operand:SI 2 "reg_or_cint_operand" > "rn,rn")) > + (compare:CC (rotate:P (match_operand:P 1 "gpc_reg_operand" "r,r") > + (match_operand:SI 2 "reg_or_cint_operand" > "rn,rn")) > (const_int 0))) > - (clobber (match_scratch:GPR 0 "=r,r"))] > - "<MODE>mode == Pmode" > + (clobber (match_scratch:P 0 "=r,r"))] > + "" > "@ > rotl<wd>%I2. %0,%1,%<hH>2 > #" > - "&& reload_completed && cc_reg_not_cr0_operand (operands[3], CCmode)" > + "reload_completed && cc_reg_not_cr0_operand (operands[3], CCmode)" > [(set (match_dup 0) > - (rotate:GPR (match_dup 1) > - (match_dup 2))) > + (rotate:P (match_dup 1) > + (match_dup 2))) > (set (match_dup 3) > (compare:CC (match_dup 0) > (const_int 0)))] Why? This diverges the "dot" version from the non-dot for no reason. (We can do 32-bit rotates on 64-bit implementations just fine, and even the record ("dot") form works just fine; except for the setting of "smaller than" in CR0. And we can fix that even (by not using 0,31 but a wrapping mask, say, 1,0), but more readable generated code was more important so far.) I don't see a patch with splitters only? Huh. Did you attach the same patch twice? Since this won't be handled before combine (or what do I miss?), it is fine to do splitters only (splitters for combine). But the other approach is fine as well. Thanks, Segher