On Mon, Nov 25, 2024 at 12:10 PM Jakub Jelinek <ja...@redhat.com> wrote: > > On Mon, Nov 25, 2024 at 11:52:31AM +0100, Uros Bizjak wrote: > > > Any reason for an exact comparison rather than > > > && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0 > > > ? > > > I mean, we can optimize this way 1U << (32 - x) or > > > 1U << (1504 - x) or any other multiply of 32. > > > > Count values outside of [0, bitwidth) are undefined in general. Also, > > Sure, e.g. the 1504 - x case will be undefined if x is not in [1473, 1504] > But it very well could be in that range. > > What I'm arguing about is that changing the test (in all the new patterns) > will not noticeably slow compilation down and be more general. > > > during the bootstrap only values of 32 or 64 were found, so I thought > > to not complicate the condition too much > > > > FYI, the conversion triggers 504 times during the bootstrap, I somehow > > expected a lower number. > > > > > Similarly, we can optimize 1U << (32 + x) to 1U << x and > > > again do that for any other multiplies of 32. > > > > I don't think anybody uses the above idiom, it is valid only for > > certain targets (x86 and the ones that mask count argument), but > > undefined in general. > > No, it is again valid whenever x is in [-32, -1] range, which it very well > could be (or if x is unsigned, in [-32U, -1U] range). > Of course if it is rare, adding new patterns for it might not be worth it.
Huh, I was wrong. There are 356 cases where 1U << (32 + x) to 1U << x optimization triggers during the bootstrap. I am testing the attached patch. Thanks, Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index df78e4df9d8..788d2555a1a 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -15896,7 +15896,53 @@ (define_insn_and_split "*ashl<mode>3_mask_1" "" [(set_attr "isa" "*,bmi2")]) -(define_insn_and_split "*ashl<mode>3_negcnt" +(define_insn_and_split "*ashl<mode>3_add" + [(set (match_operand:SWI48 0 "nonimmediate_operand") + (ashift:SWI48 + (match_operand:SWI48 1 "nonimmediate_operand") + (subreg:QI + (plus + (match_operand 2 "int248_register_operand" "c,r") + (match_operand 3 "const_int_operand")) 0))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands) + && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0 + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 0) + (ashift:SWI48 (match_dup 1) + (match_dup 2))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); + operands[2] = gen_lowpart (QImode, operands[2]); +} + [(set_attr "isa" "*,bmi2")]) + +(define_insn_and_split "*ashl<mode>3_add_1" + [(set (match_operand:SWI48 0 "nonimmediate_operand") + (ashift:SWI48 + (match_operand:SWI48 1 "nonimmediate_operand") + (plus:QI + (match_operand:QI 2 "register_operand" "c,r") + (match_operand:QI 3 "const_int_operand")))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands) + && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0 + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 0) + (ashift:SWI48 (match_dup 1) + (match_dup 2))) + (clobber (reg:CC FLAGS_REG))])] + "" + [(set_attr "isa" "*,bmi2")]) + +(define_insn_and_split "*ashl<mode>3_sub" [(set (match_operand:SWI48 0 "nonimmediate_operand") (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand") @@ -15927,7 +15973,7 @@ (define_insn_and_split "*ashl<mode>3_negcnt" } [(set_attr "isa" "*,bmi2")]) -(define_insn_and_split "*ashl<mode>3_negcnt_1" +(define_insn_and_split "*ashl<mode>3_sub_1" [(set (match_operand:SWI48 0 "nonimmediate_operand") (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand") @@ -16678,7 +16724,53 @@ (define_insn_and_split "*<insn><mode>3_mask_1" "" [(set_attr "isa" "*,bmi2")]) -(define_insn_and_split "*<insn><mode>3_negcnt" +(define_insn_and_split "*<insn><mode>3_add" + [(set (match_operand:SWI48 0 "nonimmediate_operand") + (any_shiftrt:SWI48 + (match_operand:SWI48 1 "nonimmediate_operand") + (subreg:QI + (plus + (match_operand 2 "int248_register_operand" "c,r") + (match_operand 3 "const_int_operand")) 0))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands) + && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0 + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 0) + (any_shiftrt:SWI48 (match_dup 1) + (match_dup 2))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); + operands[2] = gen_lowpart (QImode, operands[2]); +} + [(set_attr "isa" "*,bmi2")]) + +(define_insn_and_split "*<insn><mode>3_add_1" + [(set (match_operand:SWI48 0 "nonimmediate_operand") + (any_shiftrt:SWI48 + (match_operand:SWI48 1 "nonimmediate_operand") + (plus:QI + (match_operand:QI 2 "register_operand" "c,r") + (match_operand:QI 3 "const_int_operand")))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) + && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0 + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 0) + (any_shiftrt:SWI48 (match_dup 1) + (match_dup 2))) + (clobber (reg:CC FLAGS_REG))])] + "" + [(set_attr "isa" "*,bmi2")]) + +(define_insn_and_split "*<insn><mode>3_sub" [(set (match_operand:SWI48 0 "nonimmediate_operand") (any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand") @@ -16709,7 +16801,7 @@ (define_insn_and_split "*<insn><mode>3_negcnt" } [(set_attr "isa" "*,bmi2")]) -(define_insn_and_split "*<insn><mode>3_negcnt_1" +(define_insn_and_split "*<insn><mode>3_sub_1" [(set (match_operand:SWI48 0 "nonimmediate_operand") (any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")