On Sun, Nov 24, 2024 at 10:02:22PM +0100, Uros Bizjak wrote:
> 	PR target/36503
>
> gcc/ChangeLog:
>
> 	* config/i386/i386.md (*ashl<mode>3_negcnt):
> 	New define_insn_and_split pattern.
> 	(*ashl<mode>3_negcnt_1): Ditto.
> 	(*<insn><mode>3_negcnt): Ditto.
> 	(*<insn><mode>3_negcnt_1): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/i386/pr36503-1.c: New test.
> 	* gcc.target/i386/pr36503-2.c: New test.
> +(define_insn_and_split "*ashl<mode>3_negcnt"
> +  [(set (match_operand:SWI48 0 "nonimmediate_operand")
> +	(ashift:SWI48
> +	  (match_operand:SWI48 1 "nonimmediate_operand")
> +	  (subreg:QI
> +	    (minus
> +	      (match_operand 3 "const_int_operand")
> +	      (match_operand 2 "int248_register_operand" "c,r")) 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
> +   && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT

Any reason for an exact comparison rather than
  && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0
?  I mean, we can optimize this way 1U << (32 - x) or 1U << (1504 - x)
or any other multiple of 32.
Similarly, we can optimize 1U << (32 + x) to 1U << x and again do that
for any other multiple of 32.

	Jakub
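To make the count arithmetic behind that suggestion concrete, here is a small
standalone C sketch (purely illustrative, not part of the patch or the
testsuite) that checks the congruences which make the relaxed condition safe
on x86, where 32-bit shift instructions mask their count to the low 5 bits:

#include <assert.h>

int
main (void)
{
  /* x == 0 is skipped: 1U << 32 would be undefined in C anyway.  */
  for (unsigned int x = 1; x < 32; x++)
    {
      /* Any multiple of 32 minus x is congruent to -x modulo 32, which is
	 exactly what the negated count yields once the hardware masks the
	 shift count to its low 5 bits.  */
      assert (((32u - x) & 31) == (-x & 31));
      assert (((1504u - x) & 31) == (-x & 31));

      /* Any multiple of 32 plus x reduces to x itself, so 1U << (32 + x)
	 behaves like 1U << x after the count is masked.  */
      assert (((32u + x) & 31) == (x & 31));
    }
  return 0;
}

The same argument carries over to the 64-bit modes, where the count is masked
to the low 6 bits and any multiple of 64 drops out.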