https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68796
Bug ID: 68796 Summary: Make use of and-immediate+compare instructions more robust Product: gcc Version: 6.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ktkachov at gcc dot gnu.org Target Milestone: --- Target: aarch64 The aarch64 target (and others) has an instruction to perform a bitmask and compare the result for equality against zero: the TST instruction. In config/aarch64/aarch64.md its pattern is represented as: (define_insn "*and<mode>3nr_compare0" [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ (and:GPI (match_operand:GPI 0 "register_operand" "%r,r") (match_operand:GPI 1 "aarch64_logical_operand" "r,<lconst>")) (const_int 0)))] "" "tst\\t%<w>0, %<w>1" [(set_attr "type" "logics_reg,logics_imm")] ) We should be utilising it more than we are at the moment. Some examples where we should be using it but are not: int f8 (int x) { if (x & 8) return 1; return x; } for -O2 on an aarch64 toolchain we generate: f8: and w1, w0, 8 cmp w1, wzr csinc w0, w0, wzr, eq ret when we could be doing: f8: tst x0, 8 csinc w0, w0, wzr, eq ret Combine tries to match some patterns like: (set (reg:CC 66 cc) (compare:CC (zero_extract:DI (reg:DI 0 x0 [ x ]) (const_int 1 [0x1]) (const_int 3 [0x3])) (const_int 0 [0]))) and (set (reg:CC 66 cc) (compare:CC (and:DI (lshiftrt:DI (reg:DI 0 x0 [ x ]) (const_int 3 [0x3])) (const_int 1 [0x1])) (const_int 0 [0]))) but not the and-immediate case for which we have a pattern. Now, the documentation in md.texi on the topic of canonicalization says this: " Equality comparisons of a group of bits (usually a single bit) with zero will be written using @code{zero_extract} rather than the equivalent @code{and} or @code{sign_extract} operations." So, the aarch64 port is arguably at fault here for not providing the canonical form here. Indeed, adding a pattern that matches something like: [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ (zero_extract:GPI (match_operand:GPI 0 "register_operand" "r") (match_operand:GPI 1 "const_int_operand" "n") (match_operand:GPI 2 "const_int_operand" "n")) (const_int 0)))] and converts operands 1 and 2 into the appropriate bitmask for the TST instruction works (with an appropriate change to the SELECT_CC_MODE implementation for aarch64 to allow the zero_extract case). However, consider the case where we have an immediate that corresponds to the mode mask of a short mode like QImode: int f255 (int x) { if (x & 255) return 1; return x; } Again, we generate: f255: and w1, w0, 255 cmp w1, wzr csinc w0, w0, wzr, eq ret instead of: f255: tst w0, 255 csinc w0, w0, wzr, eq ret even with the zero_extract pattern in place. Here, combine doesn't try to match a zero-extract variant nor an AND-immediate form. It creates a QImode subreg of the operand and tries to perform the comparison in the shortest mode. That is, it tries to match: (set (reg:CC_ZESWP 66 cc) (compare:CC_ZESWP (reg:QI 0 x0 [ x ]) (const_int 0 [0]))) For which we have no pattern. Again, I could add a pattern to aarch64.md to match something like: (define_insn "*and<mode>_compare0" [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ (match_operand:SHORT 0 "register_operand" "r") (const_int 0)))] where SHORT is a mode iterator over QImode and HImode and generate a TST instruction with the mode mask and modify SELECT_CC_MODE appropriately. However, it seems to me that the general and-immediate+compare form of the instruction should be matched more robustly rather than having to specify all the different equivalent forms in the MD files. Making the above changes to the aarch64 backend ends up generating 42% more tst instructions across SPEC2006 and ends up reducing the number of cmp instructions generated by about 9% and the generated code always looks simpler due to us avoiding performing an AND-immediate operation followed by a compare or a compare+branch. Is there any way we can make combine try harder to match these equivalent forms? Or should we go ahead and add these equivalent forms of the and-immediate+compare pattern? I think adding a pattern to handle the zero_extract case is not unreasonable since the canonicalization rules dictate it, but for the second case, perhaps we should not be trying short subregs for which there is no native instruction