https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79059
Bug ID: 79059
Summary: Information from CCmode is not propagated across basic blocks
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: mkuvyrkov at gcc dot gnu.org
Target Milestone: ---

This bug report is motivated by a performance regression [1] in 429.mcf on AArch64, but it is relevant to all targets that use CCmode and rely on the combine optimization. The description below assumes the AArch64 ISA.

Sample code:
===========
<BB1>:
...
add w1, w1, w2
cmp w1, #0
b.ne BB1
<BB2>:
ccmp ..., eq // Some instruction that needs only NZ bits of CC register
cmp w2, w3 // Set CC to something new
b.eq BB1
===========

The high-level issue is that the "add" and "cmp" instructions in BB1 can't be combined into "adds", because the register liveness info at the top of BB2 advertises that BB2 needs the "CC" register [2]. BB2 really needs only part of the CC register to be valid (the NZ flags), but liveness info has no way to express that. It therefore marks all of CC as used, which prevents the combine optimization.

I've considered several ways to improve the situation, but none of them seems particularly appealing. I would appreciate improvements and suggestions on these or other approaches.

#1 Make register liveness info include mode information.
The current state can be viewed as all registers listing their widest mode. We can [incrementally] set more precise modes on registers (e.g., CC_REGNUM) when cases like the above present themselves. This would be a substantial overall project, with several milestones, each of which is worthwhile in itself:
  phase_1: add a mode field, set it conservatively, and verify it is propagated correctly through dataflow;
  phase_2: improve handling of CC modes for the above motivating example;
  phase_3: improve handling of modes for non-CC registers when examples present themselves.

The main advantage of this approach is that it would benefit many architectures and would improve liveness information for all registers, not just CC_REGNUM. The main disadvantage is that it is a big project.

#2 Split CC_REGNUM into separate registers: CC_NZ_REGNUM, CC_CV_REGNUM.
This would require substantial rework of the aarch64 backend. All patterns need to be audited, and some patterns will need to be duplicated. It might be possible to reduce pattern duplication by inventing additional iterators in the MD files, or by otherwise automating the conversion. This work would have to be done entirely in the aarch64 backend, which, IMO, is bad since other targets would not benefit.

#3 <Something else>

Suggestions and comments are welcome.

[1] The regression occurred after a legitimate patch (IIRC, rev. 232442 by Kyrill Tkachov) made GCC generate a "ccmp" instruction in BB2 instead of starting BB2 with "cmp w1, #0".

[2] The "adds" instruction sets the NZ flags just like the "cmp" instruction would, but the CV flags are set differently. Therefore "cmp" can be replaced with "adds" only when the CV flags are unused.
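
To make approach #1 a bit more concrete, here is a toy model of the sample code. It is only a sketch and not GCC code: the function names, the tuple encoding of instructions, and the assumption that CC is dead on exit from BB2 (i.e. the "..." at the top of BB1 does not read flags) are all made up for illustration. It computes the flags live at the top of BB2 twice, once treating CC as a single unit and once tracking each flag separately, and then checks the rule from [2].

```python
# Toy model only; every name below is hypothetical and none of it is GCC API.
NZ = frozenset("NZ")
CV = frozenset("CV")
NZCV = NZ | CV
NONE = frozenset()

# Each instruction is (name, flags_read, flags_written), taken from the sample code.
BB2 = [
    ("ccmp", NZ, NZCV),             # needs only the NZ bits, rewrites all flags
    ("cmp", NONE, NZCV),            # sets CC to something new
    ("b.eq", frozenset("Z"), NONE),
]


def live_in(block, live_out=NONE, per_flag=True):
    """Backward liveness scan over one block, returning the flags live on entry."""
    live = frozenset(live_out)
    for _name, reads, writes in reversed(block):
        if not per_flag:
            # Whole-register view: any read uses all of CC,
            # and only a write of the full register kills it.
            reads = NZCV if reads else NONE
            writes = writes if writes == NZCV else NONE
        live = (live - writes) | reads
    return live


def flags_live_after_bb1_cmp(per_flag):
    """Flags needed after 'cmp w1, #0' in BB1.

    The b.ne in the same block is modelled precisely (it reads Z);
    only the information coming from BB2 is coarse or fine.
    Assumption: CC is dead on exit from BB2.
    """
    return frozenset("Z") | live_in(BB2, per_flag=per_flag)


def can_fold_cmp_into_adds(live_after_cmp):
    """Per [2]: adds agrees with cmp on N and Z only, so C and V must be dead."""
    return not (live_after_cmp & CV)


if __name__ == "__main__":
    for per_flag in (False, True):
        live = flags_live_after_bb1_cmp(per_flag)
        print(per_flag, sorted(live), can_fold_cmp_into_adds(live))
```

With whole-register liveness all four flags look live after the "cmp" in BB1, so the fold into "adds" is rejected; with per-flag liveness only N and Z are live and the fold is allowed. A mode field on liveness info would play the role of the per-flag set here.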