https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79059

            Bug ID: 79059
           Summary: Information from CCmode is not propagated across basic
                    blocks
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mkuvyrkov at gcc dot gnu.org
  Target Milestone: ---

This bug report is motivated by a performance regression [1] in 429.mcf on
AArch64, but it is relevant to all targets that use CCmode and rely on the
combine optimization.  The description below assumes the AArch64 ISA.

Sample code:
===========
<BB1>:
...
add w1, w1, w2
cmp w1, #0
b.nz BB1

<BB2>:
ccmp ..., eq    // Some instruction that needs only NZ bits of CC register
cmp w2, w3      // Set CC to something new
b.eq BB1
===========

The high-level issue is that the "add" and "cmp" instructions in BB1 can't be
combined into "adds" because the register liveness info at the top of BB2 says
that BB2 needs the CC register [2].  Although BB2 really needs only part of the
CC register to be valid (the NZ flags), liveness info cannot express that.
Liveness therefore marks all of CC as used, which prevents the combine
optimization.

I've considered several ways to improve the situation, but none of them seems
particularly appealing.  I would appreciate improvements and suggestions on
these or other approaches.

#1 Make register liveness info include mode information.

The current state can be viewed as every register being recorded in its widest
mode.  We can [incrementally] set more precise modes on registers (e.g.,
CC_REGNUM) when cases like the above present themselves.  This would be a
substantial overall project, with several milestones, each of which is
worthwhile in itself:
phase_1: add a mode field, set it conservatively, and verify that it is
propagated correctly through dataflow;
phase_2: improve handling of CC modes for the above motivating example;
phase_3: improve handling of modes for non-CC registers when examples present
themselves.

The main advantage of this approach is that it will benefit many architectures
and will improve liveness information for all registers, not just CC_REGNUM.
The main disadvantage is that it is a big project.

#2 Split CC_REGNUM into separate registers: CC_NZ_REGNUM, CC_CV_REGNUM.

This would require substantial rework of the aarch64 backend.  All patterns need
to be audited, and some patterns will need to be duplicated.  It might be
possible to reduce pattern duplication by inventing additional iterators in MD
files, or by otherwise automating the conversion.

This work needs to be done entirely in the aarch64 backend, which, IMO, is bad
since other targets do not benefit.

#3 <Something else>

Suggestions and comments are welcome.

[1] The regression occurred after a legitimate patch (IIRC, rev. 232442 by
Kyrill Tkachov) made GCC generate a "ccmp" instruction in BB2 instead of
starting BB2 with "cmp w1, #0".

[2] The "adds" instruction sets the NZ flags just as the "cmp" instruction
would, but it sets the CV flags differently.  Therefore "cmp" can be replaced
with "adds" only when the CV flags are unused.
