https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123017

--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:d7aebc728993c1c410ad6d069bd5457e37850cdf

commit r16-6496-gd7aebc728993c1c410ad6d069bd5457e37850cdf
Author: Tamar Christina <[email protected]>
Date:   Mon Jan 5 14:25:41 2026 +0000

    AArch64: Add if-conversion target cost model [PR123017]

    Since g:b219cbeda72d23b7ad6ff12cd159784b7ef00667

    The following

    void f(const int *restrict in,
       int *restrict out,
       int n, int threshold)
    {
      for (int i = 0; i < n; ++i) {
        int v = in[i];
        if (v > threshold) {
          int t = v * 3;
          t += 7;
          t ^= 0x55;
          t *= 0x55;
          t -= 0x5;
          t &= 0xFE;
          t ^= 0x55;
          out[i] = t;
        } else {
          out[i] = v;
        }
      }
    }

    compiled at -O2

    results in aggressive if-conversion which increases the number of dynamic
    instructions and the latency of the loop as it has to wait for t to be
    calculated now in all cases.
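
    As a rough illustration of the effect described above, the sketch below
    writes the if-converted loop out by hand in C.  The names transform,
    f_branchy and f_ifconverted are hypothetical and do not appear in the
    patch; the real transformation happens on RTL, so this only approximates
    what the compiler emits.

```c
/* Hypothetical helper: the arithmetic from the "then" arm of f. */
static int transform(int v)
{
  int t = v * 3;
  t += 7;
  t ^= 0x55;
  t *= 0x55;
  t -= 0x5;
  t &= 0xFE;
  t ^= 0x55;
  return t;
}

/* Branchy form: transform() only runs for elements above the threshold. */
void f_branchy(const int *restrict in, int *restrict out,
               int n, int threshold)
{
  for (int i = 0; i < n; ++i) {
    int v = in[i];
    if (v > threshold)
      out[i] = transform(v);
    else
      out[i] = v;
  }
}

/* Approximation of the if-converted form: t is computed on every
   iteration and a select (csel on AArch64) picks between t and v.  */
void f_ifconverted(const int *restrict in, int *restrict out,
                   int n, int threshold)
{
  for (int i = 0; i < n; ++i) {
    int v = in[i];
    int t = transform(v);              /* executed even when unused */
    out[i] = (v > threshold) ? t : v;  /* select waits for t */
  }
}
```

    Both functions store the same values; the second one simply pays for the
    full computation of t on every iteration.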

    This has led to big performance losses in packages like zstd [1], which in
    turn affects packaging and LTO speed.

    The default cost model for if-conversion is overly permissive and allows
    if-conversion on the assumption that branches are very expensive.

    This patch implements an if-conversion cost model for AArch64.  AArch64 has
    a number of conditional instructions that need to be accounted for; however,
    this initial version keeps things simple and is only really concerned with
    csel.

    The issue specifically with csel is that it may have to wait for both of its
    arguments to be evaluated before it can execute.  This means it has a direct
    correlation to increases in dynamic instructions.

    To fix this I add a new tuning parameter, br_mispredict_factor, that gives a
    rough estimate of the misprediction cost of a branch.  We then accept
    if-conversion while the converted sequence is cheaper than this factor
    multiplied by the cost of branches.
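
    The acceptance rule can be sketched as below.  This is a minimal sketch
    under stated assumptions, not the code added to aarch64.cc: only
    br_mispredict_factor is a name taken from the patch, while the struct,
    the function names, the branch_cost field and the use of <= rather than
    < are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the branch cost tuning data.  Only
   br_mispredict_factor is named in the patch.  */
struct branch_cost_sketch {
  int branch_cost;           /* assumed: cost of a single branch */
  int br_mispredict_factor;  /* rough misprediction cost multiplier */
};

/* Assumed upper bound on the cost of an if-converted sequence.  */
static unsigned max_ifcvt_seq_cost(const struct branch_cost_sketch *bc)
{
  return (unsigned) (bc->branch_cost * bc->br_mispredict_factor);
}

/* Accept the conversion outside loops unconditionally; inside loops,
   accept only while the converted sequence stays within the bound.  */
static bool conversion_profitable_p(const struct branch_cost_sketch *bc,
                                    unsigned seq_cost, bool in_loop)
{
  if (!in_loop)
    return true;
  return seq_cost <= max_ifcvt_seq_cost(bc);
}
```

    With the illustrative values branch_cost = 2 and br_mispredict_factor = 3,
    a converted sequence of cost 6 inside a loop is accepted and one of cost 7
    is rejected.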

    There is basic detection of CINC and CSET because these are usually ok.  We
    also accept all if-conversion when not inside a loop.  Because CE is not an
    RTL-SSA pass we can't do more extensive checks, such as checking whether the
    csel is a loop-carried dependency.  As such this is a best-effort approach
    that is intended to catch the most egregious cases like the above.
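
    For reference, the kind of source that usually maps to CSET and CINC looks
    like the sketch below.  The function names are hypothetical, and the
    instruction selection noted in the comments is what AArch64 compilers
    typically produce for these shapes rather than something the patch
    guarantees.

```c
/* A comparison result used as a value: typically cmp followed by cset. */
int is_above(int v, int threshold)
{
  return v > threshold;
}

/* A conditional increment: typically cmp followed by cinc.  The select
   only depends on count and the flags, so there is no long computation
   for it to wait on.  */
int bump_if_above(int count, int v, int threshold)
{
  return (v > threshold) ? count + 1 : count;
}
```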

    This recovers the ~25% performance loss in zstd decoding and gives better
    results than GCC 14 which was before the regression happened.

    Additionally I've benchmarked all the attached examples on a number of cores
    and checked various cases.  On average the patch gives an improvement of
    between 20% and 40%.

    [1] https://github.com/facebook/zstd/pull/4418#issuecomment-3004606000

    gcc/ChangeLog:

            PR target/123017
            * config/aarch64/aarch64-json-schema.h: Add br_mispredict_factor.
            * config/aarch64/aarch64-json-tunings-parser-generated.inc
            (parse_branch_costs): Add br_mispredict_factor.
            * config/aarch64/aarch64-json-tunings-printer-generated.inc
            (serialize_branch_costs): Add br_mispredict_factor.
            * config/aarch64/aarch64-protos.h (struct cpu_branch_cost): Add
            br_mispredict_factor.
            * config/aarch64/aarch64.cc (aarch64_max_noce_ifcvt_seq_cost,
            aarch64_noce_conversion_profitable_p,
            TARGET_MAX_NOCE_IFCVT_SEQ_COST,
            TARGET_NOCE_CONVERSION_PROFITABLE_P): New.
            * config/aarch64/tuning_models/generic.h (generic_branch_cost): Add
            br_mispredict_factor.
            * config/aarch64/tuning_models/generic_armv8_a.h: Remove
            generic_armv8_a_branch_cost and use generic_branch_cost.

    gcc/testsuite/ChangeLog:

            PR target/123017
            * gcc.target/aarch64/pr123017_1.c: New test.
            * gcc.target/aarch64/pr123017_2.c: New test.
            * gcc.target/aarch64/pr123017_3.c: New test.
            * gcc.target/aarch64/pr123017_4.c: New test.
            * gcc.target/aarch64/pr123017_5.c: New test.
            * gcc.target/aarch64/pr123017_6.c: New test.
            * gcc.target/aarch64/pr123017_7.c: New test.
