https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103990

            Bug ID: 103990
           Summary: 541.leela_r slower by 4.5-6% with PGO+LTO -Ofast
                    -march=native in the first week of January 2022
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

LNT reports that 541.leela_r from the SPEC 2017 intrate suite regressed
by 4.5-6% when compiled with both PGO and LTO at -Ofast -march=native
on all machines in the first week of January 2022:

zen3: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=477.397.0
zen2: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=286.397.0
zen1: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=17.397.0
kaby: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=16.397.0

On my zen2 desktop I have bisected the regression, or at least most of
it, to r12-6208-gebc853deb7cc04:

  ebc853deb7cc0487de9ef6e891a007ba853d1933 is the first bad commit
  commit ebc853deb7cc0487de9ef6e891a007ba853d1933
  Author: Richard Biener <rguent...@suse.de>
  Date:   Tue Jan 4 11:59:35 2022 +0100

    tree-optimization/103690 - not up-to-date SSA and PRE DCE

    This avoids running simple_dce_from_worklist on partially not up-to-date
    SSA form (in unreachable code regions) by scheduling CFG cleanup
    manually as is done anyway when tail-merging runs.

    2022-01-04  Richard Biener  <rguent...@suse.de>

            PR tree-optimization/103690
            * tree-pass.h (tail_merge_optimize): Adjust.
            * tree-ssa-tail-merge.c (tail_merge_optimize): Pass in whether
            to re-split critical edges, move CFG cleanup ...
            * tree-ssa-pre.c (pass_pre::execute): ... here, before
            simple_dce_from_worklist and delay freeing inserted_exprs from
            ...
            (fini_pre): .. here.
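
For anyone wanting to repeat the bisection, here is a minimal sketch of
the good/bad decision a `git bisect run` helper could make. It is not
the script used for this report. The names (slowdown_pct, classify) and
the 3% threshold are assumptions chosen for illustration; only the exit
code convention (0 = good, 1 = bad, 125 = skip) comes from git itself.

```python
# Hypothetical helper for timing-based bisection; not the reporter's script.
# Exit codes follow the `git bisect run` convention.
GOOD, BAD, SKIP = 0, 1, 125


def slowdown_pct(baseline_s: float, measured_s: float) -> float:
    """Return how much slower the measured run is, in percent of baseline."""
    return (measured_s - baseline_s) / baseline_s * 100.0


def classify(baseline_s: float, measured_s: float,
             threshold_pct: float = 3.0) -> int:
    """Map a benchmark run time onto a `git bisect run` exit code.

    threshold_pct is an assumed cut-off placed between run-to-run noise
    and the size of the regression being hunted.
    """
    if baseline_s <= 0 or measured_s <= 0:
        # A non-positive time means the build or run failed; skip the commit.
        return SKIP
    if slowdown_pct(baseline_s, measured_s) >= threshold_pct:
        return BAD
    return GOOD
```

A wrapper script would build the compiler at the current revision, run
the benchmark, and exit with the value returned by classify(). The
threshold has to sit comfortably above the noise of the machine, or the
bisection will land on an unrelated commit.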
