https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113900

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-02-13
           Keywords|compile-time-hog            |
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
What does -march=native resolve to?  I suppose znver2?  I can confirm the
compile-time-hog even with a release checking GCC 13 compiler, but nothing
really stands out here besides maybe RTL combine and load CSE after reload
(that's a usual suspect).

> gcc-13 slarith.i -S -m32 -mfpmath=sse -O3 -fPIC -march=znver2 
> -fno-strict-aliasing -Waddress -Warray-bounds -Wfree-nonheap-object 
> -Wint-to-pointer-cast -Wmain -Wnonnull -Wodr -Wreturn-type 
> -Wsizeof-pointer-memaccess -Wstrict-aliasing -Wstring-compare -Wuninitialized 
> -Wvarargs -ftime-report

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  2042k (  0%)
 phase parsing                      :   0.13 (  0%)   0.40 ( 20%)   0.53 (  1%)    25M (  1%)
 phase lang. deferred               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    96  (  0%)
 phase opt and generate             :  46.65 (100%)   1.61 ( 80%)  48.27 ( 99%)  2563M ( 99%)
 garbage collection                 :   0.12 (  0%)   0.01 (  0%)   0.12 (  0%)     0  (  0%)
 dump files                         :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 callgraph construction             :   0.05 (  0%)   0.00 (  0%)   0.01 (  0%)   552k (  0%)
 callgraph optimization             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  2952  (  0%)
 callgraph functions expansion      :  45.66 ( 98%)   1.46 ( 73%)  47.13 ( 97%)  2459M ( 95%)
 callgraph ipa passes               :   0.90 (  2%)   0.15 (  7%)   1.06 (  2%)    60M (  2%)
 ipa function summary               :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)  9208k (  0%)
 ipa cp                             :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   175k (  0%)
 ipa inlining heuristics            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    68k (  0%)
 ipa function splitting             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  8528  (  0%)
 ipa pure const                     :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)  3504  (  0%)
 ipa icf                            :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)    30k (  0%)
 ipa SRA                            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    37k (  0%)
 ipa modref                         :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   325k (  0%)
 cfg construction                   :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  3443k (  0%)
 cfg cleanup                        :   0.52 (  1%)   0.01 (  0%)   0.44 (  1%)    37M (  1%)
 trivially dead code                :   0.11 (  0%)   0.00 (  0%)   0.15 (  0%)     0  (  0%)
 df scan insns                      :   0.07 (  0%)   0.00 (  0%)   0.10 (  0%)    12k (  0%)
 df reaching defs                   :   0.37 (  1%)   0.01 (  0%)   0.29 (  1%)     0  (  0%)
 df live regs                       :   1.22 (  3%)   0.01 (  0%)   1.15 (  2%)     0  (  0%)
 df live&initialized regs           :   0.53 (  1%)   0.00 (  0%)   0.65 (  1%)     0  (  0%)
 df must-initialized regs           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 df use-def / def-use chains        :   0.07 (  0%)   0.00 (  0%)   0.09 (  0%)     0  (  0%)
 df live reg subwords               :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 df reg dead/unused notes           :   0.55 (  1%)   0.00 (  0%)   0.51 (  1%)    24M (  1%)
 register information               :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)     0  (  0%)
 alias analysis                     :   0.51 (  1%)   0.00 (  0%)   0.48 (  1%)   125M (  5%)
 alias stmt walking                 :   0.91 (  2%)   0.22 ( 11%)   0.95 (  2%)    45M (  2%)
 register scan                      :   0.06 (  0%)   0.00 (  0%)   0.04 (  0%)  1524k (  0%)
 rebuild jump labels                :   0.09 (  0%)   0.00 (  0%)   0.04 (  0%)   264  (  0%)
 preprocessing                      :   0.03 (  0%)   0.10 (  5%)   0.12 (  0%)   500k (  0%)
 lexical analysis                   :   0.06 (  0%)   0.19 (  9%)   0.20 (  0%)     0  (  0%)
 parser (global)                    :   0.00 (  0%)   0.01 (  0%)   0.01 (  0%)  3313k (  0%)
 parser struct body                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   165k (  0%)
 parser function body               :   0.04 (  0%)   0.10 (  5%)   0.18 (  0%)    20M (  1%)
 parser inl. func. body             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   374k (  0%)
 inline parameters                  :   0.04 (  0%)   0.02 (  1%)   0.09 (  0%)   779k (  0%)
 integration                        :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   382k (  0%)
 tree gimplify                      :   0.03 (  0%)   0.00 (  0%)   0.06 (  0%)    26M (  1%)
 tree CFG construction              :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)    14M (  1%)
 tree CFG cleanup                   :   0.32 (  1%)   0.03 (  1%)   0.28 (  1%)  1884k (  0%)
 tree tail merge                    :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)  3359k (  0%)
 tree VRP                           :   1.50 (  3%)   0.09 (  4%)   1.55 (  3%)    26M (  1%)
 tree Early VRP                     :   0.22 (  0%)   0.00 (  0%)   0.20 (  0%)    13M (  1%)
 tree copy propagation              :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)   365k (  0%)
 tree PTA                           :   0.08 (  0%)   0.02 (  1%)   0.10 (  0%)  2480k (  0%)
 tree SSA rewrite                   :   0.01 (  0%)   0.03 (  1%)   0.03 (  0%)  6370k (  0%)
 tree SSA incremental               :   0.24 (  1%)   0.02 (  1%)   0.43 (  1%)    28M (  1%)
 tree operand scan                  :   0.44 (  1%)   0.16 (  8%)   0.64 (  1%)    57M (  2%)
 dominator optimization             :   1.32 (  3%)   0.03 (  1%)   1.34 (  3%)    25M (  1%)
 backwards jump threading           :   0.81 (  2%)   0.04 (  2%)   0.84 (  2%)   923k (  0%)
 tree CCP                           :   0.62 (  1%)   0.01 (  0%)   0.54 (  1%)  1512k (  0%)
 tree split crit edges              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  8949k (  0%)
 tree reassociation                 :   0.07 (  0%)   0.00 (  0%)   0.04 (  0%)   107k (  0%)
 tree PRE                           :   0.22 (  0%)   0.00 (  0%)   0.29 (  1%)    10M (  0%)
 tree FRE                           :   0.74 (  2%)   0.15 (  7%)   0.89 (  2%)  5140k (  0%)
 tree RPO VN                        :   0.23 (  0%)   0.06 (  3%)   0.33 (  1%)  3884k (  0%)
 tree code sinking                  :   0.09 (  0%)   0.02 (  1%)   0.03 (  0%)    11M (  0%)
 tree linearize phis                :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1003k (  0%)
 tree backward propagate            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree forward propagate             :   0.10 (  0%)   0.02 (  1%)   0.19 (  0%)  7716k (  0%)
 tree conservative DCE              :   0.05 (  0%)   0.03 (  1%)   0.11 (  0%)  4177k (  0%)
 tree aggressive DCE                :   0.05 (  0%)   0.01 (  0%)   0.07 (  0%)  1184k (  0%)
 tree DSE                           :   0.07 (  0%)   0.00 (  0%)   0.09 (  0%)  1168k (  0%)
 PHI merge                          :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)  1057k (  0%)
 tree loop invariant motion         :   0.06 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 tree loop interchange              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  1476k (  0%)
 tree canonical iv                  :   0.05 (  0%)   0.00 (  0%)   0.10 (  0%)    10M (  0%)
 scev constant prop                 :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)  3211k (  0%)
 tree loop unswitching              :   0.05 (  0%)   0.00 (  0%)   0.08 (  0%)    11M (  0%)
 loop splitting                     :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)   886k (  0%)
 complete unrolling                 :   0.70 (  1%)   0.09 (  4%)   0.86 (  2%)   108M (  4%)
 tree vectorization                 :   0.81 (  2%)   0.22 ( 11%)   0.95 (  2%)   261M ( 10%)
 tree slp vectorization             :   0.58 (  1%)   0.01 (  0%)   0.56 (  1%)   242M (  9%)
 tree loop distribution             :   0.09 (  0%)   0.00 (  0%)   0.07 (  0%)  9740k (  0%)
 tree iv optimization               :   0.46 (  1%)   0.05 (  2%)   0.55 (  1%)    69M (  3%)
 predictive commoning               :   0.52 (  1%)   0.01 (  0%)   0.52 (  1%)    25M (  1%)
 tree copy headers                  :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)  4543k (  0%)
 tree SSA uncprop                   :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree switch lowering               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   188k (  0%)
 gimple CSE sin/cos                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 gimple widening/fma detection      :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 tree strlen optimization           :   0.15 (  0%)   0.01 (  0%)   0.18 (  0%)  1363k (  0%)
 tree modref                        :   0.04 (  0%)   0.00 (  0%)   0.06 (  0%)   367k (  0%)
 dominance frontiers                :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 dominance computation              :   0.24 (  1%)   0.00 (  0%)   0.25 (  1%)     0  (  0%)
 out of ssa                         :   0.10 (  0%)   0.01 (  0%)   0.13 (  0%)   514k (  0%)
 expand vars                        :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)    29M (  1%)
 expand                             :   0.34 (  1%)   0.01 (  0%)   0.32 (  1%)   149M (  6%)
 post expand cleanups               :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  6798k (  0%)
 varconst                           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1944  (  0%)
 lower subreg                       :   0.03 (  0%)   0.00 (  0%)   0.11 (  0%)   134k (  0%)
 jump                               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 forward prop                       :   0.40 (  1%)   0.01 (  0%)   0.36 (  1%)  8490k (  0%)
 CSE                                :   0.99 (  2%)   0.02 (  1%)   1.06 (  2%)  5692k (  0%)
 dead code elimination              :   0.15 (  0%)   0.00 (  0%)   0.17 (  0%)     0  (  0%)
 dead store elim1                   :   0.21 (  0%)   0.00 (  0%)   0.23 (  0%)    27M (  1%)
 dead store elim2                   :   0.24 (  1%)   0.00 (  0%)   0.30 (  1%)    32M (  1%)
 loop analysis                      :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 loop init                          :   0.15 (  0%)   0.02 (  1%)   0.21 (  0%)    54M (  2%)
 loop invariant motion              :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)    71k (  0%)
 loop unrolling                     :   0.03 (  0%)   0.01 (  0%)   0.08 (  0%)  6644k (  0%)
 loop fini                          :   0.02 (  0%)   0.01 (  0%)   0.01 (  0%)   104k (  0%)
 CPROP                              :   0.51 (  1%)   0.00 (  0%)   0.47 (  1%)    49M (  2%)
 PRE                                :   3.47 (  7%)   0.00 (  0%)   3.41 (  7%)  2326k (  0%)
 CSE 2                              :   0.51 (  1%)   0.00 (  0%)   0.47 (  1%)  2598k (  0%)
 branch prediction                  :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)    13M (  1%)
 combiner                           :   7.83 ( 17%)   0.00 (  0%)   7.81 ( 16%)   475M ( 18%)
 if-conversion                      :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)  1297k (  0%)
 integrated RA                      :   1.92 (  4%)   0.02 (  1%)   1.95 (  4%)   232M (  9%)
 LRA non-specific                   :   1.86 (  4%)   0.00 (  0%)   1.84 (  4%)    54M (  2%)
 LRA virtuals elimination           :   0.08 (  0%)   0.00 (  0%)   0.12 (  0%)  7192k (  0%)
 LRA reload inheritance             :   0.39 (  1%)   0.00 (  0%)   0.30 (  1%)    22M (  1%)
 LRA create live ranges             :   1.58 (  3%)   0.00 (  0%)   1.68 (  3%)    11M (  0%)
 LRA hard reg assignment            :   1.51 (  3%)   0.01 (  0%)   1.58 (  3%)     0  (  0%)
 LRA coalesce pseudo regs           :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 LRA rematerialization              :   0.34 (  1%)   0.00 (  0%)   0.31 (  1%)   200k (  0%)
 reload                             :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)  6240  (  0%)
 reload CSE regs                    :   1.17 (  3%)   0.01 (  0%)   1.16 (  2%)    45M (  2%)
 load CSE after reload              :   3.04 (  6%)   0.00 (  0%)   3.13 (  6%)  3435k (  0%)
 ree                                :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)   319k (  0%)
 thread pro- & epilogue             :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)   739k (  0%)
 if-conversion 2                    :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)   159k (  0%)
 split paths                        :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)   336k (  0%)
 combine stack adjustments          :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)  5912  (  0%)
 peephole 2                         :   0.08 (  0%)   0.00 (  0%)   0.09 (  0%)  2472k (  0%)
 hard reg cprop                     :   0.14 (  0%)   0.00 (  0%)   0.15 (  0%)   501k (  0%)
 scheduling 2                       :   1.15 (  2%)   0.00 (  0%)   1.13 (  2%)  6414k (  0%)
 machine dep reorg                  :   0.11 (  0%)   0.00 (  0%)   0.17 (  0%)  6520k (  0%)
 reorder blocks                     :   0.07 (  0%)   0.00 (  0%)   0.03 (  0%)  5676k (  0%)
 shorten branches                   :   0.08 (  0%)   0.01 (  0%)   0.16 (  0%)   113k (  0%)
 reg stack                          :   0.11 (  0%)   0.00 (  0%)   0.08 (  0%)   200k (  0%)
 final                              :   0.27 (  1%)   0.02 (  1%)   0.20 (  0%)  9063k (  0%)
 tree if-combine                    :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)   267k (  0%)
 straight-line strength reduction   :   0.06 (  0%)   0.01 (  0%)   0.07 (  0%)   502k (  0%)
 tree loop if-conversion            :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)    10M (  0%)
 access analysis                    :   0.08 (  0%)   0.03 (  1%)   0.05 (  0%)  2856  (  0%)
 rest of compilation                :   0.32 (  1%)   0.02 (  1%)   0.32 (  1%)    12M (  0%)
 remove unused locals               :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 address taken                      :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 repair loop structures             :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)   936  (  0%)
 TOTAL                              :  46.78          2.01         48.81         2591M
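
To make "what stands out" easier to check, here is a minimal sketch that ranks
the rows of a -ftime-report dump by usr time.  It is not part of GCC; the names
ROW, AGGREGATES, SAMPLE and top_passes are made up for illustration, and
treating every "phase ..." and "callgraph ..." row as a cumulative total is a
simplifying assumption.  Only the usr column is parsed, so it does not matter
whether the GGC column sits on the same line or has wrapped onto the next one.

```python
import re

# Matches the start of a -ftime-report row: pass name, usr seconds, usr percent.
# Only the first (usr) column is captured; sys, wall and GGC are ignored.
ROW = re.compile(r"^\s*(?P<name>\S.*?)\s*:\s+(?P<usr>\d+\.\d+) \(\s*(?P<pct>\d+)%\)")

# Assumption: rows with these prefixes are cumulative and would swamp the ranking.
AGGREGATES = ("phase ", "callgraph ", "TOTAL")

# A few rows copied from the report above, in their wrapped form.
SAMPLE = """\
 phase opt and generate             :  46.65 (100%)   1.61 ( 80%)  48.27 ( 99%)
 2563M ( 99%)
 PRE                                :   3.47 (  7%)   0.00 (  0%)   3.41 (  7%)
 2326k (  0%)
 combiner                           :   7.83 ( 17%)   0.00 (  0%)   7.81 ( 16%)
  475M ( 18%)
 load CSE after reload              :   3.04 (  6%)   0.00 (  0%)   3.13 (  6%)
 3435k (  0%)
 TOTAL                              :  46.78          2.01         48.81
 2591M
"""


def top_passes(report, n=5):
    """Return the n rows with the highest usr time as (name, seconds, percent)."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        # Skip headers, wrapped GGC lines, and cumulative rows.
        if m is None or m["name"].startswith(AGGREGATES):
            continue
        rows.append((m["name"], float(m["usr"]), int(m["pct"])))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]
```

Calling top_passes(SAMPLE, 3) ranks combiner (7.83s), PRE (3.47s) and load CSE
after reload (3.04s), in that order; the same three lead when the whole report
above is ranked by usr time.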

I can confirm the ICE with trunk.  That's

  /* We should not have to update virtual SSA form here but some
     transforms involve creating new virtual definitions which makes
     updating difficult.
     We delay the actual update to the end of the pass but avoid
     confusing ourselves by forcing need_ssa_update_p () to false.  */
  unsigned todo = 0; 
  if (need_ssa_update_p (cfun))
    { 
      gcc_assert (loop_vinfo->any_known_not_updated_vssa);

and this is a new "feature", doing fewer SSA updates.  It will eventually
get bisected to the commit that introduced it.

Testcase reduction should be the priority here.
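
For the reduction, a reducer such as C-Vise or C-Reduce needs an
"interestingness" test that exits with status 0 while the testcase still
triggers the ICE.  A minimal sketch follows.  Everything in it is an
assumption: the compiler name, and in particular the flags, which are copied
from the timing run above on the guess that the same options reproduce the ICE
on trunk.  GCC, FLAGS, TESTCASE, looks_like_ice and main are illustrative names.

```python
import subprocess

# Hypothetical settings: adjust to whatever actually reproduces the ICE.
GCC = "gcc"
FLAGS = ["-S", "-m32", "-mfpmath=sse", "-O3", "-fPIC", "-march=znver2",
         "-fno-strict-aliasing", "-o", "/dev/null"]
TESTCASE = "slarith.i"


def looks_like_ice(returncode, stderr):
    """True when the compiler failed and reported an internal compiler error."""
    return returncode != 0 and "internal compiler error" in stderr


def main():
    """Compile the testcase once; return 0 only if the ICE is still there."""
    proc = subprocess.run([GCC, *FLAGS, TESTCASE],
                          capture_output=True, text=True)
    # Reducers treat exit status 0 as "still interesting".
    return 0 if looks_like_ice(proc.returncode, proc.stderr) else 1
```

Saved as an executable script that ends with "raise SystemExit(main())", this
can be handed to the reducer together with slarith.i.  Matching the text of the
failing assertion as well would keep the reduction from drifting to a different
ICE.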
