https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91817

            Bug ID: 91817
           Summary: compile with -O3 is more-than-expectedly slower than -O2
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hehaochen at hotmail dot com
  Target Milestone: ---

Created attachment 46898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46898&action=edit
this test case is from gcc-45364

### This is normal in GCC 5.3.1 (Docker version): ###

root@ubuntu:/home/jxl# time gcc -m32 -o1 -g -c 1.i -o test.o

real    0m1.017s
user    0m0.876s
sys     0m0.072s

root@ubuntu:/home/jxl# time gcc -m32 -o2 -g -c 1.i -o test.o

real    0m1.250s
user    0m0.864s
sys     0m0.064s

root@ubuntu:/home/jxl# time gcc -m32 -o3 -g -c 1.i -o test.o

real    0m4.446s
user    0m0.700s
sys     0m0.476s

### HOWEVER, it is extremely slow in GCC 4.6 (Docker version): ###

root@4beb8027e1fb:/# time gcc -m32 -o1 -g -c 1.i -o test.o

real    0m3.066s
user    0m0.656s
sys     0m0.772s

root@4beb8027e1fb:/# time gcc -m32 -o2 -g -c 1.i -o test.o

real    0m1.112s
user    0m0.796s
sys     0m0.156s

root@4beb8027e1fb:/# time gcc -m32 -O3 -g -c 1.i -o test.o

real    2m55.363s
user    2m41.224s
sys     0m1.908s

###### The root cause lies in "var-tracking dataflow" ######

root@4beb8027e1fb:/# gcc -m32 -O3 -ftime-report -g -c 1.i -o test.o

Execution times (seconds)
 callgraph construction:   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall    1432 kB ( 1%) ggc
 callgraph optimization:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     137 kB ( 0%) ggc
 ipa function splitting:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     528 kB ( 1%) ggc
 ipa pure const        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      25 kB ( 0%) ggc
 ipa SRA               :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall     405 kB ( 0%) ggc
 cfg cleanup           :   0.08 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall     549 kB ( 1%) ggc
 trivially dead code   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns         :   0.03 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall       6 kB ( 0%) ggc
 df multiple defs      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs      :   0.12 ( 0%) usr   0.02 ( 1%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 df live regs          :   0.35 ( 0%) usr   0.01 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.20 ( 0%) usr   0.01 ( 0%) sys   0.25 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.04 ( 0%) usr   0.02 ( 1%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 df live reg subwords  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.10 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     585 kB ( 1%) ggc
 register information  :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    2860 kB ( 3%) ggc
 alias stmt walking    :   0.11 ( 0%) usr   0.06 ( 2%) sys   0.17 ( 0%) wall       8 kB ( 0%) ggc
 register scan         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       9 kB ( 0%) ggc
 rebuild jump labels   :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing         :   0.07 ( 0%) usr   0.35 (14%) sys   0.45 ( 0%) wall    3387 kB ( 3%) ggc
 lexical analysis      :   0.19 ( 0%) usr   0.57 (23%) sys   0.74 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.13 ( 0%) usr   0.47 (19%) sys   0.72 ( 0%) wall   18502 kB (18%) ggc
 inline heuristics     :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall     133 kB ( 0%) ggc
 integration           :   0.07 ( 0%) usr   0.05 ( 2%) sys   0.14 ( 0%) wall    8467 kB ( 8%) ggc
 tree gimplify         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    4760 kB ( 5%) ggc
 tree CFG construction :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1460 kB ( 1%) ggc
 tree CFG cleanup      :   0.04 ( 0%) usr   0.02 ( 1%) sys   0.11 ( 0%) wall     318 kB ( 0%) ggc
 tree VRP              :   0.10 ( 0%) usr   0.04 ( 2%) sys   0.14 ( 0%) wall    2898 kB ( 3%) ggc
 tree copy propagation :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall     197 kB ( 0%) ggc
 tree PTA              :   0.07 ( 0%) usr   0.02 ( 1%) sys   0.06 ( 0%) wall     582 kB ( 1%) ggc
 tree SSA rewrite      :   0.05 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall    1884 kB ( 2%) ggc
 tree SSA other        :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.03 ( 0%) wall      17 kB ( 0%) ggc
 tree SSA incremental  :   0.06 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall     706 kB ( 1%) ggc
 tree operand scan     :   0.05 ( 0%) usr   0.09 ( 4%) sys   0.13 ( 0%) wall    4319 kB ( 4%) ggc
 dominator optimization:   0.05 ( 0%) usr   0.01 ( 0%) sys   0.03 ( 0%) wall    1525 kB ( 1%) ggc
 tree CCP              :   0.06 ( 0%) usr   0.01 ( 0%) sys   0.06 ( 0%) wall     321 kB ( 0%) ggc
 tree split crit edges :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     589 kB ( 1%) ggc
 tree reassociation    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     194 kB ( 0%) ggc
 tree PRE              :   0.06 ( 0%) usr   0.05 ( 2%) sys   0.10 ( 0%) wall     561 kB ( 1%) ggc
 tree FRE              :   0.08 ( 0%) usr   0.03 ( 1%) sys   0.09 ( 0%) wall     125 kB ( 0%) ggc
 tree code sinking     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      98 kB ( 0%) ggc
 tree forward propagate:   0.03 ( 0%) usr   0.02 ( 1%) sys   0.03 ( 0%) wall     293 kB ( 0%) ggc
 tree conservative DCE :   0.02 ( 0%) usr   0.04 ( 2%) sys   0.04 ( 0%) wall      34 kB ( 0%) ggc
 tree aggressive DCE   :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall     738 kB ( 1%) ggc
 tree DSE              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     118 kB ( 0%) ggc
 PHI merge             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     241 kB ( 0%) ggc
 tree loop invariant motion:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 scev constant prop    :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall     206 kB ( 0%) ggc
 complete unrolling    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     779 kB ( 1%) ggc
 tree slp vectorization:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1833 kB ( 2%) ggc
 tree loop distribution:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       3 kB ( 0%) ggc
 tree iv optimization  :   0.04 ( 0%) usr   0.02 ( 1%) sys   0.04 ( 0%) wall    2413 kB ( 2%) ggc
 tree copy headers     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     227 kB ( 0%) ggc
 tree SSA uncprop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa            :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      12 kB ( 0%) ggc
 expand vars           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     957 kB ( 1%) ggc
 expand                :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall    8481 kB ( 8%) ggc
 varconst              :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 forward prop          :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     539 kB ( 1%) ggc
 CSE                   :   0.09 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall      87 kB ( 0%) ggc
 dead code elimination :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.07 ( 0%) wall     554 kB ( 1%) ggc
 dead store elim2      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall     600 kB ( 1%) ggc
 loop invariant motion :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall       0 kB ( 0%) ggc
 loop unswitching      :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall       0 kB ( 0%) ggc
 CPROP                 :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall    1555 kB ( 1%) ggc
 PRE                   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall     145 kB ( 0%) ggc
 CSE 2                 :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      40 kB ( 0%) ggc
 branch prediction     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     242 kB ( 0%) ggc
 combiner              :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    1475 kB ( 1%) ggc
 if-conversion         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     385 kB ( 0%) ggc
 regmove               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall    1278 kB ( 1%) ggc
 reload                :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall     379 kB ( 0%) ggc
 reload CSE regs       :   0.12 ( 0%) usr   0.01 ( 0%) sys   0.15 ( 0%) wall    1568 kB ( 1%) ggc
 load CSE after reload :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       6 kB ( 0%) ggc
 thread pro- & epilogue:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     128 kB ( 0%) ggc
 peephole 2            :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     236 kB ( 0%) ggc
 hard reg cprop        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       3 kB ( 0%) ggc
 scheduling 2          :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall      43 kB ( 0%) ggc
 machine dep reorg     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      46 kB ( 0%) ggc
 reorder blocks        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1016 kB ( 1%) ggc
 reg stack             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       6 kB ( 0%) ggc
 final                 :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    1040 kB ( 1%) ggc
 symout                :   0.06 ( 0%) usr   0.04 ( 2%) sys   0.10 ( 0%) wall   12092 kB (12%) ggc
 variable tracking     :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall    2201 kB ( 2%) ggc
 var-tracking dataflow : 157.01 (96%) usr   0.32 (13%) sys 158.01 (95%) wall       0 kB ( 0%) ggc
 var-tracking emit     :   1.39 ( 1%) usr   0.01 ( 0%) sys   1.40 ( 1%) wall     726 kB ( 1%) ggc
 rest of compilation   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall     565 kB ( 1%) ggc
 remove unused locals  :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 address taken         :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 rebuild frequencies   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      94 kB ( 0%) ggc
 repair loop structures:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      72 kB ( 0%) ggc
 TOTAL                 : 163.80             2.46           167.08              104717 kB
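
For anyone who wants to confirm which pass dominates without reading the whole table, here is a minimal Python sketch that picks the pass with the largest user time out of -ftime-report output. It is not part of GCC; the helper names (user_times, slowest_pass) and the row regex are my own assumptions, based only on the row layout shown in this report, and SAMPLE is three rows plus the TOTAL line copied from it.

```python
import re

# Assumed row layout, taken from the report above:
#   "<pass name> : <usr secs> (<pct>%) usr <sys secs> (<pct>%) sys ..."
# The TOTAL row has no "(..%) usr" part, so it does not match.
ROW = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr"
)

def user_times(report):
    """Return {pass name: user seconds} for every pass row in the text."""
    times = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            times[m.group("name")] = float(m.group("usr"))
    return times

def slowest_pass(report):
    """Return (pass name, user seconds) for the most expensive pass."""
    return max(user_times(report).items(), key=lambda kv: kv[1])

# Rows copied from the -ftime-report output in this report.
SAMPLE = """\
 variable tracking     :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall    2201 kB ( 2%) ggc
 var-tracking dataflow : 157.01 (96%) usr   0.32 (13%) sys 158.01 (95%) wall       0 kB ( 0%) ggc
 var-tracking emit     :   1.39 ( 1%) usr   0.01 ( 0%) sys   1.40 ( 1%) wall     726 kB ( 1%) ggc
 TOTAL                 : 163.80             2.46           167.08              104717 kB
"""

if __name__ == "__main__":
    name, secs = slowest_pass(SAMPLE)
    print(name, secs)
```

GCC writes the time report to stderr, so the text passed to slowest_pass would have to be captured from there. Other GCC versions may lay the rows out differently, in which case the regex would need adjusting.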