On Wednesday, 28 June 2017 at 22:17:09 UTC, Iain Buclaw wrote:

> Phase opt and generate is the top-level timer for the entire "backend" compilation phase. I was expecting to see more of a breakdown of individual passes.

Sorry, it didn't look broken down to me.  Here's the full report.

arm-none-eabi-gdc -c -O2 -finline-functions -nophoboslib -nostdinc -nodefaultlibs -nostdlib -fno-emit-moduleinfo -mthumb -mcpu=cortex-m4 -Isource/runtime -fno-bounds-check -fno-invariants -fno-in -fno-out -ffunction-sections -fdata-sections -ftime-report source/gcc/attribute.d source/board/package.d source/board/ILI9341.d source/board/lcd.d source/board/spi5.d source/board/statusLED.d source/board/random.d source/board/ltdc.d source/stm32f42/bus.d source/stm32f42/scb.d source/stm32f42/trace.d source/stm32f42/dma2d.d source/stm32f42/spi.d source/stm32f42/pwr.d source/stm32f42/rcc.d source/stm32f42/rng.d source/stm32f42/nvic.d source/stm32f42/mmio.d source/stm32f42/flash.d source/stm32f42/gpio.d source/stm32f42/ltdc.d source/main.d -o binary/firmware.o

Execution times (seconds)
phase setup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 2310 kB ( 0%) ggc
phase parsing : 2.21 ( 4%) usr 0.32 ( 2%) sys 2.55 ( 3%) wall 160684 kB ( 6%) ggc
phase opt and generate : 51.89 (96%) usr 20.13 (98%) sys 72.29 (97%) wall 2381419 kB (94%) ggc
phase last asm : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 26 kB ( 0%) ggc
phase finalize : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
garbage collection : 0.90 ( 2%) usr 0.04 ( 0%) sys 1.05 ( 1%) wall 0 kB ( 0%) ggc
dump files : 4.17 ( 8%) usr 1.96 (10%) sys 5.67 ( 8%) wall 0 kB ( 0%) ggc
callgraph construction : 0.66 ( 1%) usr 0.20 ( 1%) sys 1.07 ( 1%) wall 26036 kB ( 1%) ggc
callgraph optimization : 1.55 ( 3%) usr 0.78 ( 4%) sys 1.89 ( 3%) wall 1689 kB ( 0%) ggc
ipa dead code removal : 0.29 ( 1%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 0 kB ( 0%) ggc
ipa inheritance graph : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
ipa devirtualization : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
ipa cp : 0.21 ( 0%) usr 0.01 ( 0%) sys 0.18 ( 0%) wall 6160 kB ( 0%) ggc
ipa inlining heuristics : 0.69 ( 1%) usr 0.15 ( 1%) sys 0.67 ( 1%) wall 88573 kB ( 3%) ggc
ipa function splitting : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 9 kB ( 0%) ggc
ipa comdats : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
ipa various optimizations: 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
ipa reference : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
ipa profile : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
ipa pure const : 0.38 ( 1%) usr 0.09 ( 0%) sys 0.54 ( 1%) wall 0 kB ( 0%) ggc
ipa icf : 1.59 ( 3%) usr 0.01 ( 0%) sys 1.60 ( 2%) wall 11 kB ( 0%) ggc
ipa SRA : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
ipa free lang data : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
ipa free inline summary : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
cfg construction : 0.15 ( 0%) usr 0.06 ( 0%) sys 0.12 ( 0%) wall 5 kB ( 0%) ggc
cfg cleanup : 0.66 ( 1%) usr 0.27 ( 1%) sys 1.04 ( 1%) wall 17 kB ( 0%) ggc
trivially dead code : 0.12 ( 0%) usr 0.05 ( 0%) sys 0.38 ( 1%) wall 0 kB ( 0%) ggc
df scan insns : 0.45 ( 1%) usr 0.19 ( 1%) sys 0.56 ( 1%) wall 5569 kB ( 0%) ggc
df multiple defs : 0.24 ( 0%) usr 0.06 ( 0%) sys 0.28 ( 0%) wall 0 kB ( 0%) ggc
df reaching defs : 0.15 ( 0%) usr 0.03 ( 0%) sys 0.26 ( 0%) wall 0 kB ( 0%) ggc
df live regs : 0.60 ( 1%) usr 0.25 ( 1%) sys 0.70 ( 1%) wall 0 kB ( 0%) ggc
df live&initialized regs: 0.32 ( 1%) usr 0.13 ( 1%) sys 0.57 ( 1%) wall 0 kB ( 0%) ggc
df use-def / def-use chains: 0.05 ( 0%) usr 0.03 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.56 ( 1%) usr 0.18 ( 1%) sys 0.86 ( 1%) wall 2562 kB ( 0%) ggc
register information : 0.14 ( 0%) usr 0.13 ( 1%) sys 0.40 ( 1%) wall 0 kB ( 0%) ggc
alias analysis : 0.79 ( 1%) usr 0.34 ( 2%) sys 1.14 ( 2%) wall 28569 kB ( 1%) ggc
alias stmt walking : 0.10 ( 0%) usr 0.02 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
register scan : 0.07 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 106 kB ( 0%) ggc
rebuild jump labels : 0.05 ( 0%) usr 0.05 ( 0%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc
parser (global) : 2.19 ( 4%) usr 0.32 ( 2%) sys 2.51 ( 3%) wall 160144 kB ( 6%) ggc
early inlining heuristics: 0.17 ( 0%) usr 0.09 ( 0%) sys 0.24 ( 0%) wall 19510 kB ( 1%) ggc
inline parameters : 0.35 ( 1%) usr 0.18 ( 1%) sys 0.44 ( 1%) wall 58124 kB ( 2%) ggc
integration : 0.63 ( 1%) usr 0.24 ( 1%) sys 0.85 ( 1%) wall 80071 kB ( 3%) ggc
tree gimplify : 0.48 ( 1%) usr 0.17 ( 1%) sys 0.53 ( 1%) wall 109681 kB ( 4%) ggc
tree eh : 0.13 ( 0%) usr 0.07 ( 0%) sys 0.20 ( 0%) wall 13982 kB ( 1%) ggc
tree CFG construction : 0.19 ( 0%) usr 0.05 ( 0%) sys 0.17 ( 0%) wall 54230 kB ( 2%) ggc
tree CFG cleanup : 0.69 ( 1%) usr 0.38 ( 2%) sys 1.19 ( 2%) wall 1131 kB ( 0%) ggc
tree tail merge : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
tree VRP : 0.93 ( 2%) usr 0.35 ( 2%) sys 1.29 ( 2%) wall 89761 kB ( 4%) ggc
tree Early VRP : 0.21 ( 0%) usr 0.08 ( 0%) sys 0.31 ( 0%) wall 42204 kB ( 2%) ggc
tree copy propagation : 0.06 ( 0%) usr 0.03 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
tree PTA : 1.78 ( 3%) usr 0.85 ( 4%) sys 2.50 ( 3%) wall 4103 kB ( 0%) ggc
tree PHI insertion : 0.07 ( 0%) usr 0.02 ( 0%) sys 0.03 ( 0%) wall 6571 kB ( 0%) ggc
tree SSA rewrite : 0.16 ( 0%) usr 0.06 ( 0%) sys 0.20 ( 0%) wall 20087 kB ( 1%) ggc
tree SSA other : 0.21 ( 0%) usr 0.13 ( 1%) sys 0.51 ( 1%) wall 5602 kB ( 0%) ggc
tree SSA incremental : 0.15 ( 0%) usr 0.10 ( 0%) sys 0.30 ( 0%) wall 60 kB ( 0%) ggc
tree operand scan : 0.34 ( 1%) usr 0.22 ( 1%) sys 0.56 ( 1%) wall 56364 kB ( 2%) ggc
dominator optimization : 0.73 ( 1%) usr 0.22 ( 1%) sys 0.75 ( 1%) wall 7545 kB ( 0%) ggc
backwards jump threading: 0.30 ( 1%) usr 0.09 ( 0%) sys 0.25 ( 0%) wall 111 kB ( 0%) ggc
tree SRA : 0.13 ( 0%) usr 0.04 ( 0%) sys 0.17 ( 0%) wall 28 kB ( 0%) ggc
isolate eroneous paths : 0.04 ( 0%) usr 0.03 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
tree CCP : 0.68 ( 1%) usr 0.24 ( 1%) sys 0.85 ( 1%) wall 7302 kB ( 0%) ggc
tree PHI const/copy prop: 0.05 ( 0%) usr 0.02 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
tree split crit edges : 0.05 ( 0%) usr 0.06 ( 0%) sys 0.17 ( 0%) wall 19 kB ( 0%) ggc
tree reassociation : 0.23 ( 0%) usr 0.07 ( 0%) sys 0.38 ( 1%) wall 6 kB ( 0%) ggc
tree PRE : 1.28 ( 2%) usr 0.48 ( 2%) sys 1.78 ( 2%) wall 50466 kB ( 2%) ggc
tree FRE : 0.69 ( 1%) usr 0.36 ( 2%) sys 1.22 ( 2%) wall 17297 kB ( 1%) ggc
tree code sinking : 0.10 ( 0%) usr 0.05 ( 0%) sys 0.13 ( 0%) wall 6 kB ( 0%) ggc
tree linearize phis : 0.19 ( 0%) usr 0.08 ( 0%) sys 0.27 ( 0%) wall 41714 kB ( 2%) ggc
tree backward propagate : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
tree forward propagate : 0.23 ( 0%) usr 0.08 ( 0%) sys 0.38 ( 1%) wall 62 kB ( 0%) ggc
tree phiprop : 0.06 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
tree conservative DCE : 0.21 ( 0%) usr 0.15 ( 1%) sys 0.36 ( 0%) wall 209 kB ( 0%) ggc
tree aggressive DCE : 0.28 ( 1%) usr 0.12 ( 1%) sys 0.44 ( 1%) wall 83438 kB ( 3%) ggc
tree buildin call DCE : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
tree DSE : 0.09 ( 0%) usr 0.09 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc
PHI merge : 0.07 ( 0%) usr 0.04 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
tree loop optimization : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
loopless fn : 0.04 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
tree loop invariant motion: 0.01 ( 0%) usr 0.02 ( 0%) sys 0.07 ( 0%) wall 1 kB ( 0%) ggc
complete unrolling : 0.05 ( 0%) usr 0.04 ( 0%) sys 0.12 ( 0%) wall 136 kB ( 0%) ggc
tree iv optimization : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 120 kB ( 0%) ggc
tree copy headers : 0.03 ( 0%) usr 0.02 ( 0%) sys 0.03 ( 0%) wall 7 kB ( 0%) ggc
tree SSA uncprop : 0.28 ( 1%) usr 0.13 ( 1%) sys 0.31 ( 0%) wall 0 kB ( 0%) ggc
tree NRV optimization : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 849 kB ( 0%) ggc
tree switch conversion : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
tree strlen optimization: 0.03 ( 0%) usr 0.01 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
dominance frontiers : 0.09 ( 0%) usr 0.02 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
dominance computation : 1.42 ( 3%) usr 0.51 ( 2%) sys 1.94 ( 3%) wall 0 kB ( 0%) ggc
control dependences : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
out of ssa : 0.22 ( 0%) usr 0.10 ( 0%) sys 0.35 ( 0%) wall 7465 kB ( 0%) ggc
expand vars : 0.02 ( 0%) usr 0.02 ( 0%) sys 0.04 ( 0%) wall 506 kB ( 0%) ggc
expand : 0.63 ( 1%) usr 0.24 ( 1%) sys 1.12 ( 1%) wall 63840 kB ( 3%) ggc
post expand cleanups : 0.24 ( 0%) usr 0.04 ( 0%) sys 0.23 ( 0%) wall 18401 kB ( 1%) ggc
varconst : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 539 kB ( 0%) ggc
lower subreg : 0.07 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
jump : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
forward prop : 0.73 ( 1%) usr 0.26 ( 1%) sys 0.86 ( 1%) wall 2110 kB ( 0%) ggc
CSE : 0.50 ( 1%) usr 0.19 ( 1%) sys 0.73 ( 1%) wall 1053 kB ( 0%) ggc
dead code elimination : 0.23 ( 0%) usr 0.07 ( 0%) sys 0.38 ( 1%) wall 0 kB ( 0%) ggc
dead store elim1 : 0.24 ( 0%) usr 0.09 ( 0%) sys 0.48 ( 1%) wall 1039 kB ( 0%) ggc
dead store elim2 : 0.27 ( 0%) usr 0.14 ( 1%) sys 0.39 ( 1%) wall 960 kB ( 0%) ggc
loop analysis : 0.10 ( 0%) usr 0.06 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
loop init : 1.34 ( 2%) usr 0.51 ( 2%) sys 1.93 ( 3%) wall 183463 kB ( 7%) ggc
loop invariant motion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 1 kB ( 0%) ggc
loop doloop : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 36 kB ( 0%) ggc
loop fini : 0.61 ( 1%) usr 0.31 ( 2%) sys 0.94 ( 1%) wall 0 kB ( 0%) ggc
CPROP : 0.21 ( 0%) usr 0.05 ( 0%) sys 0.21 ( 0%) wall 295 kB ( 0%) ggc
PRE : 0.09 ( 0%) usr 0.01 ( 0%) sys 0.06 ( 0%) wall 4 kB ( 0%) ggc
auto inc dec : 0.12 ( 0%) usr 0.06 ( 0%) sys 0.15 ( 0%) wall 934 kB ( 0%) ggc
CSE 2 : 0.29 ( 1%) usr 0.18 ( 1%) sys 0.44 ( 1%) wall 171 kB ( 0%) ggc
branch prediction : 0.20 ( 0%) usr 0.06 ( 0%) sys 0.16 ( 0%) wall 4067 kB ( 0%) ggc
combiner : 0.84 ( 2%) usr 0.22 ( 1%) sys 1.42 ( 2%) wall 13624 kB ( 1%) ggc
if-conversion : 0.28 ( 1%) usr 0.08 ( 0%) sys 0.41 ( 1%) wall 2 kB ( 0%) ggc
scheduling : 1.45 ( 3%) usr 0.63 ( 3%) sys 2.15 ( 3%) wall 4177 kB ( 0%) ggc
integrated RA : 1.83 ( 3%) usr 0.70 ( 3%) sys 2.45 ( 3%) wall 964084 kB (38%) ggc
LRA non-specific : 0.69 ( 1%) usr 0.33 ( 2%) sys 0.90 ( 1%) wall 2272 kB ( 0%) ggc
LRA virtuals elimination: 0.27 ( 0%) usr 0.15 ( 1%) sys 0.36 ( 0%) wall 1881 kB ( 0%) ggc
LRA reload inheritance : 0.09 ( 0%) usr 0.04 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
LRA create live ranges : 0.12 ( 0%) usr 0.06 ( 0%) sys 0.12 ( 0%) wall 1 kB ( 0%) ggc
LRA hard reg assignment : 0.09 ( 0%) usr 0.05 ( 0%) sys 0.20 ( 0%) wall 0 kB ( 0%) ggc
reload : 0.12 ( 0%) usr 0.06 ( 0%) sys 0.13 ( 0%) wall 0 kB ( 0%) ggc
reload CSE regs : 0.46 ( 1%) usr 0.09 ( 0%) sys 0.43 ( 1%) wall 2852 kB ( 0%) ggc
thread pro- & epilogue : 0.25 ( 0%) usr 0.13 ( 1%) sys 0.45 ( 1%) wall 37093 kB ( 1%) ggc
if-conversion 2 : 0.06 ( 0%) usr 0.02 ( 0%) sys 0.18 ( 0%) wall 0 kB ( 0%) ggc
peephole 2 : 0.11 ( 0%) usr 0.04 ( 0%) sys 0.18 ( 0%) wall 11 kB ( 0%) ggc
hard reg cprop : 0.12 ( 0%) usr 0.05 ( 0%) sys 0.19 ( 0%) wall 0 kB ( 0%) ggc
scheduling 2 : 1.05 ( 2%) usr 0.44 ( 2%) sys 1.67 ( 2%) wall 3203 kB ( 0%) ggc
machine dep reorg : 0.21 ( 0%) usr 0.05 ( 0%) sys 0.26 ( 0%) wall 10319 kB ( 0%) ggc
reorder blocks : 0.10 ( 0%) usr 0.03 ( 0%) sys 0.20 ( 0%) wall 20 kB ( 0%) ggc
shorten branches : 0.16 ( 0%) usr 0.05 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
final : 0.88 ( 2%) usr 0.47 ( 2%) sys 1.51 ( 2%) wall 15600 kB ( 1%) ggc
variable output : 0.30 ( 1%) usr 0.03 ( 0%) sys 0.33 ( 0%) wall 10352 kB ( 0%) ggc
symout : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
tree if-combine : 0.05 ( 0%) usr 0.02 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
straight-line strength reduction: 0.13 ( 0%) usr 0.07 ( 0%) sys 0.22 ( 0%) wall 0 kB ( 0%) ggc
store merging : 0.07 ( 0%) usr 0.03 ( 0%) sys 0.01 ( 0%) wall 9 kB ( 0%) ggc
address lowering : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
early local passes : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
unaccounted optimizations: 0.01 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
rest of compilation : 5.83 (11%) usr 2.63 (13%) sys 8.49 (11%) wall 101391 kB ( 4%) ggc
unaccounted post reload : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
unaccounted late compilation: 0.03 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
remove unused locals : 0.18 ( 0%) usr 0.08 ( 0%) sys 0.22 ( 0%) wall 0 kB ( 0%) ggc
address taken : 0.13 ( 0%) usr 0.03 ( 0%) sys 0.13 ( 0%) wall 0 kB ( 0%) ggc
rebuild frequencies : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
repair loop structures : 0.07 ( 0%) usr 0.03 ( 0%) sys 0.14 ( 0%) wall 0 kB ( 0%) ggc
TOTAL : 54.11 20.45 74.91 2544441 kB
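
With this many rows it is hard to see at a glance where the time and memory go. Below is a minimal sketch of a helper that parses rows in the format shown above and ranks them by a chosen column. The names `parse_time_report` and `top_by` are hypothetical, not part of GDC or GCC, and the regular expression assumes the exact row layout pasted in this message; other GCC versions may print the report differently.

```python
import re

# One report row, as pasted in this thread:
#   name : U.UU ( P%) usr S.SS ( P%) sys W.WW ( P%) wall N kB ( P%) ggc
# Assumption: pass names never contain a colon.
ROW = re.compile(
    r"(?P<name>[^:]+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s+"
    r"(?P<kb>\d+)\s*kB\s*\(\s*\d+%\)\s*ggc"
)

HEADER = "Execution times (seconds)"


def parse_time_report(text):
    """Return one dict per timer row; the TOTAL line is skipped."""
    # Drop everything up to and including the header, if it is present.
    text = text.split(HEADER, 1)[-1]
    rows = []
    for m in ROW.finditer(text):
        rows.append({
            # Collapse whitespace so wrapped or flattened input still works.
            "name": " ".join(m.group("name").split()),
            "usr": float(m.group("usr")),
            "sys": float(m.group("sys")),
            "wall": float(m.group("wall")),
            "ggc_kb": int(m.group("kb")),
        })
    return rows


def top_by(rows, key, n=5, skip_phases=True):
    """Return the n rows with the largest value in column `key`."""
    if skip_phases:
        # "phase ..." rows are top-level totals, not individual passes.
        rows = [r for r in rows if not r["name"].startswith("phase ")]
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


# A few rows copied verbatim from the report above.
SAMPLE = """
phase opt and generate : 51.89 (96%) usr 20.13 (98%) sys 72.29 (97%) wall 2381419 kB (94%) ggc
dump files : 4.17 ( 8%) usr 1.96 (10%) sys 5.67 ( 8%) wall 0 kB ( 0%) ggc
tree PTA : 1.78 ( 3%) usr 0.85 ( 4%) sys 2.50 ( 3%) wall 4103 kB ( 0%) ggc
integrated RA : 1.83 ( 3%) usr 0.70 ( 3%) sys 2.45 ( 3%) wall 964084 kB (38%) ggc
rest of compilation : 5.83 (11%) usr 2.63 (13%) sys 8.49 (11%) wall 101391 kB ( 4%) ggc
"""

if __name__ == "__main__":
    rows = parse_time_report(SAMPLE)
    for r in top_by(rows, "wall", n=3):
        print(f'{r["wall"]:7.2f}s wall  {r["ggc_kb"]:>8} kB  {r["name"]}')
```

Saving the compiler's report to a file and passing its contents to `parse_time_report` should work the same way; sorting on `"ggc_kb"` instead of `"wall"` ranks passes by memory rather than time.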

> A thought just occurred to me: you are compiling the entire program + object.d, right? Nothing else will link/be linked to the binary?

I'm passing all files except druntime files via the command line. druntime files are imported via -Isource/runtime. But essentially yes, I'm compiling the entire application in one command so I can get cross-module inlining.

I tried moving all runtime files to the command line, but I get errors about __entrypoint.

cc1d: error: module __entrypoint is in file '__entrypoint.d' which cannot be read
Specify path to file '__entrypoint.d' with -I switch


> If that is the case, you should definitely compile with -fwhole-program. I suspect that may cut down your compilation time by half or even more.

If I only import __entrypoint.d and pass the rest of the runtime files on the command line and compile with -fwhole-program, it compiles in 5s, but I only get an 8-byte binary. I suspect this is due to the error above about __entrypoint. That is, if there's no entry point, the whole program gets garbage collected.
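
To make that suspicion concrete, here is a toy model of reachability-based dead code removal. It is only an illustration of the reasoning, not how GCC implements -fwhole-program, and the function names in the call graph are made up for the example. Anything not reachable from a root is dropped, so with no root nothing survives.

```python
def reachable(call_graph, roots):
    """Return the set of functions reachable from `roots` (depth-first walk)."""
    seen = set()
    stack = list(roots)
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        # Follow every call made by this function.
        stack.extend(call_graph.get(fn, ()))
    return seen


# Hypothetical call graph; these names are not taken from the real firmware.
CALL_GRAPH = {
    "main": ["init_lcd", "blink_led"],
    "init_lcd": ["spi_write"],
    "blink_led": [],
    "spi_write": [],
    "unused_helper": [],
}

if __name__ == "__main__":
    # With an entry point, everything it uses is kept.
    print(sorted(reachable(CALL_GRAPH, roots=["main"])))
    # Without an entry point, there is nothing to start from,
    # so every function counts as dead.
    print(sorted(reachable(CALL_GRAPH, roots=[])))
```

In this model, the nearly empty output would correspond to the second case, where the set of kept functions is empty.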

I think you might be on to something here though.

I'm out of time now; gotta catch a plane soon. I'll try to do more troubleshooting when I return.

Mike
