On Wednesday, 28 June 2017 at 16:52:26 UTC, Iain Buclaw wrote:

You probably want to tone down on optimizations as well. -O3 will be doing a lot of work, sometimes for little or no gain. In most cases, -O2 -finline-functions is good enough, which can be abbreviated further as simply -Os. [for full list of enabled/disabled passes: gdc -Q -Os --help=optimizers]

You can see a breakdown of which areas the compiler spends the most time in with -ftime-report.

Compiling with -O3
------------------
 phase opt and generate  :  50.74 (96%) usr  21.24 (99%) sys  72.94 (97%) wall 2426962 kB (94%) ggc
 TOTAL                   :  53.02            21.49            75.55          2589984 kB

real    1m21.339s
user    0m57.086s
sys     0m22.297s

arm-none-eabi-size binary/firmware
   text    data     bss     dec     hex filename
   6228       0  153600  159828   27054 binary/firmware


Compiling with -O2 -finline-functions
-------------------------------------
 phase opt and generate  :  50.71 (96%) usr  20.58 (98%) sys  72.04 (97%) wall 2381419 kB (94%) ggc
 TOTAL                   :  52.89            20.93            74.63          2544441 kB

real    1m20.755s
user    0m56.857s
sys     0m21.826s

arm-none-eabi-size binary/firmware
   text    data     bss     dec     hex filename
   5912       0  153600  159512   26f18 binary/firmware

Compiling with -O0
------------------
 phase opt and generate  :  22.95 (91%) usr   5.42 (94%) sys  28.38 (92%) wall 1777106 kB (92%) ggc
 TOTAL                   :  25.14             5.74            30.94          1940102 kB

real    0m36.476s
user    0m29.600s
sys     0m6.647s

arm-none-eabi-size binary/firmware
   text    data     bss     dec     hex filename
  45250       0  153600  198850   308c2 binary/firmware


-------------------------------------------------------------------------
The vast majority of time is spent in "phase opt and generate". A few observations:

* Elapsed time isn't much different between -O3 and -O2 -finline-functions
* -O2 -finline-functions gave me a smaller binary :)
* -O0 reduced time significantly, but "phase opt and generate" still takes an awfully long time relative to everything else
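
To put rough numbers on these observations, here is a quick Python calculation. It uses only the "real" times and "text" sizes reported above; the dictionary layout and variable names are just illustrative.

```python
# Wall-clock ("real") times in seconds and text-section sizes in bytes,
# copied from the three builds above.
builds = {
    "-O3": {"real_s": 81.339, "text": 6228},
    "-O2 -finline-functions": {"real_s": 80.755, "text": 5912},
    "-O0": {"real_s": 36.476, "text": 45250},
}

base = builds["-O3"]  # compare everything against the -O3 build
for name, b in builds.items():
    # Relative change in elapsed time, in percent (negative = faster).
    time_delta = (b["real_s"] - base["real_s"]) / base["real_s"] * 100
    # Text size as a multiple of the -O3 text size.
    size_ratio = b["text"] / base["text"]
    print(f"{name:24s} time {time_delta:+6.1f}%  text x{size_ratio:.2f}")
```

By this measure -O2 -finline-functions is under 1% faster than -O3 and about 5% smaller, while -O0 cuts elapsed time by roughly 55% at the cost of a text section about 7.3 times larger.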

What exactly is "phase opt and generate"? I'm assuming "opt" means optimizer, but why is it taking such a long time even with -O0? Maybe the "generate" part is the more significant one.

With -O0 there are still quite a few things enabled, so maybe I'll start disabling them one at a time with the "-fno-" prefix and see if I can find a culprit.

-O0 -Q --help=optimizers
  -faggressive-loop-optimizations       [enabled]
  -fauto-inc-dec                        [enabled]
  -fdce                                 [enabled]
  -fdelete-null-pointer-checks          [enabled]
  -fdse                                 [enabled]
  -fearly-inlining                      [enabled]
  -ffp-contract=[off|on|fast]           fast
  -ffp-int-builtin-inexact              [enabled]
  -ffunction-cse                        [enabled]
  -fgcse-lm                             [enabled]
  -finline                              [enabled]
  -finline-atomics                      [enabled]
  -fira-hoist-pressure                  [enabled]
  -fira-share-save-slots                [enabled]
  -fira-share-spill-slots               [enabled]
  -fivopts                              [enabled]
  -fjump-tables                         [enabled]
  -flifetime-dse                        [enabled]
  -fmath-errno                          [enabled]
  -fpeephole                            [enabled]
  -fplt                                 [enabled]
  -fprefetch-loop-arrays                [enabled]
  -fprintf-return-value                 [enabled]
  -freg-struct-return                   [enabled]
  -frename-registers                    [enabled]
  -frtti                                [enabled]
  -fsched-critical-path-heuristic       [enabled]
  -fsched-dep-count-heuristic           [enabled]
  -fsched-group-heuristic               [enabled]
  -fsched-interblock                    [enabled]
  -fsched-last-insn-heuristic           [enabled]
  -fsched-rank-heuristic                [enabled]
  -fsched-spec                          [enabled]
  -fsched-spec-insn-heuristic           [enabled]
  -fsched-stalled-insns-dep             [enabled]
  -fschedule-fusion                     [enabled]
  -fshort-enums                         [enabled]
  -fshrink-wrap-separate                [enabled]
  -fsigned-zeros                        [enabled]
  -fsimd-cost-model=[unlimited|dynamic|cheap]   unlimited
  -fsplit-ivs-in-unroller               [enabled]
  -fssa-backprop                        [enabled]
  -fstack-reuse=[all|named_vars|none]   all
  -fstdarg-opt                          [enabled]
  -fstrict-volatile-bitfields           [enabled]
  -fno-threadsafe-statics               [enabled]
  -ftrapping-math                       [enabled]
  -ftree-cselim                         [enabled]
  -ftree-forwprop                       [enabled]
  -ftree-loop-if-convert                [enabled]
  -ftree-loop-im                        [enabled]
  -ftree-loop-ivcanon                   [enabled]
  -ftree-loop-optimize                  [enabled]
  -ftree-phiprop                        [enabled]
  -ftree-reassoc                        [enabled]
  -ftree-scev-cprop                     [enabled]
  -fvar-tracking                        [enabled]
  -fvar-tracking-assignments            [enabled]
  -fweb                                 [enabled]
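
A list like the one above could be turned into candidate flags for the one-at-a-time experiment with a small script. This is only a sketch: `negated_flags` is a hypothetical helper, the sample input is a few lines copied from the list, and it assumes each boolean `-f` option has an opposite `-fno-` form. Some of the options are language-specific, so the compiler may reject or ignore some of the results.

```python
import re

def negated_flags(help_output):
    """Return the opposite form of each boolean flag listed as [enabled]."""
    flags = []
    for line in help_output.splitlines():
        # Only plain boolean flags match; valued options such as
        # -ffp-contract=[off|on|fast] are skipped.
        m = re.match(r"\s*(-f[\w-]+)\s+\[enabled\]\s*$", line)
        if not m:
            continue
        flag = m.group(1)
        if flag.startswith("-fno-"):
            # Already a negative option: the opposite drops the "no-".
            flags.append("-f" + flag[len("-fno-"):])
        else:
            flags.append("-fno-" + flag[len("-f"):])
    return flags

# A few lines taken from the -O0 -Q --help=optimizers output above.
sample = """\
  -fdce                                 [enabled]
  -ffp-contract=[off|on|fast]           fast
  -fno-threadsafe-statics               [enabled]
  -fweb                                 [enabled]
"""

for flag in negated_flags(sample):
    print(flag)
```

Each printed flag could then be added to the -O0 build in turn, with -ftime-report showing whether "phase opt and generate" changes.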
