On Wed, Jun 4, 2008 at 8:31 AM, Basile STARYNKEVITCH
<[EMAIL PROTECTED]> wrote:
> Hello All,
>
> my MELT branch http://gcc.gnu.org/wiki/MiddleEndLispTranslator has a big
> source file in it, warm-basilys-0.c. It is self-generated, about 14Mbytes &
> almost 280KLOC (in rev136334). It ends with a big initialization routine of
> 100KLOC which mostly fills a 5000 member structure (each member being itself
> a small structure) and calls a few routines. This initialization routine has
> a simple control structure (no deeply nested blocks or loops).
>
> gcc (either gcc-4.1, 4.2 or 4.3 from Debian, or the bootstrapped trunk
> rev136331) can compile this file without any optimisation, i.e. with -O0 -g3,
> in about 16 seconds and less than 1GB of RAM.
>
> But on my 6 Gbytes machine (Core2, 2400MHz, Debian/Sid/AMD64) the cc1
> process with -O2 (either 4.2, 4.3 or the trunk) eats nearly 10GB of virtual
> memory and thrashes (using 4.8GB of RAM, 1% cpu time, waiting for the swap
> IO). The same happens with -O1. -Os is a bit better.
>
> The time to run the
> ./built-melt-cc-script warm-basilys-0.c warm-basilys-0.so
> which compiles warm-basilys-0.c with -O2 -fPIC is
>
> (you can set the MELT_EXTRACFLAGS environment variable to pass
> real    84m23.594s
> user    6m23.496s
> sys     1m5.032s
>
> I am attaching the -ftime-report output for information. One of the most
> demanding passes is tree operand scan.
>
> I find this report misleading on the memory consumption total (1591718kB =
> 1.6GB). The top command shows that cc1 needs nearly 10GB of process space,
> and uses nearly 5GB (and thrashes).
>
> I won't be annoyed for long by this, since I'll soon split the
> warm-basilys.bysl file (and hence the generated files) in several distinct
> files. Until then, -O0 is enough for me.
>
> Are there any specific flags to pass to gcc to lower the RAM consumption
> (even at the expense of generated code quality)?
>
> Are there any pragma-s to disable (or lower) optimisation of a single
> routine?
>
> My intuition (and experience) is that gcc -O2 (or even -O1) time and space
> consumption is nearly quadratic in the size of the longest routine.
>
> Thanks for reading.

If it does structure initialization, you can try --param
max-fields-for-field-sensitive=0 --param max-aliased-vops=0.
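
On the question about disabling optimisation for a single routine: GCC 4.4
and later accept an `optimize` function attribute (and a matching
`#pragma GCC optimize`); the 4.1 to 4.3 releases mentioned above predate it.
A minimal sketch, assuming such a newer compiler. The names `demo_table`,
`init_big_table` and `UNOPTIMIZED` are made up for illustration and do not
come from warm-basilys-0.c:

```c
/* Hypothetical stand-in for the large generated structure. */
struct cell { int tag; const char *name; };
struct big_table { struct cell cells[4]; };

struct big_table demo_table;

/* GCC 4.4 or later: compile only the marked routine as if with -O0.
   Older GCC and other compilers fall back to a plain function. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
#define UNOPTIMIZED __attribute__((optimize("O0")))
#else
#define UNOPTIMIZED /* no per-function control assumed here */
#endif

/* Stand-in for the long, flat initialization routine: straight-line
   member assignments with no nested blocks or loops. */
UNOPTIMIZED void init_big_table(void)
{
    demo_table.cells[0].tag = 1;  demo_table.cells[0].name = "first";
    demo_table.cells[1].tag = 2;  demo_table.cells[1].name = "second";
    demo_table.cells[2].tag = 3;  demo_table.cells[2].name = "third";
    demo_table.cells[3].tag = 4;  demo_table.cells[3].name = "fourth";
}
```

With this, the file can still be built with -O2 and only the marked routine
is left unoptimised. Whether that is enough to bring the memory use down for
the real file has not been measured here.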

Otherwise, can you file a bug report and attach the testcase there?
(bonus points if you have one that doesn't max out at 10GB but
maybe 2GB ;))
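
On the intuition that cost grows with the size of the longest routine:
besides splitting the source file, the generator could emit the
initialization as several small functions called in sequence, so that no
single function body is huge. The sketch below only illustrates the shape
of such output. All names (`slots`, `init_chunk_N`, `run_all_init_chunks`,
`KEEP_SEPARATE`) are hypothetical, and whether this helps for the real file
is an assumption, not something measured.

```c
#include <stddef.h>

#define N_SLOTS 6

struct slot { int id; int value; };
struct slot slots[N_SLOTS];

/* noinline keeps GCC from merging the chunks back into one big body. */
#if defined(__GNUC__)
#define KEEP_SEPARATE __attribute__((noinline))
#else
#define KEEP_SEPARATE
#endif

/* Each chunk fills a small, fixed range of members. */
static KEEP_SEPARATE void init_chunk_0(void)
{
    slots[0].id = 0;  slots[0].value = 0;
    slots[1].id = 1;  slots[1].value = 10;
}

static KEEP_SEPARATE void init_chunk_1(void)
{
    slots[2].id = 2;  slots[2].value = 20;
    slots[3].id = 3;  slots[3].value = 30;
}

static KEEP_SEPARATE void init_chunk_2(void)
{
    slots[4].id = 4;  slots[4].value = 40;
    slots[5].id = 5;  slots[5].value = 50;
}

typedef void (*init_fn)(void);

static const init_fn init_chunks[] = {
    init_chunk_0, init_chunk_1, init_chunk_2
};

/* Runs every chunk in order and returns how many were called. */
size_t run_all_init_chunks(void)
{
    size_t n = sizeof init_chunks / sizeof init_chunks[0];
    size_t i;

    for (i = 0; i < n; i++)
        init_chunks[i]();
    return n;
}
```

The single entry point `run_all_init_chunks` replaces the one long routine,
so callers do not need to know how many chunks were generated.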

Thanks,
Richard.

>
> --
> Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
> email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***
>
>
> Execution times (seconds)
>  garbage collection    :   7.16 ( 2%) usr   0.45 ( 1%) sys  47.16 ( 1%) wall
>       0 kB ( 0%) ggc
>  callgraph construction:  16.83 ( 4%) usr   0.10 ( 0%) sys  16.87 ( 0%) wall
>   41478 kB ( 3%) ggc
>  callgraph optimization:   9.82 ( 3%) usr   0.11 ( 0%) sys   9.95 ( 0%) wall
>    9184 kB ( 1%) ggc
>  ipa reference         :   0.25 ( 0%) usr   0.02 ( 0%) sys   0.26 ( 0%) wall
>      52 kB ( 0%) ggc
>  ipa pure const        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
>       0 kB ( 0%) ggc
>  cfg cleanup           :   2.76 ( 1%) usr   0.03 ( 0%) sys   2.91 ( 0%) wall
>    5120 kB ( 0%) ggc
>  CFG verifier          :  11.22 ( 3%) usr   0.69 ( 1%) sys 177.08 ( 3%) wall
>       0 kB ( 0%) ggc
>  trivially dead code   :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.80 ( 0%) wall
>       0 kB ( 0%) ggc
>  df reaching defs      :   3.01 ( 1%) usr   0.49 ( 1%) sys  34.85 ( 1%) wall
>       0 kB ( 0%) ggc
>  df live regs          :   3.46 ( 1%) usr   0.06 ( 0%) sys   3.57 ( 0%) wall
>       0 kB ( 0%) ggc
>  df live&initialized regs:   2.12 ( 1%) usr   0.00 ( 0%) sys   2.16 ( 0%)
> wall       0 kB ( 0%) ggc
>  df use-def / def-use chains:   1.61 ( 0%) usr   0.02 ( 0%) sys   1.75 ( 0%)
> wall       0 kB ( 0%) ggc
>  df reg dead/unused notes:   1.07 ( 0%) usr   0.04 ( 0%) sys   1.10 ( 0%)
> wall   15075 kB ( 1%) ggc
>  register information  :   0.51 ( 0%) usr   0.01 ( 0%) sys   0.45 ( 0%) wall
>       0 kB ( 0%) ggc
>  alias analysis        :   1.05 ( 0%) usr   0.01 ( 0%) sys   0.91 ( 0%) wall
>   19781 kB ( 1%) ggc
>  register scan         :   0.25 ( 0%) usr   0.01 ( 0%) sys   0.23 ( 0%) wall
>     163 kB ( 0%) ggc
>  rebuild jump labels   :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall
>       0 kB ( 0%) ggc
>  preprocessing         :   1.24 ( 0%) usr   0.56 ( 1%) sys   1.93 ( 0%) wall
>   46597 kB ( 3%) ggc
>  lexical analysis      :   0.30 ( 0%) usr   0.81 ( 1%) sys   1.29 ( 0%) wall
>       0 kB ( 0%) ggc
>  parser                :   1.70 ( 0%) usr   0.49 ( 1%) sys   2.24 ( 0%) wall
>  123365 kB ( 8%) ggc
>  inline heuristics     :   0.63 ( 0%) usr   0.01 ( 0%) sys   0.62 ( 0%) wall
>    5491 kB ( 0%) ggc
>  integration           :   2.11 ( 1%) usr   0.22 ( 0%) sys   2.25 ( 0%) wall
>  168932 kB (11%) ggc
>  tree gimplify         :   1.86 ( 0%) usr   0.05 ( 0%) sys   1.78 ( 0%) wall
>  109046 kB ( 7%) ggc
>  tree eh               :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree CFG construction :   0.22 ( 0%) usr   0.01 ( 0%) sys   0.23 ( 0%) wall
>   69444 kB ( 4%) ggc
>  tree CFG cleanup      :   3.42 ( 1%) usr   0.03 ( 0%) sys   4.15 ( 0%) wall
>    7307 kB ( 0%) ggc
>  tree VRP              :   3.69 ( 1%) usr   0.24 ( 0%) sys  11.89 ( 0%) wall
>  115325 kB ( 7%) ggc
>  tree copy propagation :   1.80 ( 0%) usr   0.05 ( 0%) sys   3.50 ( 0%) wall
>    3511 kB ( 0%) ggc
>  tree find ref. vars   :   0.12 ( 0%) usr   0.01 ( 0%) sys   0.12 ( 0%) wall
>    9570 kB ( 1%) ggc
>  tree PTA              :   2.59 ( 1%) usr   0.61 ( 1%) sys  57.50 ( 1%) wall
>   17158 kB ( 1%) ggc
>  tree alias analysis   :   1.13 ( 0%) usr   0.33 ( 1%) sys  26.66 ( 1%) wall
>    2461 kB ( 0%) ggc
>  tree call clobbering  :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall
>      10 kB ( 0%) ggc
>  tree flow sensitive alias:   0.46 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%)
> wall   10992 kB ( 1%) ggc
>  tree flow insensitive alias:   8.41 ( 2%) usr   0.06 ( 0%) sys   8.96 ( 0%)
> wall       0 kB ( 0%) ggc
>  tree memory partitioning:   0.38 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%)
> wall     111 kB ( 0%) ggc
>  tree PHI insertion    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
>     119 kB ( 0%) ggc
>  tree SSA rewrite      :   1.44 ( 0%) usr   0.03 ( 0%) sys   1.46 ( 0%) wall
>   44376 kB ( 3%) ggc
>  tree SSA other        :   0.09 ( 0%) usr   0.09 ( 0%) sys   0.27 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree SSA incremental  :   2.11 ( 1%) usr   0.14 ( 0%) sys   4.59 ( 0%) wall
>    4795 kB ( 0%) ggc
>  tree operand scan     :  80.93 (21%) usr   0.92 ( 1%) sys  82.92 ( 2%) wall
>   71551 kB ( 4%) ggc
>  dominator optimization:   3.97 ( 1%) usr   0.06 ( 0%) sys   3.92 ( 0%) wall
>   84156 kB ( 5%) ggc
>  tree SRA              :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree STORE-CCP        :   0.47 ( 0%) usr   0.05 ( 0%) sys   0.69 ( 0%) wall
>     992 kB ( 0%) ggc
>  tree CCP              :   0.93 ( 0%) usr   0.00 ( 0%) sys   0.94 ( 0%) wall
>    1205 kB ( 0%) ggc
>  tree PHI const/copy prop:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%)
> wall      77 kB ( 0%) ggc
>  tree split crit edges :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
>   21401 kB ( 1%) ggc
>  tree reassociation    :   0.43 ( 0%) usr   0.01 ( 0%) sys   0.45 ( 0%) wall
>     236 kB ( 0%) ggc
>  tree PRE              :  13.92 ( 4%) usr  52.21 (81%) sys4339.32 (86%) wall
>  109776 kB ( 7%) ggc
>  tree FRE              :   4.18 ( 1%) usr   2.51 ( 4%) sys   6.69 ( 0%) wall
>   61570 kB ( 4%) ggc
>  tree code sinking     :   0.53 ( 0%) usr   0.03 ( 0%) sys   1.54 ( 0%) wall
>    1578 kB ( 0%) ggc
>  tree linearize phis   :   0.16 ( 0%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree forward propagate:   0.36 ( 0%) usr   0.03 ( 0%) sys   0.35 ( 0%) wall
>    2466 kB ( 0%) ggc
>  tree phiprop          :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree conservative DCE :   0.93 ( 0%) usr   0.01 ( 0%) sys   0.91 ( 0%) wall
>      20 kB ( 0%) ggc
>  tree aggressive DCE   :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree DSE              :   0.35 ( 0%) usr   0.01 ( 0%) sys   0.33 ( 0%) wall
>     562 kB ( 0%) ggc
>  PHI merge             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
>       0 kB ( 0%) ggc
>  loop invariant motion :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
>       6 kB ( 0%) ggc
>  complete unrolling    :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall
>     316 kB ( 0%) ggc
>  tree iv optimization  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
>       7 kB ( 0%) ggc
>  tree loop init        :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall
>     281 kB ( 0%) ggc
>  tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree copy headers     :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
>     524 kB ( 0%) ggc
>  tree SSA uncprop      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree SSA to normal    :  52.85 (14%) usr   0.27 ( 0%) sys  53.12 ( 1%) wall
>   25180 kB ( 2%) ggc
>  tree rename SSA copies:   0.22 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall
>       0 kB ( 0%) ggc
>  tree SSA verifier     :  21.08 ( 6%) usr   0.19 ( 0%) sys  21.67 ( 0%) wall
>    4603 kB ( 0%) ggc
>  tree STMT verifier    :  47.77 (12%) usr   1.47 ( 2%) sys  49.16 ( 1%) wall
>       0 kB ( 0%) ggc
>  callgraph verifier    :   0.86 ( 0%) usr   0.00 ( 0%) sys   0.93 ( 0%) wall
>    2891 kB ( 0%) ggc
>  dominance frontiers   :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall
>       0 kB ( 0%) ggc
>  dominance computation :   3.59 ( 1%) usr   0.04 ( 0%) sys   3.55 ( 0%) wall
>       0 kB ( 0%) ggc
>  control dependences   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
>       0 kB ( 0%) ggc
>  expand                :  11.91 ( 3%) usr   0.31 ( 0%) sys  21.34 ( 0%) wall
>  172552 kB (11%) ggc
>  lower subreg          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
>       0 kB ( 0%) ggc
>  jump                  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
>       0 kB ( 0%) ggc
>  forward prop          :   0.71 ( 0%) usr   0.01 ( 0%) sys   0.87 ( 0%) wall
>   18126 kB ( 1%) ggc
>  CSE                   :   4.33 ( 1%) usr   0.03 ( 0%) sys   4.51 ( 0%) wall
>    7344 kB ( 0%) ggc
>  dead code elimination :   0.63 ( 0%) usr   0.00 ( 0%) sys   0.58 ( 0%) wall
>       0 kB ( 0%) ggc
>  dead store elim1      :   1.24 ( 0%) usr   0.00 ( 0%) sys   1.27 ( 0%) wall
>   14629 kB ( 1%) ggc
>  dead store elim2      :   0.65 ( 0%) usr   0.01 ( 0%) sys   0.65 ( 0%) wall
>   11488 kB ( 1%) ggc
>  loop analysis         :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall
>     278 kB ( 0%) ggc
>  global CSE            :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
>       0 kB ( 0%) ggc
>  CPROP 1               :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
>    4114 kB ( 0%) ggc
>  PRE                   :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.46 ( 0%) wall
>    3000 kB ( 0%) ggc
>  CPROP 2               :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall
>    3110 kB ( 0%) ggc
>  bypass jumps          :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall
>    2539 kB ( 0%) ggc
>  CSE 2                 :   4.29 ( 1%) usr   0.02 ( 0%) sys   4.21 ( 0%) wall
>    5306 kB ( 0%) ggc
>  branch prediction     :   0.66 ( 0%) usr   0.01 ( 0%) sys   0.67 ( 0%) wall
>    3048 kB ( 0%) ggc
>  combiner              :   1.60 ( 0%) usr   0.01 ( 0%) sys   1.72 ( 0%) wall
>   22097 kB ( 1%) ggc
>  if-conversion         :   0.70 ( 0%) usr   0.01 ( 0%) sys   0.78 ( 0%) wall
>     456 kB ( 0%) ggc
>  regmove               :   0.91 ( 0%) usr   0.01 ( 0%) sys   0.87 ( 0%) wall
>     118 kB ( 0%) ggc
>  local alloc           :   4.45 ( 1%) usr   0.01 ( 0%) sys   4.49 ( 0%) wall
>   11555 kB ( 1%) ggc
>  global alloc          :   9.35 ( 2%) usr   0.03 ( 0%) sys   9.42 ( 0%) wall
>   37993 kB ( 2%) ggc
>  reload CSE regs       :   1.83 ( 0%) usr   0.02 ( 0%) sys   1.90 ( 0%) wall
>   30852 kB ( 2%) ggc
>  thread pro- & epilogue:   0.24 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall
>    1494 kB ( 0%) ggc
>  if-conversion 2       :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall
>     143 kB ( 0%) ggc
>  peephole 2            :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall
>    2505 kB ( 0%) ggc
>  rename registers      :   0.93 ( 0%) usr   0.00 ( 0%) sys   0.94 ( 0%) wall
>      93 kB ( 0%) ggc
>  scheduling 2          :   2.72 ( 1%) usr   0.01 ( 0%) sys   2.75 ( 0%) wall
>    1617 kB ( 0%) ggc
>  machine dep reorg     :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall
>     385 kB ( 0%) ggc
>  reorder blocks        :   0.72 ( 0%) usr   0.00 ( 0%) sys   0.66 ( 0%) wall
>    6485 kB ( 0%) ggc
>  final                 :   1.07 ( 0%) usr   0.02 ( 0%) sys   1.16 ( 0%) wall
>    8151 kB ( 1%) ggc
>  symout                :   0.03 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall
>    2181 kB ( 0%) ggc
>  tree if-combine       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
>       0 kB ( 0%) ggc
>  TOTAL                 : 382.44            64.16          5061.26
>  1591718 kB
>
>
