On Wed, Jun 4, 2008 at 8:31 AM, Basile STARYNKEVITCH <[EMAIL PROTECTED]> wrote:
> Hello All,
>
> my MELT branch http://gcc.gnu.org/wiki/MiddleEndLispTranslator has a big
> source file in it warm-basilys-0.c. It is "self" generated, about 14Mbytes &
> almost 280KLOC (in rev136334). It ends with a big initialization routine of
> 100KLOC which mostly fills a 5000 member structure (each member being itself
> a small structure) and calls a few routines. This initialization routine has
> a simple control structure (no deeply nested blocks or loops).
>
> But gcc (either gcc-4.1 or 4.2 or 4.3 from Debian, or the bootstrapped trunk
> rev136331) can compile this file without any optimisation ie with -O0 -g3 in
> about 16 seconds and less than 1Gb RAM.
>
> But on my 6 Gbytes machine (Core2, 2400MHz, Debian/Sid/AMD64) the cc1
> process with -O2 (either 4.2, 4.3 or the trunk) eats nearly 10Gb of virtual
> memory and thrashes (using 4.8Gb of RAM, 1% cpu time, waiting for the swap
> IO). The same happens with -O1. -Os is a bit better.
>
> The time to run the
> ./built-melt-cc-script warm-basilys-0.c warm-basilys-0.so
> which compiles warm-basilys-0.c with -O2 -fPIC is
>
> (you can set the MELT_EXTRACFLAGS environment variable to pass
> real 84m23.594s
> user 6m23.496s
> sys 1m5.032s
>
> I am attaching the -ftime-report output for information. One of the most
> demanding passes is tree operand scan
>
> I find this report misleading on the memory consumption total (1591718kB =
> 1.6Gb). The top command gives that cc1 needs nearly 10Gb of process space,
> and uses nearly 5G (and thrashes).
>
> I won't be annoyed for long by this, since I'll soon split the
> warm-basilys.bysl file (and hence the generated files) in several distinct
> files. Until then, -O0 is enough for me.
>
> Are there any specific flags to pass to gcc to lower the RAM consumption
> (even at the expense of generated code quality)?
>
> Are there any pragma-s to disable (or lower) optimisation of a single
> routine?
>
> My intuition (and experience) is that gcc -O2 (or even -O1) time and space
> consumption is nearly quadratic on the size of the longest routine.
>
> Thanks for reading.

If it does structure initialization you can try

  --param max-fields-for-field-sensitive=0 --param max-aliased-vops=0

Otherwise can you file a bugreport and attach the testcase there?
(bonus points if you have some that doesn't max out at 10GB but maybe 2GB ;))

Thanks,
Richard.

> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***
>
> Execution times (seconds)
> garbage collection : 7.16 ( 2%) usr 0.45 ( 1%) sys 47.16 ( 1%) wall 0 kB ( 0%) ggc
> callgraph construction: 16.83 ( 4%) usr 0.10 ( 0%) sys 16.87 ( 0%) wall 41478 kB ( 3%) ggc
> callgraph optimization: 9.82 ( 3%) usr 0.11 ( 0%) sys 9.95 ( 0%) wall 9184 kB ( 1%) ggc
> ipa reference : 0.25 ( 0%) usr 0.02 ( 0%) sys 0.26 ( 0%) wall 52 kB ( 0%) ggc
> ipa pure const : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
> cfg cleanup : 2.76 ( 1%) usr 0.03 ( 0%) sys 2.91 ( 0%) wall 5120 kB ( 0%) ggc
> CFG verifier : 11.22 ( 3%) usr 0.69 ( 1%) sys 177.08 ( 3%) wall 0 kB ( 0%) ggc
> trivially dead code : 0.75 ( 0%) usr 0.00 ( 0%) sys 0.80 ( 0%) wall 0 kB ( 0%) ggc
> df reaching defs : 3.01 ( 1%) usr 0.49 ( 1%) sys 34.85 ( 1%) wall 0 kB ( 0%) ggc
> df live regs : 3.46 ( 1%) usr 0.06 ( 0%) sys 3.57 ( 0%) wall 0 kB ( 0%) ggc
> df live&initialized regs: 2.12 ( 1%) usr 0.00 ( 0%) sys 2.16 ( 0%) wall 0 kB ( 0%) ggc
> df use-def / def-use chains: 1.61 ( 0%) usr 0.02 ( 0%) sys 1.75 ( 0%) wall 0 kB ( 0%) ggc
> df reg dead/unused notes: 1.07 ( 0%) usr 0.04 ( 0%) sys 1.10 ( 0%) wall 15075 kB ( 1%) ggc
> register information : 0.51 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall 0 kB ( 0%) ggc
> alias analysis : 1.05 ( 0%) usr 0.01 ( 0%) sys 0.91 ( 0%) wall 19781 kB ( 1%) ggc
> register scan : 0.25 ( 0%) usr 0.01 ( 0%) sys 0.23 ( 0%) wall 163 kB ( 0%) ggc
> rebuild jump labels : 0.53 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall 0 kB ( 0%) ggc
> preprocessing : 1.24 ( 0%) usr 0.56 ( 1%) sys 1.93 ( 0%) wall 46597 kB ( 3%) ggc
> lexical analysis : 0.30 ( 0%) usr 0.81 ( 1%) sys 1.29 ( 0%) wall 0 kB ( 0%) ggc
> parser : 1.70 ( 0%) usr 0.49 ( 1%) sys 2.24 ( 0%) wall 123365 kB ( 8%) ggc
> inline heuristics : 0.63 ( 0%) usr 0.01 ( 0%) sys 0.62 ( 0%) wall 5491 kB ( 0%) ggc
> integration : 2.11 ( 1%) usr 0.22 ( 0%) sys 2.25 ( 0%) wall 168932 kB (11%) ggc
> tree gimplify : 1.86 ( 0%) usr 0.05 ( 0%) sys 1.78 ( 0%) wall 109046 kB ( 7%) ggc
> tree eh : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
> tree CFG construction : 0.22 ( 0%) usr 0.01 ( 0%) sys 0.23 ( 0%) wall 69444 kB ( 4%) ggc
> tree CFG cleanup : 3.42 ( 1%) usr 0.03 ( 0%) sys 4.15 ( 0%) wall 7307 kB ( 0%) ggc
> tree VRP : 3.69 ( 1%) usr 0.24 ( 0%) sys 11.89 ( 0%) wall 115325 kB ( 7%) ggc
> tree copy propagation : 1.80 ( 0%) usr 0.05 ( 0%) sys 3.50 ( 0%) wall 3511 kB ( 0%) ggc
> tree find ref. vars : 0.12 ( 0%) usr 0.01 ( 0%) sys 0.12 ( 0%) wall 9570 kB ( 1%) ggc
> tree PTA : 2.59 ( 1%) usr 0.61 ( 1%) sys 57.50 ( 1%) wall 17158 kB ( 1%) ggc
> tree alias analysis : 1.13 ( 0%) usr 0.33 ( 1%) sys 26.66 ( 1%) wall 2461 kB ( 0%) ggc
> tree call clobbering : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall 10 kB ( 0%) ggc
> tree flow sensitive alias: 0.46 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall 10992 kB ( 1%) ggc
> tree flow insensitive alias: 8.41 ( 2%) usr 0.06 ( 0%) sys 8.96 ( 0%) wall 0 kB ( 0%) ggc
> tree memory partitioning: 0.38 ( 0%) usr 0.01 ( 0%) sys 0.41 ( 0%) wall 111 kB ( 0%) ggc
> tree PHI insertion : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 119 kB ( 0%) ggc
> tree SSA rewrite : 1.44 ( 0%) usr 0.03 ( 0%) sys 1.46 ( 0%) wall 44376 kB ( 3%) ggc
> tree SSA other : 0.09 ( 0%) usr 0.09 ( 0%) sys 0.27 ( 0%) wall 0 kB ( 0%) ggc
> tree SSA incremental : 2.11 ( 1%) usr 0.14 ( 0%) sys 4.59 ( 0%) wall 4795 kB ( 0%) ggc
> tree operand scan : 80.93 (21%) usr 0.92 ( 1%) sys 82.92 ( 2%) wall 71551 kB ( 4%) ggc
> dominator optimization: 3.97 ( 1%) usr 0.06 ( 0%) sys 3.92 ( 0%) wall 84156 kB ( 5%) ggc
> tree SRA : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
> tree STORE-CCP : 0.47 ( 0%) usr 0.05 ( 0%) sys 0.69 ( 0%) wall 992 kB ( 0%) ggc
> tree CCP : 0.93 ( 0%) usr 0.00 ( 0%) sys 0.94 ( 0%) wall 1205 kB ( 0%) ggc
> tree PHI const/copy prop: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 77 kB ( 0%) ggc
> tree split crit edges : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 21401 kB ( 1%) ggc
> tree reassociation : 0.43 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall 236 kB ( 0%) ggc
> tree PRE : 13.92 ( 4%) usr 52.21 (81%) sys4339.32 (86%) wall 109776 kB ( 7%) ggc
> tree FRE : 4.18 ( 1%) usr 2.51 ( 4%) sys 6.69 ( 0%) wall 61570 kB ( 4%) ggc
> tree code sinking : 0.53 ( 0%) usr 0.03 ( 0%) sys 1.54 ( 0%) wall 1578 kB ( 0%) ggc
> tree linearize phis : 0.16 ( 0%) usr 0.01 ( 0%) sys 0.14 ( 0%) wall 0 kB ( 0%) ggc
> tree forward propagate: 0.36 ( 0%) usr 0.03 ( 0%) sys 0.35 ( 0%) wall 2466 kB ( 0%) ggc
> tree phiprop : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
> tree conservative DCE : 0.93 ( 0%) usr 0.01 ( 0%) sys 0.91 ( 0%) wall 20 kB ( 0%) ggc
> tree aggressive DCE : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall 0 kB ( 0%) ggc
> tree DSE : 0.35 ( 0%) usr 0.01 ( 0%) sys 0.33 ( 0%) wall 562 kB ( 0%) ggc
> PHI merge : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
> loop invariant motion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 6 kB ( 0%) ggc
> complete unrolling : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall 316 kB ( 0%) ggc
> tree iv optimization : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 7 kB ( 0%) ggc
> tree loop init : 0.29 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall 281 kB ( 0%) ggc
> tree loop fini : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
> tree copy headers : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 524 kB ( 0%) ggc
> tree SSA uncprop : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
> tree SSA to normal : 52.85 (14%) usr 0.27 ( 0%) sys 53.12 ( 1%) wall 25180 kB ( 2%) ggc
> tree rename SSA copies: 0.22 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 0 kB ( 0%) ggc
> tree SSA verifier : 21.08 ( 6%) usr 0.19 ( 0%) sys 21.67 ( 0%) wall 4603 kB ( 0%) ggc
> tree STMT verifier : 47.77 (12%) usr 1.47 ( 2%) sys 49.16 ( 1%) wall 0 kB ( 0%) ggc
> callgraph verifier : 0.86 ( 0%) usr 0.00 ( 0%) sys 0.93 ( 0%) wall 2891 kB ( 0%) ggc
> dominance frontiers : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall 0 kB ( 0%) ggc
> dominance computation : 3.59 ( 1%) usr 0.04 ( 0%) sys 3.55 ( 0%) wall 0 kB ( 0%) ggc
> control dependences : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
> expand : 11.91 ( 3%) usr 0.31 ( 0%) sys 21.34 ( 0%) wall 172552 kB (11%) ggc
> lower subreg : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
> jump : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
> forward prop : 0.71 ( 0%) usr 0.01 ( 0%) sys 0.87 ( 0%) wall 18126 kB ( 1%) ggc
> CSE : 4.33 ( 1%) usr 0.03 ( 0%) sys 4.51 ( 0%) wall 7344 kB ( 0%) ggc
> dead code elimination : 0.63 ( 0%) usr 0.00 ( 0%) sys 0.58 ( 0%) wall 0 kB ( 0%) ggc
> dead store elim1 : 1.24 ( 0%) usr 0.00 ( 0%) sys 1.27 ( 0%) wall 14629 kB ( 1%) ggc
> dead store elim2 : 0.65 ( 0%) usr 0.01 ( 0%) sys 0.65 ( 0%) wall 11488 kB ( 1%) ggc
> loop analysis : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall 278 kB ( 0%) ggc
> global CSE : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
> CPROP 1 : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 4114 kB ( 0%) ggc
> PRE : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.46 ( 0%) wall 3000 kB ( 0%) ggc
> CPROP 2 : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall 3110 kB ( 0%) ggc
> bypass jumps : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 2539 kB ( 0%) ggc
> CSE 2 : 4.29 ( 1%) usr 0.02 ( 0%) sys 4.21 ( 0%) wall 5306 kB ( 0%) ggc
> branch prediction : 0.66 ( 0%) usr 0.01 ( 0%) sys 0.67 ( 0%) wall 3048 kB ( 0%) ggc
> combiner : 1.60 ( 0%) usr 0.01 ( 0%) sys 1.72 ( 0%) wall 22097 kB ( 1%) ggc
> if-conversion : 0.70 ( 0%) usr 0.01 ( 0%) sys 0.78 ( 0%) wall 456 kB ( 0%) ggc
> regmove : 0.91 ( 0%) usr 0.01 ( 0%) sys 0.87 ( 0%) wall 118 kB ( 0%) ggc
> local alloc : 4.45 ( 1%) usr 0.01 ( 0%) sys 4.49 ( 0%) wall 11555 kB ( 1%) ggc
> global alloc : 9.35 ( 2%) usr 0.03 ( 0%) sys 9.42 ( 0%) wall 37993 kB ( 2%) ggc
> reload CSE regs : 1.83 ( 0%) usr 0.02 ( 0%) sys 1.90 ( 0%) wall 30852 kB ( 2%) ggc
> thread pro- & epilogue: 0.24 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 1494 kB ( 0%) ggc
> if-conversion 2 : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall 143 kB ( 0%) ggc
> peephole 2 : 0.27 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 2505 kB ( 0%) ggc
> rename registers : 0.93 ( 0%) usr 0.00 ( 0%) sys 0.94 ( 0%) wall 93 kB ( 0%) ggc
> scheduling 2 : 2.72 ( 1%) usr 0.01 ( 0%) sys 2.75 ( 0%) wall 1617 kB ( 0%) ggc
> machine dep reorg : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall 385 kB ( 0%) ggc
> reorder blocks : 0.72 ( 0%) usr 0.00 ( 0%) sys 0.66 ( 0%) wall 6485 kB ( 0%) ggc
> final : 1.07 ( 0%) usr 0.02 ( 0%) sys 1.16 ( 0%) wall 8151 kB ( 1%) ggc
> symout : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall 2181 kB ( 0%) ggc
> tree if-combine : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
> TOTAL : 382.44 64.16 5061.26 1591718 kB
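
For anyone who wants to rank the passes in a report like the one quoted above, here is a minimal Python sketch. It is not part of the original exchange. It assumes the row layout of the -ftime-report output shown in this thread (usr, sys, wall, then ggc kB), and the names ROW, SAMPLE, parse_time_report and top_by are made up for illustration. SAMPLE holds three rows and the TOTAL line copied from the report; on those rows, tree operand scan leads in user time while tree PRE leads in wall-clock time.

```python
import re

# One pass row of the quoted report, e.g.
#   > tree PRE : 13.92 ( 4%) usr 52.21 (81%) sys4339.32 (86%) wall 109776 kB ( 7%) ggc
# Leading "> " quote markers are skipped, and a value fused to the previous
# label (as in "sys4339.32") is still accepted.
ROW = re.compile(
    r"^[>\s]*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>[\d.]+)\s*\(\s*\d+%\)\s*usr\s*"
    r"(?P<sys>[\d.]+)\s*\(\s*\d+%\)\s*sys\s*"
    r"(?P<wall>[\d.]+)\s*\(\s*\d+%\)\s*wall\s*"
    r"(?P<kb>\d+)\s*kB"
)

# Rows copied from the report in this thread (illustrative subset).
SAMPLE = """\
> tree operand scan : 80.93 (21%) usr 0.92 ( 1%) sys 82.92 ( 2%) wall 71551 kB ( 4%) ggc
> tree PRE : 13.92 ( 4%) usr 52.21 (81%) sys4339.32 (86%) wall 109776 kB ( 7%) ggc
> tree SSA to normal : 52.85 (14%) usr 0.27 ( 0%) sys 53.12 ( 1%) wall 25180 kB ( 2%) ggc
> TOTAL : 382.44 64.16 5061.26 1591718 kB
"""


def parse_time_report(text):
    """Return a list of (name, usr, sys, wall, kb) tuples, one per pass row.

    Lines that do not look like a pass row (such as TOTAL) are ignored.
    """
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((
                m.group("name").strip(),
                float(m.group("usr")),
                float(m.group("sys")),
                float(m.group("wall")),
                int(m.group("kb")),
            ))
    return rows


def top_by(rows, column, n=3):
    """Return the n rows with the largest value in the given column."""
    index = {"usr": 1, "sys": 2, "wall": 3, "kb": 4}[column]
    return sorted(rows, key=lambda r: r[index], reverse=True)[:n]


if __name__ == "__main__":
    rows = parse_time_report(SAMPLE)
    for column in ("usr", "wall", "kb"):
        name = top_by(rows, column, 1)[0][0]
        print(f"largest {column}: {name}")
```

To use it on a full report, pass the saved compiler output to parse_time_report instead of SAMPLE. Sorting by wall rather than usr is the more useful view when the machine is swapping, since user time does not include time spent waiting for IO.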