> > On 06/30/2010 02:26 PM, Basile Starynkevitch wrote:
> >> On Wed, 2010-06-30 at 14:23 -0700, Taras Glek wrote:
> >>
> >>> I tried 4.5 -O2 and it's actually faster than 4.3 -Os.
> >>>
> >>> I am happy that -O2 performance is actually pretty good, but -Os
> >>> regression is going to hurt on mobile.
> >>>
> >> Did you try gcc-4.5 -flto -Os or gcc-4.5 -flto -O2?
> >>
> >> It would be interesting to hear that GCC is able to LTO a program as big
> >> as Mozilla! And figures (notably RAM, CPU time, wallclock time for
> >> build) would be interesting.
> >>
> >
> > Both whopr and flto cause gcc to segfault while building Mozilla.
>
> 4.5 WHOPR is completely broken. LTO is in better shape but I am not sure if we
> can reasonably expect it to build mozilla. However I would be very happy to help
> getting WHOPR working for 4.6.

Hi,
I now got the 4.6 WHOPR build up to libxul.so, which seems to be one of the bigger files.
WHOPR linking consists of a serial stage (WPA), which merges the whole program and does interprocedural optimization, followed by a parallel build. The serial stage needs 3.7GB of RAM and 10 minutes, most of which is spent writing out the files for the parallel builds; these are around 5GB overall. The size of the files can be cut down significantly by a sane partitioning algorithm, since we produce over 1000 partitions where 40 would do the job. (This is with an enable-checking compiler.)

The build still dies later for me, but it seems that libxul is not too large for WHOPR. (I hope all these parameters will be reduced significantly before 4.6 is out.) What are the other big components I should be afraid of?

The oprofile of the WPA stage is as follows:

382507    8.4240  lto_output_1_stream
379158    8.3503  htab_find_slot_with_hash
207330    4.5661  bp_pack_value
155793    3.4311  iterative_hash_hashval_t
135132    2.9760  lto_output_uleb128_stream
101110    2.2268  gimple_types_compatible_p
92828     2.0444  cgraph_node_in_set_p
83205     1.8324  lto_promote_cross_file_statics
76243     1.6791  htab_expand
75993     1.6736  htab_hash_string
75790     1.6691  eq_string_slot_node
75020     1.6522  bp_unpack_value
73403     1.6166  linemap_lookup
65353     1.4393  lto_output_sleb128_stream
64864     1.4285  inflate_fast
64508     1.4207  verify_cgraph_node
60076     1.3231  lto_output_tree
57120     1.2580  referenced_from_this_partition_p
56225     1.2383  lto_input_uleb128
53620     1.1809  lto_streamer_cache_insert_1
52973     1.1666  htab_find_slot
45728     1.0071  lto_output_tree_or_ref
43428     0.9564  lto_input_1_unsigned
41556     0.9152  tree_map_base_eq
39232     0.8640  hash_cgraph_node_set_element
35695     0.7861  ggc_set_mark

So not much of a surprise: streaming is inefficient and we need a lot of time for type merging too. I am compiling to get a time report.

Honza
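
To see how the oprofile listing above splits between streaming and hashing, here is a minimal sketch that sums the reported percentages per group. The rows are copied from the listing. The grouping by symbol-name prefix (`GROUPS`, `classify`, `summarize`) is hypothetical and only an illustration, not GCC's own classification. In particular, some of the hash table time probably belongs to type merging or to the streamer cache, which a flat profile cannot tell apart.

```python
# Rows copied from the oprofile listing: (samples, percent, symbol).
PROFILE = [
    (382507, 8.4240, "lto_output_1_stream"),
    (379158, 8.3503, "htab_find_slot_with_hash"),
    (207330, 4.5661, "bp_pack_value"),
    (155793, 3.4311, "iterative_hash_hashval_t"),
    (135132, 2.9760, "lto_output_uleb128_stream"),
    (101110, 2.2268, "gimple_types_compatible_p"),
    (92828, 2.0444, "cgraph_node_in_set_p"),
    (83205, 1.8324, "lto_promote_cross_file_statics"),
    (76243, 1.6791, "htab_expand"),
    (75993, 1.6736, "htab_hash_string"),
    (75790, 1.6691, "eq_string_slot_node"),
    (75020, 1.6522, "bp_unpack_value"),
    (73403, 1.6166, "linemap_lookup"),
    (65353, 1.4393, "lto_output_sleb128_stream"),
    (64864, 1.4285, "inflate_fast"),
    (64508, 1.4207, "verify_cgraph_node"),
    (60076, 1.3231, "lto_output_tree"),
    (57120, 1.2580, "referenced_from_this_partition_p"),
    (56225, 1.2383, "lto_input_uleb128"),
    (53620, 1.1809, "lto_streamer_cache_insert_1"),
    (52973, 1.1666, "htab_find_slot"),
    (45728, 1.0071, "lto_output_tree_or_ref"),
    (43428, 0.9564, "lto_input_1_unsigned"),
    (41556, 0.9152, "tree_map_base_eq"),
    (39232, 0.8640, "hash_cgraph_node_set_element"),
    (35695, 0.7861, "ggc_set_mark"),
]

# Hypothetical grouping by symbol-name prefix (an assumption made for
# illustration; the profile itself carries no such categories).
GROUPS = {
    "streaming": ("lto_output_", "lto_input_", "bp_pack_", "bp_unpack_",
                  "lto_streamer_"),
    "hash tables": ("htab_", "iterative_hash_", "eq_string_slot_node",
                    "tree_map_base_eq", "hash_cgraph_"),
    "type merging": ("gimple_types_compatible_p",),
}


def classify(symbol):
    """Return the first group whose prefixes match the symbol, else 'other'."""
    for group, prefixes in GROUPS.items():
        if symbol.startswith(prefixes):
            return group
    return "other"


def summarize(profile):
    """Sum the percent column per group."""
    totals = {}
    for _samples, percent, symbol in profile:
        group = classify(symbol)
        totals[group] = totals.get(group, 0.0) + percent
    return totals


if __name__ == "__main__":
    ranked = sorted(summarize(PROFILE).items(), key=lambda kv: -kv[1])
    for group, percent in ranked:
        print(f"{group:12s} {percent:6.2f}%")
```

The listing only covers the top symbols, so the group totals do not add up to 100%.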