On Tue, Jan 28, 2014 at 07:26:47AM +0100, Jakub Jelinek wrote: > > I wonder if this is just some of --enable-checking tests in dwarf2out going > > wild > > or if it is just expensive sanity checking code? > > I used to have chroot environment for 32bit builds, I will need to > > re-install it now. > > variable tracking :2914.85 (83%) usr 1.88 ( 7%) sys2931.22 (82%) > wall 80844 kB ( 3%) ggc > var-tracking dataflow : 18.19 ( 1%) usr 0.19 ( 1%) sys 18.49 ( 1%) > wall 10899 kB ( 0%) ggc > var-tracking emit : 29.41 ( 1%) usr 0.11 ( 0%) sys 29.65 ( 1%) > wall 148128 kB ( 6%) ggc > TOTAL :3525.97 25.73 3570.33 > 2321043 kB > > So, strangely both vt_find_locations and vt_emit_notes, typically the most > expensive ones, > are quite unexpensive and most of the time is spent elsewhere, in > vt_initialize?
So, most of the time seems to be spent in cselib.c remove_useless_values (both from Ctrl-C in gdb profiling, and callgrind). For callgrind I've actually built 64-bit cc1plus with --enable-checking=release, and still compiled the same --enable-checking=yes,rtl -m32 -O2 -g insn-recog.c, the build then took just 14 minutes instead of 60 minutes, and in that case only about 30% of compile time has been spent in var-tracking and 20% of compile time in remove_useless_values in particular. The problem with remove_useless_values is that we have quickly very big cselib hash table (over 200000 entries) and very large number of very small basic blocks (over 34000) and at the end of every basic block we call cselib_preserve_only_values which walks the whole hash table, and e.g. references_value_p is called 845114869x from discard_useless_locs. A micro-optimization could be e.g. to turn references_value_p into a template where only_useless would be a template parameter rather than actual parameter (due to recursion inlining doesn't help here) or just two functions. Also, for RTL checking, I wonder if the functions like reference_values_p and similar ones that use GET_RTX_FORMAT/GET_RTX_LENGTH to walk the elements couldn't use special non-checking macros in doing so if the compiler can't figure out checking is redundant there because it is being performed by hand by the function (haven't verified). And, perhaps also an approach similar to http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html for GET_RTX_FORMAT and/or GET_RTX_LENGTH (so that at least for the cases where the compiler knows which rtx code it is (if some code is guarded with specific GET_CODE () == test), it could avoid loading from the const arrays). Anyway, I guess more important is whether all the values in the cselib_hash_table will be ever useful for some lookup, or if there are e.g. values that only reference preserved values where none of those referenced values have any locations other than preserved values. Or if we can somehow quickly find out what VALUEs have changed during processing of the last bb and only process that subset instead of all the hash table entries all the time. Alex? Your thoughts? Jakub