https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
--- Comment #36 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 4 Oct 2018, rogerio.souza at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
>
> --- Comment #35 from Rogério de Souza Moraes <rogerio.souza at gmail dot com> ---
> Created attachment 44791
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44791&action=edit
> Small testcase more similar to original environment
>
> Hi Richard,
>
> this is a new testcase, based on another file in the original environment.
> It's quite small (7000 lines, 240 setjmp calls).
>
> This code, with a slightly complex but still simplified control structure,
> represents a state machine implementation, which is very widely used by our
> customers. Another new factor is the nested setjmp calls. Of course, the
> original testcase is more complex and takes even more time, with an even
> larger difference.
>
> You can run it using the following commands:
>
> time gcc -DGCC -DLINUX_C -D_GLIBCXX_USE_CXX11_ABI=0 -m32 -m32 -w -c -O0 -pedantic -fwrapv -mstackrealign -mpreferred-stack-boundary=4 gcc_2nd_synth_pure_c_item.c -o gcc_2nd_synth_pure_c_item.o
>
> time gcc -DGCC -DLINUX_C -D_GLIBCXX_USE_CXX11_ABI=0 -m32 -m32 -w -c -O -pedantic -fwrapv -mstackrealign -mpreferred-stack-boundary=4 gcc_2nd_synth_pure_c_item.c -o gcc_2nd_synth_pure_c_item.o
>
> Results:
>
> GCC: 4.8.5 (from RHEL 7.5)
>
> real 0m0.349s
> user 0m0.255s
> sys  0m0.083s
>
> real 0m0.193s
> user 0m0.163s
> sys  0m0.023s
>
> GCC: 6.3.0 (GCC 6.3.0 with Revision 264523 backported and applied to it)
>
> real 3m34.203s
> user 3m33.726s
> sys  0m0.292s
>
> The performance difference is significant in this test.

Thanks for the more realistic testcase.
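The attachment itself is not reproduced here. For readers unfamiliar with the pattern, the following is a hypothetical miniature of the kind of code described above: a state machine whose steps can bail out via longjmp, with a per-step setjmp nested inside an outer one. Every name in it (`run_machine`, `step`, `outer_env`, `inner_env`) is made up for illustration, and it has 2 setjmp call sites rather than 240. Each setjmp call site is a point where control can return a second time, which GCC has to model with abnormal control-flow edges.

```c
#include <setjmp.h>

/* Hypothetical miniature, NOT attachment 44791: a three-state machine whose
   steps can bail out via longjmp, with one setjmp per loop iteration nested
   inside an outer setjmp. */
static jmp_buf outer_env;  /* outer handler: fatal errors */
static jmp_buf inner_env;  /* inner handler: recoverable errors, re-armed each step */

/* Hypothetical step function: halves even inputs, otherwise unwinds. */
static int step(int x)
{
    if (x < 0)
        longjmp(outer_env, 1);   /* fatal: unwind to the outer handler */
    if (x & 1)
        longjmp(inner_env, 1);   /* recoverable: unwind to the per-step handler */
    return x / 2;
}

int run_machine(int input)
{
    /* volatile: these locals are modified after setjmp and read after
       longjmp; without volatile their values would be indeterminate
       (C11 7.13.2.1). */
    volatile int state = 0;
    volatile int acc = input;
    volatile int errors = 0;

    if (setjmp(outer_env) != 0)
        return -1;                 /* reached only via longjmp(outer_env, ...) */

    while (state != 3) {
        if (setjmp(inner_env) != 0) {
            errors++;              /* count the recovery ... */
            acc = acc + 1;         /* ... make acc even, then retry the same state */
            continue;
        }
        switch (state) {
        case 0: acc = step(acc); state = 1; break;
        case 1: acc = step(acc); state = 2; break;
        case 2: acc = step(acc); state = 3; break;
        default: state = 3; break;
        }
    }
    return acc * 10 + errors;
}
```

For example, `run_machine(6)` hits an odd intermediate value once, recovers through the inner handler and returns 11, while a negative input escapes to the outer handler and returns -1.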
I can confirm the above, and I also see a slowdown in GCC 9 compared to
GCC 8 at -O1:

  > /usr//bin/time gcc-8 -S t.c -O -fwrapv -mstackrealign -mpreferred-stack-boundary=4 -m32
  157.48user 0.24system 2:37.78elapsed 99%CPU (0avgtext+0avgdata 888036maxresident)k
  47704inputs+152outputs (8major+240936minor)pagefaults 0swaps
  > /usr//bin/time gcc-9 -S t.c -O -fwrapv -mstackrealign -mpreferred-stack-boundary=4 -m32
  197.61user 0.39system 3:18.08elapsed 99%CPU (0avgtext+0avgdata 890628maxresident)k
  0inputs+184outputs (0major+259016minor)pagefaults 0swaps

Somehow it's still CCP that makes things slow:

  tree CCP                           : 178.52 ( 89%)   0.01 (  2%) 178.55 ( 89%)     646 kB (  0%)

perf tells me it's

  -   96.33%    29.55%  14801  cc1  cc1  [.] ccp_propagate::visit_phi
       ccp_propagate::visit_phi
     - ssa_propagation_engine::simulate_stmt
        + 49.51% ssa_propagation_engine::simulate_block
        + 46.82% ssa_propagation_engine::ssa_propagate
  -   37.06%    28.98%  12421  cc1  cc1  [.] ccp_lattice_meet
     - ccp_lattice_meet
        + 37.02% ccp_propagate::visit_phi
        + 0.03% set_lattice_value
  -    5.17%     5.17%   1949  cc1  cc1  [.] wi::bit_or<generic_wide_int<fixed_wide_int_storage<192> >, generic_w
       wi::bit_or<generic_wide_int<fixed_wide_int_storage<192> >, generic_wide_int<fixed_wide_int_storage<192> > >
     - ccp_lattice_meet
        + 5.16% ccp_propagate::visit_phi
        + 0.01% set_lattice_value
  -    4.02%     4.02%   1509  cc1  cc1  [.] canonicalize_value
     - canonicalize_value
        + 4.02% get_value_for_expr
        + 0.00% ccp_folder::get_value
  -    2.90%     2.89%   1083  cc1  cc1  [.] wi::eq_p<generic_wide_int<fixed_wide_int_storage<192> >, int>
       wi::eq_p<generic_wide_int<fixed_wide_int_storage<192> >, int>
     - ccp_lattice_meet
        + 2.89% ccp_propagate::visit_phi
        + 0.00% set_lattice_value

As said, thanks for the testcase.
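The profile is dominated by the PHI visitor and the lattice meet. As a rough mental model only, here is a simplified sketch of a CCP-style meet over PHI arguments: a value is UNDEFINED, CONSTANT (a value plus a mask of unknown bits) or VARYING, and meeting two different constants ORs their differing bits into the mask, which corresponds loosely to the `wi::bit_or` frames above. This is not GCC's code: the names `lattice_val`, `lattice_meet` and `visit_phi_model` are hypothetical, it uses `uint64_t` instead of wide_int, and it ignores edge executability and type precision. In this model one PHI visit costs time linear in its argument count, so a PHI with many incoming (for example abnormal) edges that is revisited many times becomes expensive; whether that is exactly what happens in this testcase is an assumption, not something the timings establish.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified CCP-style lattice (hypothetical model, not GCC's data structures):
   UNDEFINED is the top element, VARYING the bottom. */
typedef enum { LAT_UNDEFINED, LAT_CONSTANT, LAT_VARYING } lattice_kind;

typedef struct {
    lattice_kind kind;
    uint64_t value;  /* meaningful only when kind == LAT_CONSTANT */
    uint64_t mask;   /* set bits are unknown bits of value */
} lattice_val;

/* Meet *b into *a in place. */
static void lattice_meet(lattice_val *a, const lattice_val *b)
{
    if (b->kind == LAT_UNDEFINED)
        return;                       /* X meet UNDEFINED = X */
    if (a->kind == LAT_UNDEFINED) {
        *a = *b;                      /* UNDEFINED meet X = X */
        return;
    }
    if (a->kind == LAT_VARYING || b->kind == LAT_VARYING) {
        a->kind = LAT_VARYING;        /* VARYING absorbs everything */
        return;
    }
    /* Both CONSTANT: bits that differ, or are unknown in either, become unknown. */
    a->mask |= b->mask | (a->value ^ b->value);
    if (a->mask == UINT64_MAX)        /* no known bits left */
        a->kind = LAT_VARYING;
}

/* Model of visiting one PHI: meet all incoming argument values.
   Cost is linear in nargs for every visit of the PHI. */
static lattice_val visit_phi_model(const lattice_val *args, size_t nargs)
{
    lattice_val result = { LAT_UNDEFINED, 0, 0 };
    for (size_t i = 0; i < nargs; i++) {
        lattice_meet(&result, &args[i]);
        if (result.kind == LAT_VARYING)
            break;                    /* cannot go any lower in the lattice */
    }
    return result;
}
```

For example, meeting the constants 4 and 6 yields CONSTANT 4 with mask 2, meaning bit 1 is unknown while all other bits are still known.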