[Bug middle-end/63155] [6/7/8 Regression] memory hog

rguenther at suse dot de Fri, 05 Oct 2018 01:02:44 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155


--- Comment #36 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 4 Oct 2018, rogerio.souza at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
> 
> --- Comment #35 from Rogério de Souza Moraes <rogerio.souza at gmail dot com> ---
> Created attachment 44791
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44791&action=edit
> Small testcase more similar to original environment
> 
> Hi Richard,
> 
> this is a new testcase, based on another file in the original environment.
> It’s quite small (7000 lines, 240 setjmp calls).
> 
> This code, with a somewhat complex but still simplified control structure,
> represents a state machine implementation of a kind that is very widely used
> by our customers. Another new factor is the nested setjmp calls. Of course,
> the original testcase is more complex and takes even more time, with a
> larger difference.
> 
> You can run it using the following commands:
> 
> 
> time gcc -DGCC -DLINUX_C -D_GLIBCXX_USE_CXX11_ABI=0  -m32 -m32 -w -c -O0
> -pedantic -fwrapv -mstackrealign -mpreferred-stack-boundary=4
> gcc_2nd_synth_pure_c_item.c -o gcc_2nd_synth_pure_c_item.o
> 
> time gcc -DGCC -DLINUX_C -D_GLIBCXX_USE_CXX11_ABI=0  -m32 -m32 -w -c -O
> -pedantic -fwrapv -mstackrealign -mpreferred-stack-boundary=4
> gcc_2nd_synth_pure_c_item.c -o gcc_2nd_synth_pure_c_item.o
> 
> 
> Results :
> 
> GCC: 4.8.5 (From RHEL 7.5)
> 
> real    0m0.349s
> user    0m0.255s
> sys     0m0.083s
> 
> real    0m0.193s
> user    0m0.163s
> sys     0m0.023s
> 
> GCC: 6.3.0 (GCC 6.3.0 with Revision 264523 backported and applied to it)
> 
> real    0m32.235s
> user    0m30.486s
> sys     0m1.622s
> 
> real    3m34.203s
> user    3m33.726s
> sys     0m0.292s
> 
> The performance difference is relevant in this test.
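
For illustration only, the kind of code described in the quoted comment (a state machine whose states use nested setjmp calls) might look roughly like the C sketch below. It is not taken from attachment 44791; the state names, run_machine, and its parameters are all hypothetical.

```c
#include <setjmp.h>

/* Hypothetical states; none of these names come from the attachment. */
enum state { ST_INIT, ST_WORK, ST_RECOVER, ST_DONE };

static jmp_buf outer_env;   /* unrecoverable errors unwind to here */
static jmp_buf inner_env;   /* recoverable errors unwind to here   */

static void raise_inner(void) { longjmp(inner_env, 1); }
static void raise_outer(void) { longjmp(outer_env, 1); }

/* Runs the machine until ST_DONE and returns the number of steps taken,
   or -1 if an unrecoverable error reached the outer setjmp.
   fail_at: step index at which a recoverable error is raised (-1 = never).
   fatal:   if nonzero, recovery escalates to the outer handler.
   recoveries: receives the number of times ST_RECOVER was entered. */
int run_machine(int fail_at, int fatal, int *recoveries)
{
    /* Locals changed after setjmp and read after longjmp must be volatile. */
    volatile enum state st = ST_INIT;
    volatile int steps = 0;

    *recoveries = 0;

    if (setjmp(outer_env) != 0)
        return -1;                      /* arrived via raise_outer() */

    for (;;) {
        switch (st) {
        case ST_INIT:
            st = ST_WORK;
            break;
        case ST_WORK:
            /* Nested setjmp: established while outer_env is still live. */
            if (setjmp(inner_env) == 0) {
                if (steps == fail_at)
                    raise_inner();
                steps++;
                if (steps >= 3)
                    st = ST_DONE;
            } else {
                st = ST_RECOVER;        /* arrived via raise_inner() */
            }
            break;
        case ST_RECOVER:
            (*recoveries)++;
            if (fatal)
                raise_outer();
            steps++;                    /* skip the failing step */
            st = ST_WORK;
            break;
        case ST_DONE:
            return steps;
        }
    }
}
```

Calling run_machine(1, 0, &n) exercises the inner recovery path once; passing a nonzero fatal makes the same failure unwind through the outer setjmp instead. The real testcase has about 240 such setjmp calls.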

Thanks for the more realistic testcase.  I can confirm the above
and I also see a slowdown in GCC 9 compared to GCC 8 at -O1:

> /usr//bin/time gcc-8 -S t.c -O -fwrapv -mstackrealign -mpreferred-stack-boundary=4 -m32
157.48user 0.24system 2:37.78elapsed 99%CPU (0avgtext+0avgdata 888036maxresident)k
47704inputs+152outputs (8major+240936minor)pagefaults 0swaps

> /usr//bin/time gcc-9 -S t.c -O -fwrapv -mstackrealign -mpreferred-stack-boundary=4 -m32
197.61user 0.39system 3:18.08elapsed 99%CPU (0avgtext+0avgdata 890628maxresident)k
0inputs+184outputs (0major+259016minor)pagefaults 0swaps

Somehow it's still CCP that makes things slow:

 tree CCP                           : 178.52 ( 89%)   0.01 (  2%) 178.55 ( 89%)     646 kB (  0%)

perf tells me it's

-   96.33%    29.55%         14801  cc1      cc1               [.] ccp_propagate::visit_phi
     ccp_propagate::visit_phi
   - ssa_propagation_engine::simulate_stmt
      + 49.51% ssa_propagation_engine::simulate_block
      + 46.82% ssa_propagation_engine::ssa_propagate

-   37.06%    28.98%         12421  cc1      cc1               [.] ccp_lattice_meet
   - ccp_lattice_meet
      + 37.02% ccp_propagate::visit_phi
      + 0.03% set_lattice_value

-    5.17%     5.17%          1949  cc1      cc1               [.] wi::bit_or<generic_wide_int<fixed_wide_int_storage<192> >, generic_w
     wi::bit_or<generic_wide_int<fixed_wide_int_storage<192> >, generic_wide_int<fixed_wide_int_storage<192> > >
   - ccp_lattice_meet
      + 5.16% ccp_propagate::visit_phi
      + 0.01% set_lattice_value

-    4.02%     4.02%          1509  cc1      cc1               [.] canonicalize_value
   - canonicalize_value
      + 4.02% get_value_for_expr
      + 0.00% ccp_folder::get_value

-    2.90%     2.89%          1083  cc1      cc1               [.] wi::eq_p<generic_wide_int<fixed_wide_int_storage<192> >, int>
     wi::eq_p<generic_wide_int<fixed_wide_int_storage<192> >, int>
   - ccp_lattice_meet
      + 2.89% ccp_propagate::visit_phi
      + 0.00% set_lattice_value
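
The profile above is dominated by visit_phi calling ccp_lattice_meet, with wide-int OR and equality tests underneath. As a rough illustration of that shape of work, here is a toy bit-level lattice meet folded over the arguments of a PHI node. It is a simplified sketch, not GCC's source: the struct, the enum, and both function bodies are invented for illustration, and a 64-bit mask stands in for the 192-bit wide ints seen in the profile. That PHIs with many arguments are what makes this expensive for setjmp-heavy code is an assumption, not something shown by the profile.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy three-level lattice; illustrative names, not GCC's declarations. */
enum lattice_kind { UNDEFINED, CONSTANT, VARYING };

struct lattice_val {
    enum lattice_kind kind;
    uint64_t value;   /* known bit values (only where mask bit is 0) */
    uint64_t mask;    /* 1 = this bit is unknown                      */
};

/* Meet b into a: bits that differ between the two values, or that were
   already unknown in either, become unknown. If every bit is unknown,
   the result drops to VARYING. */
void lattice_meet(struct lattice_val *a, const struct lattice_val *b)
{
    if (a->kind == UNDEFINED) {
        *a = *b;
        return;
    }
    if (b->kind == UNDEFINED)
        return;
    if (a->kind == VARYING || b->kind == VARYING) {
        a->kind = VARYING;
        a->value = 0;
        a->mask = UINT64_MAX;
        return;
    }
    a->mask = a->mask | b->mask | (a->value ^ b->value);
    a->value &= ~a->mask;             /* clear bits that are now unknown */
    if (a->mask == UINT64_MAX) {
        a->kind = VARYING;
        a->value = 0;
    }
}

/* Evaluate a PHI by meeting all incoming argument values: one meet per
   argument, so the cost of each visit grows with the number of arguments. */
struct lattice_val visit_phi(const struct lattice_val *args, size_t n)
{
    struct lattice_val result = { UNDEFINED, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        lattice_meet(&result, &args[i]);
        if (result.kind == VARYING)
            break;                    /* cannot get any lower */
    }
    return result;
}
```

For example, meeting the constants 0x0C and 0x0A leaves a CONSTANT with mask 0x06 and value 0x08: the two differing bits become unknown, the rest stay known.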

As said, thanks for the testcase.

[Bug middle-end/63155] [6/7/8 Regression] memory hog
