Hi,

I have discovered that scheduling pass_cprop_hardreg before pass_thread_prologue_and_epilogue significantly increases the number of shrink-wrappings performed. For one, it solves PR 10474 (at least on x86_64-linux), but it also boosts the number of shrink-wrappings performed during gcc bootstrap by nearly 80% (3165->5692 functions). It is also necessary (although not sufficient) for shrink-wrapping at least one function in the povray benchmark.
The reason why it helps so much is that before register allocation there are instructions at the beginning of each function that move the values of actual arguments from their original hard registers (e.g. SI, DI, etc.) to pseudos. When an argument is live across a function call, its pseudo is likely to be assigned to a callee-saved register and then also accessed from that register, even in the first BB. That makes the first BB require a prologue even though the value could still be fetched from the original register. When we convert all uses (at least in the first BB) to the original register, the preparatory stage of shrink-wrapping is often capable of moving the register moves to a later BB, thus creating fast paths which do not require a prologue and epilogue.

We believe this change in the pipeline should not bring about any negative effects. During gcc bootstrap, the number of instructions changed by pass_cprop_hardreg dropped, but only by 1.2%. We have also run the SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and all run time differences could be attributed to noise.
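To make the mechanism described above concrete, here is a minimal sketch in the spirit of the PR 10474 test case. The function name store_after_call and the stored value 42 are made up for illustration, and the comments describe what one would typically expect on x86_64, not verified compiler output.

```c
#include <stdio.h>

/* Hypothetical example, not part of the patch.  On x86_64 the argument
   P arrives in the hard register DI.  Because P is live across the
   call to puts, the register allocator is likely to keep it in a
   callee-saved register, which has to be saved in the prologue.  */
int
store_after_call (int *p)
{
  /* Fast path: if this test reads P directly from DI, the early
     return needs neither a prologue nor an epilogue.  */
  if (!p)
    return 0;

  /* Slow path: P has to survive the call, so the copy into a
     callee-saved register (and with it the prologue) is only
     needed from here on.  */
  puts ("slow path");
  *p = 42;
  return 1;
}
```

Compiling such a function with -O3 -fdump-rtl-pro_and_epilogue and looking for "Performing shrink-wrapping" in the dump, as the new test in the patch does, should show whether the fast path was separated.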
The changes in binary sizes were also small:

|                | Trunk produced |         New |        |
| Benchmark      |    binary size | binary size | % diff |
|----------------+----------------+-------------+--------|
| 400.perlbench  |        6219603 |     6136803 |  -1.33 |
| 401.bzip2      |         359291 |      351659 |  -2.12 |
| 403.gcc        |       16249718 |    15915774 |  -2.06 |
| 410.bwaves     |         145249 |      145769 |   0.36 |
| 416.gamess     |       40269686 |    40270270 |   0.00 |
| 429.mcf        |          97142 |       97126 |  -0.02 |
| 433.milc       |         715444 |      713236 |  -0.31 |
| 434.zeusmp     |        1444596 |     1444676 |   0.01 |
| 435.gromacs    |        6609207 |     6470039 |  -2.11 |
| 436.cactusADM  |        4571319 |     4532607 |  -0.85 |
| 437.leslie3d   |         492197 |      492357 |   0.03 |
| 444.namd       |        1001921 |     1007001 |   0.51 |
| 445.gobmk      |        8193495 |     8163839 |  -0.36 |
| 450.soplex     |        5565070 |     5530734 |  -0.62 |
| 453.povray     |        7468446 |     7340142 |  -1.72 |
| 454.calculix   |        8474754 |     8464954 |  -0.12 |
| 456.hmmer      |        1662315 |     1650147 |  -0.73 |
| 458.sjeng      |         623065 |      620817 |  -0.36 |
| 459.GemsFDTD   |        1456669 |     1461573 |   0.34 |
| 462.libquantum |         249809 |      248401 |  -0.56 |
| 464.h264ref    |        2784806 |     2772806 |  -0.43 |
| 465.tonto      |       15511395 |    15480899 |  -0.20 |
| 470.lbm        |          64327 |       64215 |  -0.17 |
| 471.omnetpp    |        5325418 |     5293874 |  -0.59 |
| 473.astar      |         365853 |      363261 |  -0.71 |
| 481.wrf        |       22002287 |    21950783 |  -0.23 |
| 482.sphinx3    |        1153616 |     1145248 |  -0.73 |
| 483.xalancbmk  |       62458676 |    62001540 |  -0.73 |
|----------------+----------------+-------------+--------|
| TOTAL          |      221535374 |   220130550 |  -0.63 |

I have successfully bootstrapped and tested the patch on x86-64-linux. Is it OK for trunk? Or should I also examine some other aspect?

Thanks,

Martin


2013-03-28  Martin Jambor  <mjam...@suse.cz>

	PR middle-end/10474
	* passes.c (init_optimization_passes): Move pass_cprop_hardreg before
	pass_thread_prologue_and_epilogue.

testsuite/
	* gcc.dg/pr10474.c: New test.

Index: src/gcc/passes.c
===================================================================
--- src.orig/gcc/passes.c
+++ src/gcc/passes.c
@@ -1630,6 +1630,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_ree);
 	  NEXT_PASS (pass_compare_elim_after_reload);
 	  NEXT_PASS (pass_branch_target_load_optimize1);
+	  NEXT_PASS (pass_cprop_hardreg);
 	  NEXT_PASS (pass_thread_prologue_and_epilogue);
 	  NEXT_PASS (pass_rtl_dse2);
 	  NEXT_PASS (pass_stack_adjustments);
@@ -1637,7 +1638,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_peephole2);
 	  NEXT_PASS (pass_if_after_reload);
 	  NEXT_PASS (pass_regrename);
-	  NEXT_PASS (pass_cprop_hardreg);
 	  NEXT_PASS (pass_fast_rtl_dce);
 	  NEXT_PASS (pass_reorder_blocks);
 	  NEXT_PASS (pass_branch_target_load_optimize2);
Index: src/gcc/testsuite/gcc.dg/pr10474.c
===================================================================
--- /dev/null
+++ src/gcc/testsuite/gcc.dg/pr10474.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-pro_and_epilogue" } */
+
+void f(int *i)
+{
+  if (!i)
+    return;
+  else
+    {
+      __builtin_printf("Hi");
+      *i=0;
+    }
+}
+
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */