On 23 May 2018 at 13:58, Richard Biener <rguent...@suse.de> wrote: > On Wed, 23 May 2018, Prathamesh Kulkarni wrote: > >> Hi, >> I am trying to work on PR80155, which exposes a problem with code >> hoisting and register pressure on a leading embedded benchmark for ARM >> cortex-m7, where code-hoisting causes an extra register spill. >> >> I have attached two test-cases which (hopefully) are representative of >> the original test-case. >> The first one (trans_dfa.c) is bigger and somewhat similar to the >> original test-case and trans_dfa_2.c is hand-reduced version of >> trans_dfa.c. There's 2 spills caused with trans_dfa.c >> and one spill with trans_dfa_2.c due to lesser amount of cases. >> The test-cases in the PR are probably not relevant. >> >> Initially I thought the spill was happening because of "too many >> hoistings" taking place in original test-case thus increasing the >> register pressure, but it seems the spill is possibly caused because >> expression gets hoisted out of a block that is on loop exit. >> >> For example, the following hoistings take place with trans_dfa_2.c: >> >> (1) Inserting expression in block 4 for code hoisting: >> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >> >> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} >> (0006) >> >> (3) Inserting expression in block 4 for code hoisting: >> {pointer_plus_expr,s_33,1} (0023) >> >> (4) Inserting expression in block 3 for code hoisting: >> {pointer_plus_expr,s_33,1} (0023) >> >> The issue seems to be hoisting of (*tab + 1) which consists of first >> two hoistings in block 4 >> from blocks 5 and 9, which causes the extra spill. I verified that by >> disabling hoisting into block 4, >> which resulted in no extra spills. >> >> I wonder if that's because the expression (*tab + 1) is getting >> hoisted from blocks 5 and 9, >> which are on loop exit ? So the expression that was previously >> computed in a block on loop exit, gets hoisted outside that block >> which possibly makes the allocator more defensive ? Similarly >> disabling hoisting of expressions which appeared in blocks on loop >> exit in original test-case prevented the extra spill. The other >> hoistings didn't seem to matter. > > I think that's simply co-incidence. The only thing that makes > a block that also exits from the loop special is that an > expression could be sunk out of the loop and hoisting (commoning > with another path) could prevent that. But that isn't what is > happening here and it would be a pass ordering issue as > the sinking pass runs only after hoisting (no idea why exactly > but I guess there are cases where we want to prefer CSE over > sinking). So you could try if re-ordering PRE and sinking helps > your testcase. Thanks for the suggestions. Placing sink pass before PRE works for both these test-cases! Sadly it still causes the spill for the benchmark -:( I will try to create a better approximation of the original test-case. > > What I do see is a missed opportunity to merge the successors > of BB 4. After PRE we have > > <bb 4> [local count: 159303558]: > <L1>: > pretmp_123 = *tab_37(D); > _87 = pretmp_123 + 1; > if (c_36 == 65) > goto <bb 5>; [34.00%] > else > goto <bb 8>; [66.00%] > > <bb 5> [local count: 54163210]: > *tab_37(D) = _87; > _96 = MEM[(char *)s_57 + 1B]; > if (_96 != 0) > goto <bb 7>; [89.00%] > else > goto <bb 6>; [11.00%] > > <bb 8> [local count: 105140348]: > *tab_37(D) = _87; > _56 = MEM[(char *)s_57 + 1B]; > if (_56 != 0) > goto <bb 10>; [89.00%] > else > goto <bb 9>; [11.00%] > > here at least the stores and loads can be hoisted. Note this > may also point at the real issue of the code hoisting which is > tearing apart the RMW operation? Indeed, this possibility seems much more likely than block being on loop exit. I will try to "hardcode" the load/store hoists into block 4 for this specific test-case to check if that prevents the spill. > >> If that's the case, would it make sense to add another constraint to >> hoisting to not hoist expression if it appears in a block that is on >> loop exit (exception is if block has only single successor), at-least >> for targets like cortex-m7 where the effect of spill is more >> pronounced ? >> >> I tried to write an untested prototype patch (approach-8.diff) on >> these lines, to refuse hoisting if an expression appears in block on >> loop exit. The patch adds a new map pre_expr_block_d, that maps >> pre_expr to the set of blocks (as a bitmap) it appears in and are on >> loop exit, which is computed during compute_avail. >> During do_hoist_insertion, it simply checks if the bitmap of blocks is >> not empty. >> It works for the attached test-cases and passes ssa-pre-*.c and >> ssa-hoist-*.c tests. > > As said to me the heuristic doesn't make much sense. There is > btw loop_exits_from_bb_p (). > > If the issue in the end is a RA one (that is, there _is_ a better > allocation possible?) then you may also want to look at out-of-SSA > coalescing. > > So overall I'm not convinced the hoisting decision is wrong. > It doesn't increase register pressure at all. It does expose > a missed store hoisting and load CSE opportunity (but we don't > have a way to "PRE" or hoist stores at the moment). Stores > do not fit data flow problems well given they need to be kept > in order with respect to other stores and loads and appearantly > *tab aliases *s (yeah, s is char *... make tab restrict and I > guess things will work much smoother). Well strangely, making tab restrict resulted in two extra spills compared to without hoisting.
Thanks, Prathamesh > > Richard. > >> Alternatively, we could restrict replacing expression by it's leader >> in eliminate_dom_walker::before_dom_children if the expression appears >> in a block on loop exit. >> In principle this is more general than hoisting, but I have restricted >> it to only hoisted expressions to avoid being overly conservative. >> Similarly, this constrained could be made conditional, only for >> targets like cortex-m7. I have attached a prototype patch based on >> this approach (approach-9.diff). Although it works for attached >> test-cases, unfortunately it ICE's with the original test-case in >> tail-merge, I am working on that. >> >> I am not sure though if either of these approaches are in the correct >> direction and would >> be grateful for suggestions on the issue! >> >> Thanks, >> Prathamesh >> > > -- > Richard Biener <rguent...@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)