On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: > On 23 May 2018 at 13:58, Richard Biener <rguent...@suse.de> wrote: >> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >> >>> Hi, >>> I am trying to work on PR80155, which exposes a problem with code >>> hoisting and register pressure on a leading embedded benchmark for ARM >>> cortex-m7, where code-hoisting causes an extra register spill. >>> >>> I have attached two test-cases which (hopefully) are representative of >>> the original test-case. >>> The first one (trans_dfa.c) is bigger and somewhat similar to the >>> original test-case and trans_dfa_2.c is hand-reduced version of >>> trans_dfa.c. There's 2 spills caused with trans_dfa.c >>> and one spill with trans_dfa_2.c due to lesser amount of cases. >>> The test-cases in the PR are probably not relevant. >>> >>> Initially I thought the spill was happening because of "too many >>> hoistings" taking place in original test-case thus increasing the >>> register pressure, but it seems the spill is possibly caused because >>> expression gets hoisted out of a block that is on loop exit. >>> >>> For example, the following hoistings take place with trans_dfa_2.c: >>> >>> (1) Inserting expression in block 4 for code hoisting: >>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >>> >>> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} >>> (0006) >>> >>> (3) Inserting expression in block 4 for code hoisting: >>> {pointer_plus_expr,s_33,1} (0023) >>> >>> (4) Inserting expression in block 3 for code hoisting: >>> {pointer_plus_expr,s_33,1} (0023) >>> >>> The issue seems to be hoisting of (*tab + 1) which consists of first >>> two hoistings in block 4 >>> from blocks 5 and 9, which causes the extra spill. I verified that by >>> disabling hoisting into block 4, >>> which resulted in no extra spills. >>> >>> I wonder if that's because the expression (*tab + 1) is getting >>> hoisted from blocks 5 and 9, >>> which are on loop exit ? So the expression that was previously >>> computed in a block on loop exit, gets hoisted outside that block >>> which possibly makes the allocator more defensive ? Similarly >>> disabling hoisting of expressions which appeared in blocks on loop >>> exit in original test-case prevented the extra spill. The other >>> hoistings didn't seem to matter. >> >> I think that's simply co-incidence. The only thing that makes >> a block that also exits from the loop special is that an >> expression could be sunk out of the loop and hoisting (commoning >> with another path) could prevent that. But that isn't what is >> happening here and it would be a pass ordering issue as >> the sinking pass runs only after hoisting (no idea why exactly >> but I guess there are cases where we want to prefer CSE over >> sinking). So you could try if re-ordering PRE and sinking helps >> your testcase. > Thanks for the suggestions. Placing sink pass before PRE works > for both these test-cases! Sadly it still causes the spill for the benchmark > -:( > I will try to create a better approximation of the original test-case. >> >> What I do see is a missed opportunity to merge the successors >> of BB 4. After PRE we have >> >> <bb 4> [local count: 159303558]: >> <L1>: >> pretmp_123 = *tab_37(D); >> _87 = pretmp_123 + 1; >> if (c_36 == 65) >> goto <bb 5>; [34.00%] >> else >> goto <bb 8>; [66.00%] >> >> <bb 5> [local count: 54163210]: >> *tab_37(D) = _87; >> _96 = MEM[(char *)s_57 + 1B]; >> if (_96 != 0) >> goto <bb 7>; [89.00%] >> else >> goto <bb 6>; [11.00%] >> >> <bb 8> [local count: 105140348]: >> *tab_37(D) = _87; >> _56 = MEM[(char *)s_57 + 1B]; >> if (_56 != 0) >> goto <bb 10>; [89.00%] >> else >> goto <bb 9>; [11.00%] >> >> here at least the stores and loads can be hoisted. Note this >> may also point at the real issue of the code hoisting which is >> tearing apart the RMW operation? > Indeed, this possibility seems much more likely than block being on loop exit. > I will try to "hardcode" the load/store hoists into block 4 for this > specific test-case to check > if that prevents the spill. Even if it prevents the spill in this case, it's likely a good thing to do. The statements prior to the conditional in bb5 and bb8 should be hoisted, leaving bb5 and bb8 with just their conditionals.
Jeff