Re: PR80155: Code hoisting and register pressure
On 23 May 2018 at 18:37, Jeff Law wrote:
> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>> On 23 May 2018 at 13:58, Richard Biener wrote:
>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>
>>>> Hi,
>>>> I am trying to work on PR80155, which exposes a problem with code
>>>> hoisting and register pressure on a leading embedded benchmark for
>>>> ARM cortex-m7, where code hoisting causes an extra register spill.
>>>>
>>>> I have attached two test-cases which (hopefully) are representative
>>>> of the original test-case. The first one (trans_dfa.c) is bigger and
>>>> somewhat similar to the original test-case, and trans_dfa_2.c is a
>>>> hand-reduced version of trans_dfa.c. There are two spills caused
>>>> with trans_dfa.c and one spill with trans_dfa_2.c, due to its smaller
>>>> number of cases. The test-cases in the PR are probably not relevant.
>>>>
>>>> Initially I thought the spill was happening because of "too many
>>>> hoistings" taking place in the original test-case, thus increasing
>>>> the register pressure, but it seems the spill is possibly caused
>>>> because an expression gets hoisted out of a block that is on a loop
>>>> exit.
>>>>
>>>> For example, the following hoistings take place with trans_dfa_2.c:
>>>>
>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>     {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>
>>>> (2) Inserting expression in block 4 for code hoisting:
>>>>     {plus_expr,_4,1} (0006)
>>>>
>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>     {pointer_plus_expr,s_33,1} (0023)
>>>>
>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>     {pointer_plus_expr,s_33,1} (0023)
>>>>
>>>> The issue seems to be the hoisting of (*tab + 1), i.e. the first two
>>>> hoistings into block 4 from blocks 5 and 9, which causes the extra
>>>> spill. I verified that by disabling hoisting into block 4, which
>>>> resulted in no extra spills.
>>>>
>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>> hoisted from blocks 5 and 9, which are on loop exits? So the
>>>> expression that was previously computed in a block on a loop exit
>>>> gets hoisted outside that block, which possibly makes the allocator
>>>> more defensive? Similarly, disabling hoisting of expressions which
>>>> appeared in blocks on loop exits in the original test-case prevented
>>>> the extra spill. The other hoistings didn't seem to matter.
>>>
>>> I think that's simply coincidence. The only thing that makes a block
>>> that also exits from the loop special is that an expression could be
>>> sunk out of the loop, and hoisting (commoning with another path)
>>> could prevent that. But that isn't what is happening here, and it
>>> would be a pass-ordering issue, as the sinking pass runs only after
>>> hoisting (no idea why exactly, but I guess there are cases where we
>>> want to prefer CSE over sinking). So you could try whether
>>> re-ordering PRE and sinking helps your testcase.
>> Thanks for the suggestions. Placing the sink pass before PRE works for
>> both these test-cases! Sadly it still causes the spill for the
>> benchmark :-(
>> I will try to create a better approximation of the original test-case.
>>>
>>> What I do see is a missed opportunity to merge the successors of
>>> BB 4. After PRE we have:
>>>
>>>   <bb 4> [local count: 159303558]:
>>>   pretmp_123 = *tab_37(D);
>>>   _87 = pretmp_123 + 1;
>>>   if (c_36 == 65)
>>>     goto <bb 5>; [34.00%]
>>>   else
>>>     goto <bb 8>; [66.00%]
>>>
>>>   <bb 5> [local count: 54163210]:
>>>   *tab_37(D) = _87;
>>>   _96 = MEM[(char *)s_57 + 1B];
>>>   if (_96 != 0)
>>>     goto ; [89.00%]
>>>   else
>>>     goto ; [11.00%]
>>>
>>>   <bb 8> [local count: 105140348]:
>>>   *tab_37(D) = _87;
>>>   _56 = MEM[(char *)s_57 + 1B];
>>>   if (_56 != 0)
>>>     goto ; [89.00%]
>>>   else
>>>     goto ; [11.00%]
>>>
>>> Here at least the stores and loads can be hoisted. Note this may also
>>> point at the real issue of the code hoisting, which is tearing apart
>>> the RMW operation?
>> Indeed, this possibility seems much more likely than the block being
>> on a loop exit.
>> I will try to "hardcode" the load/store hoists into block 4 for this
>> specific test-case, to check if that prevents the spill.
> Even if it prevents the spill in this case, it's likely a good thing to
> do. The statements prior to the conditional in bb5 and bb8 should be
> hoisted, leaving bb5 and bb8 with just their conditionals.

Hi,
It seems that disabling forwprop somehow results in no extra spills on
the original test-case.

For instance, hoisting without forwprop:

  bb 3:
    _1 = tab_1(D) + 8
    pretmp_268 = MEM[tab_1(D) + 8B];
    _2 = pretmp_268 + 1;
    goto <bb 4> or <bb 5>

  bb 4:
    *_1 = _2

  bb 5:
    *_1 = _2

Hoisting with forwprop:

  bb 3:
    pretmp_164 = MEM[tab_1(D) + 8B];
    _2 = pretmp_164 + 1
    goto <bb 4> or <bb 5>

  bb 4:
    MEM[tab_1(D) + 8] = _2;

  bb 5:
    MEM[tab_1(D) + 8] = _2;

Although in both cases we aren't hoisting the stores, the issue with
forwprop in this case seems to be the folding of *_1 = _2 into
MEM[tab_1(D) + 8] = _2 ?

Disabling folding to mem_ref[base + offset] in forwpro
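A minimal sketch of the shape under discussion (a reconstruction for
illustration only; this is not the attached trans_dfa_2.c, and all names
are invented): the same read-modify-write of *tab appears on two mutually
exclusive paths, so code hoisting commons the load of *tab and the
addition into the dominating block and leaves only the two stores behind,
exactly the torn-apart RMW visible in the dump above.

  /* Illustration only -- not the attached test case.  Both arms perform
     the same read-modify-write of *tab; hoisting moves the load and the
     "+ 1" up into the block that tests c, leaving a bare store on each
     path.  */
  int
  scan (const char *s, int *tab)
  {
    int sum = 0;
    while (*s)
      {
        int c = *s++;
        if (c == 'A')
          {
            *tab = *tab + 1;    /* path 1: RMW of *tab */
            sum += 1;
          }
        else if (c == 'B')
          {
            *tab = *tab + 1;    /* path 2: the same RMW */
            sum += 2;
          }
        else
          break;
      }
    return sum;
  }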
Re: PR80155: Code hoisting and register pressure
On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni wrote:
> [...]
Re: PR80155: Code hoisting and register pressure
On Fri, 25 May 2018, Bin.Cheng wrote:
> [...]
Re: PR80155: Code hoisting and register pressure
On 05/25/2018 03:49 AM, Bin.Cheng wrote:
> [...]
Re: PR80155: Code hoisting and register pressure
On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote:
> [...]
not computable at load time
One of my testsuite failures for the pdp11 back end is
gcc.c-torture/compile/930326-1.c, which is:

  struct { char a, b, f[3]; } s;
  long i = s.f - &s.b;

It fails with "error: initializer element is not computable at load time".
I don't understand why, because it seems to be a perfectly reasonable
compile-time constant; "load time" doesn't enter into the picture that I
can see. If I replace "long" by "short" it works correctly. So presumably
it has something to do with the fact that Pmode == HImode. But how that
translates into this failure I don't know.

	paul
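For what it's worth, the initializer really is just a fixed offset within
a single object; a small illustration (my own, not part of the testcase)
of the value being computed:

  /* Illustration only -- not part of 930326-1.c.  The pointer difference
     s.f - &s.b is the distance between two members of the same object,
     known at compile time; it equals the offsetof difference below,
     i.e. the constant 1.  */
  #include <stddef.h>

  struct S { char a, b, f[3]; };

  long j = offsetof (struct S, f) - offsetof (struct S, b);   /* == 1 */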
Re: PR80155: Code hoisting and register pressure
On 05/25/2018 11:54 AM, Richard Biener wrote:
> [...]
[PATCH] tighten up -Wclass-memaccess for ctors/dtors (PR 84851)
A fix for 84851 - missing -Wclass-memaccess for a memcpy in a copy ctor
with a non-trivial member - was implemented but disabled for GCC 8
because it was late, with the expectation that we would enable it for
GCC 9. The attached patch removes the code that guards the full fix, to
enable it.

Martin

PR c++/84851 - missing -Wclass-memaccess for a memcpy in a copy ctor
with a non-trivial member

gcc/cp/ChangeLog:

	PR c++/84851
	* call.c (maybe_warn_class_memaccess): Tighten up.

gcc/testsuite/ChangeLog:

	PR c++/84851
	* g++.dg/Wclass-memaccess-4.C: Remove XFAIL.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 7aadd64..6a8ff6b 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8535,15 +8535,6 @@ maybe_warn_class_memaccess (location_t loc, tree fndecl,
   bool special = same_type_ignoring_top_level_qualifiers_p (ctx, desttype);
   tree binfo = TYPE_BINFO (ctx);
 
-  /* FIXME: The following if statement is overly permissive (see
-     bug 84851).  Remove it in GCC 9.  */
-  if (special
-      && !BINFO_VTABLE (binfo)
-      && !BINFO_N_BASE_BINFOS (binfo)
-      && (DECL_CONSTRUCTOR_P (current_function_decl)
-	  || DECL_DESTRUCTOR_P (current_function_decl)))
-    return;
-
   if (special
       && !BINFO_VTABLE (binfo)
       && !first_non_trivial_field (desttype))
diff --git a/gcc/testsuite/g++.dg/Wclass-memaccess-4.C b/gcc/testsuite/g++.dg/Wclass-memaccess-4.C
index 8c33421..69b8c78 100644
--- a/gcc/testsuite/g++.dg/Wclass-memaccess-4.C
+++ b/gcc/testsuite/g++.dg/Wclass-memaccess-4.C
@@ -29,7 +29,7 @@ struct C
 
 C::C (const C &c)
 {
-  memcpy (this, &c, sizeof c);// { dg-warning "\\\[-Wclass-memaccess]" "pr84851" { xfail *-*-*} }
+  memcpy (this, &c, sizeof c);// { dg-warning "\\\[-Wclass-memaccess]" }
 }
 
 C& C::operator= (const C &c)
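For illustration, this is the kind of code the tightened warning
diagnoses (a sketch modeled on g++.dg/Wclass-memaccess-4.C rather than
copied from it): a copy constructor that memcpy's over an object whose
member is non-trivial.

  // Sketch modeled on the testcase above (not copied from it): with the
  // guard removed, -Wclass-memaccess now fires on the memcpy even though
  // it appears in the class's own copy constructor, because the
  // std::string member makes a raw byte copy unsafe.
  #include <cstring>
  #include <string>

  struct C
  {
    std::string s;      // non-trivial member

    C () { }
    C (const C &c)
    {
      std::memcpy (this, &c, sizeof c);   // warning: -Wclass-memaccess
    }
  };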
Re: virtual-stack-vars reference not resolved in vregs
On Fri, May 25, 2018 at 08:11:43AM +0200, Eric Botcazou wrote:
> > Is this something the back end is responsible for getting right, for
> > example via the machine description file? If so, any hints where to
> > start?
>
> The SUBREG of MEM is invalid at this stage.

From rtl.texi:

---
There are currently three supported types for the first operand of a
@code{subreg}:

@itemize
@item pseudo registers
This is the most common case.  Most @code{subreg}s have pseudo
@code{reg}s as their first operand.

@item mem
@code{subreg}s of @code{mem} were common in earlier versions of GCC and
are still supported.  During the reload pass these are replaced by plain
@code{mem}s.  On machines that do not do instruction scheduling, use of
@code{subreg}s of @code{mem} are still used, but this is no longer
recommended.  Such @code{subreg}s are considered to be
@code{register_operand}s rather than @code{memory_operand}s before and
during reload.  Because of this, the scheduling passes cannot properly
schedule instructions with @code{subreg}s of @code{mem}, so for machines
that do scheduling, @code{subreg}s of @code{mem} should never be used.
To support this, the combine and recog passes have explicit code to
inhibit the creation of @code{subreg}s of @code{mem} when
@code{INSN_SCHEDULING} is defined.
---

It would be very nice if we got rid of subreg-of-mem completely, once and
for all.

The code following the comment

  /* In the general case, we expect virtual registers to appear only in
     operands, and then only as either bare registers or inside memories.  */

in function.c:instantiate_virtual_regs_in_insn does not handle the subreg
in this example instruction.


Segher
Why is REG_ALLOC_ORDER not defined on Aarch64
I was curious if there was any reason that REG_ALLOC_ORDER is not defined
for Aarch64. Has anyone tried defining it to see if it could help
performance? It is defined for many other platforms.

Steve Ellcey
sell...@cavium.com
Re: Why is REG_ALLOC_ORDER not defined on Aarch64
On Fri, May 25, 2018 at 3:35 PM, Steve Ellcey wrote:
> I was curious if there was any reason that REG_ALLOC_ORDER is not
> defined for Aarch64. Has anyone tried this to see if it could help
> performance? It is defined for many other platforms.

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01815.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01822.html

> Steve Ellcey
> sell...@cavium.com
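For anyone unfamiliar with the macro, the definition itself is just an
ordered initializer of hard register numbers in a target's .h file; a
hypothetical sketch (the numbers are invented for illustration and are
not a proposed aarch64 ordering):

  /* Hypothetical illustration only -- not a proposed aarch64 value.
     REG_ALLOC_ORDER lists every hard register number in the order the
     register allocator should prefer them, e.g. for an imaginary target
     with eight registers where r4-r7 are call-clobbered and therefore
     preferred first:  */
  #define REG_ALLOC_ORDER { 4, 5, 6, 7, 0, 1, 2, 3 }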
gcc-8-20180525 is now available
Snapshot gcc-8-20180525 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/8-20180525/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch revision 260785

You'll find:

 gcc-8-20180525.tar.xz                Complete GCC

  SHA256=96f117eaacacd8b31f527fcb5133bbf6b9efb4773c040e324664361a75ed6ebb
  SHA1=ebfd27eeffadb79da92386bc57fb9496996a18e0

Diffs from 8-20180518 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
Re: PR80155: Code hoisting and register pressure
On May 25, 2018 9:25:51 PM GMT+02:00, Jeff Law wrote:
> [...]