Re: PR80155: Code hoisting and register pressure
On 23 May 2018 at 18:37, Jeff Law wrote:
> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>> On 23 May 2018 at 13:58, Richard Biener wrote:
>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>
>>>> Hi,
>>>> I am trying to work on PR80155, which exposes a problem with code
>>>> hoisting and register pressure on a leading embedded benchmark for
>>>> ARM cortex-m7, where code hoisting causes an extra register spill.
>>>>
>>>> I have attached two test-cases which (hopefully) are representative
>>>> of the original test-case. The first one (trans_dfa.c) is bigger and
>>>> somewhat similar to the original test-case, and trans_dfa_2.c is a
>>>> hand-reduced version of trans_dfa.c. There are two spills caused
>>>> with trans_dfa.c and one spill with trans_dfa_2.c, due to its smaller
>>>> number of cases. The test-cases in the PR are probably not relevant.
>>>>
>>>> Initially I thought the spill was happening because of "too many
>>>> hoistings" taking place in the original test-case, thus increasing
>>>> the register pressure, but it seems the spill is possibly caused
>>>> because an expression gets hoisted out of a block that is on a loop
>>>> exit.
>>>>
>>>> For example, the following hoistings take place with trans_dfa_2.c:
>>>>
>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>     {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>
>>>> (2) Inserting expression in block 4 for code hoisting:
>>>>     {plus_expr,_4,1} (0006)
>>>>
>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>     {pointer_plus_expr,s_33,1} (0023)
>>>>
>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>     {pointer_plus_expr,s_33,1} (0023)
>>>>
>>>> The issue seems to be the hoisting of (*tab + 1), i.e. the first two
>>>> hoistings into block 4 from blocks 5 and 9, which causes the extra
>>>> spill. I verified that by disabling hoisting into block 4, which
>>>> resulted in no extra spills.
>>>>
>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>> hoisted from blocks 5 and 9, which are on loop exits? So the
>>>> expression that was previously computed in a block on a loop exit
>>>> gets hoisted outside that block, which possibly makes the allocator
>>>> more defensive? Similarly, disabling hoisting of expressions which
>>>> appeared in blocks on loop exits in the original test-case prevented
>>>> the extra spill. The other hoistings didn't seem to matter.
>>>
>>> I think that's simply coincidence. The only thing that makes a block
>>> that also exits from the loop special is that an expression could be
>>> sunk out of the loop, and hoisting (commoning with another path)
>>> could prevent that. But that isn't what is happening here, and it
>>> would be a pass-ordering issue, as the sinking pass runs only after
>>> hoisting (no idea why exactly, but I guess there are cases where we
>>> want to prefer CSE over sinking). So you could try whether
>>> re-ordering PRE and sinking helps your testcase.
>> Thanks for the suggestions. Placing the sink pass before PRE works for
>> both these test-cases! Sadly it still causes the spill for the
>> benchmark :-(
>> I will try to create a better approximation of the original test-case.
>>>
>>> What I do see is a missed opportunity to merge the successors of
>>> BB 4. After PRE we have:
>>>
>>>   <bb 4> [local count: 159303558]:
>>>   pretmp_123 = *tab_37(D);
>>>   _87 = pretmp_123 + 1;
>>>   if (c_36 == 65)
>>>     goto <bb 5>; [34.00%]
>>>   else
>>>     goto <bb 8>; [66.00%]
>>>
>>>   <bb 5> [local count: 54163210]:
>>>   *tab_37(D) = _87;
>>>   _96 = MEM[(char *)s_57 + 1B];
>>>   if (_96 != 0)
>>>     goto ; [89.00%]
>>>   else
>>>     goto ; [11.00%]
>>>
>>>   <bb 8> [local count: 105140348]:
>>>   *tab_37(D) = _87;
>>>   _56 = MEM[(char *)s_57 + 1B];
>>>   if (_56 != 0)
>>>     goto ; [89.00%]
>>>   else
>>>     goto ; [11.00%]
>>>
>>> Here at least the stores and loads can be hoisted. Note this may also
>>> point at the real issue of the code hoisting, which is tearing apart
>>> the RMW operation?
>> Indeed, this possibility seems much more likely than the block being
>> on a loop exit.
>> I will try to "hardcode" the load/store hoists into block 4 for this
>> specific test-case, to check if that prevents the spill.
> Even if it prevents the spill in this case, it's likely a good thing to
> do. The statements prior to the conditional in bb5 and bb8 should be
> hoisted, leaving bb5 and bb8 with just their conditionals.

Hi,
It seems that disabling forwprop somehow results in no extra spills on
the original test-case.

For instance, hoisting without forwprop:

  bb 3:
    _1 = tab_1(D) + 8
    pretmp_268 = MEM[tab_1(D) + 8B];
    _2 = pretmp_268 + 1;
    goto <bb 4> or <bb 5>

  bb 4:
    *_1 = _2

  bb 5:
    *_1 = _2

Hoisting with forwprop:

  bb 3:
    pretmp_164 = MEM[tab_1(D) + 8B];
    _2 = pretmp_164 + 1
    goto <bb 4> or <bb 5>

  bb 4:
    MEM[tab_1(D) + 8] = _2;

  bb 5:
    MEM[tab_1(D) + 8] = _2;

Although in both cases we aren't hoisting the stores, the issue with
forwprop in this case seems to be the folding of *_1 = _2 into
MEM[tab_1(D) + 8] = _2 ?

Disabling folding to mem_ref[base + offset] in forwpro
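A minimal sketch of the shape under discussion (a reconstruction for
illustration only; this is not the attached trans_dfa_2.c, and all names
are invented): the same read-modify-write of *tab appears on two mutually
exclusive paths, so code hoisting commons the load of *tab and the
addition into the dominating block and leaves only the two stores behind,
exactly the torn-apart RMW visible in the dump above.

  /* Illustration only -- not the attached test case.  Both arms perform
     the same read-modify-write of *tab; hoisting moves the load and the
     "+ 1" up into the block that tests c, leaving a bare store on each
     path.  */
  int
  scan (const char *s, int *tab)
  {
    int sum = 0;
    while (*s)
      {
        int c = *s++;
        if (c == 'A')
          {
            *tab = *tab + 1;    /* path 1: RMW of *tab */
            sum += 1;
          }
        else if (c == 'B')
          {
            *tab = *tab + 1;    /* path 2: the same RMW */
            sum += 2;
          }
        else
          break;
      }
    return sum;
  }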
Re: PR80155: Code hoisting and register pressure
On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni wrote:
> [...]
Re: PR80155: Code hoisting and register pressure
On Fri, 25 May 2018, Bin.Cheng wrote:
> [...]
Re: PR80155: Code hoisting and register pressure
On 05/25/2018 03:49 AM, Bin.Cheng wrote:
> [...]
Re: PR80155: Code hoisting and register pressure
On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote:
> [...]
not computable at load time
One of my testsuite failures for the pdp11 back end is
gcc.c-torture/compile/930326-1.c, which is:

  struct { char a, b, f[3]; } s;
  long i = s.f - &s.b;

It fails with "error: initializer element is not computable at load time".
I don't understand why, because it seems to be a perfectly reasonable
compile-time constant; "load time" doesn't enter into the picture that I
can see. If I replace "long" by "short" it works correctly. So presumably
it has something to do with the fact that Pmode == HImode. But how that
translates into this failure I don't know.

	paul
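For what it's worth, the initializer really is just a fixed offset within
a single object; a small illustration (my own, not part of the testcase)
of the value being computed:

  /* Illustration only -- not part of 930326-1.c.  The pointer difference
     s.f - &s.b is the distance between two members of the same object,
     known at compile time; it equals the offsetof difference below,
     i.e. the constant 1.  */
  #include <stddef.h>

  struct S { char a, b, f[3]; };

  long j = offsetof (struct S, f) - offsetof (struct S, b);   /* == 1 */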
Re: PR80155: Code hoisting and register pressure
On 05/25/2018 11:54 AM, Richard Biener wrote:
> [...]
[PATCH] tighten up -Wclass-memaccess for ctors/dtors (PR 84851)
A fix for 84851 - missing -Wclass-memaccess for a memcpy in a copy ctor
with a non-trivial member - was implemented but disabled for GCC 8
because it was late, with the expectation that we would enable it for
GCC 9. The attached patch removes the code that guards the full fix, to
enable it.

Martin

PR c++/84851 - missing -Wclass-memaccess for a memcpy in a copy ctor
with a non-trivial member

gcc/cp/ChangeLog:

	PR c++/84851
	* call.c (maybe_warn_class_memaccess): Tighten up.

gcc/testsuite/ChangeLog:

	PR c++/84851
	* g++.dg/Wclass-memaccess-4.C: Remove XFAIL.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 7aadd64..6a8ff6b 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8535,15 +8535,6 @@ maybe_warn_class_memaccess (location_t loc, tree fndecl,
   bool special = same_type_ignoring_top_level_qualifiers_p (ctx, desttype);
   tree binfo = TYPE_BINFO (ctx);
 
-  /* FIXME: The following if statement is overly permissive (see
-     bug 84851).  Remove it in GCC 9.  */
-  if (special
-      && !BINFO_VTABLE (binfo)
-      && !BINFO_N_BASE_BINFOS (binfo)
-      && (DECL_CONSTRUCTOR_P (current_function_decl)
-	  || DECL_DESTRUCTOR_P (current_function_decl)))
-    return;
-
   if (special
       && !BINFO_VTABLE (binfo)
       && !first_non_trivial_field (desttype))
diff --git a/gcc/testsuite/g++.dg/Wclass-memaccess-4.C b/gcc/testsuite/g++.dg/Wclass-memaccess-4.C
index 8c33421..69b8c78 100644
--- a/gcc/testsuite/g++.dg/Wclass-memaccess-4.C
+++ b/gcc/testsuite/g++.dg/Wclass-memaccess-4.C
@@ -29,7 +29,7 @@ struct C
 
 C::C (const C &c)
 {
-  memcpy (this, &c, sizeof c);// { dg-warning "\\\[-Wclass-memaccess]" "pr84851" { xfail *-*-*} }
+  memcpy (this, &c, sizeof c);// { dg-warning "\\\[-Wclass-memaccess]" }
 }
 
 C& C::operator= (const C &c)
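For illustration, this is the kind of code the tightened warning
diagnoses (a sketch modeled on g++.dg/Wclass-memaccess-4.C rather than
copied from it): a copy constructor that memcpy's over an object whose
member is non-trivial.

  // Sketch modeled on the testcase above (not copied from it): with the
  // guard removed, -Wclass-memaccess now fires on the memcpy even though
  // it appears in the class's own copy constructor, because the
  // std::string member makes a raw byte copy unsafe.
  #include <cstring>
  #include <string>

  struct C
  {
    std::string s;      // non-trivial member

    C () { }
    C (const C &c)
    {
      std::memcpy (this, &c, sizeof c);   // warning: -Wclass-memaccess
    }
  };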
Re: virtual-stack-vars reference not resolved in vregs
On Fri, May 25, 2018 at 08:11:43AM +0200, Eric Botcazou wrote:
> > Is this something the back end is responsible for getting right, for
> > example via the machine description file? If so, any hints where to
> > start?
>
> The SUBREG of MEM is invalid at this stage.

From rtl.texi:

---
There are currently three supported types for the first operand of a
@code{subreg}:

@itemize
@item pseudo registers
This is the most common case.  Most @code{subreg}s have pseudo
@code{reg}s as their first operand.

@item mem
@code{subreg}s of @code{mem} were common in earlier versions of GCC and
are still supported.  During the reload pass these are replaced by plain
@code{mem}s.  On machines that do not do instruction scheduling, use of
@code{subreg}s of @code{mem} are still used, but this is no longer
recommended.  Such @code{subreg}s are considered to be
@code{register_operand}s rather than @code{memory_operand}s before and
during reload.  Because of this, the scheduling passes cannot properly
schedule instructions with @code{subreg}s of @code{mem}, so for machines
that do scheduling, @code{subreg}s of @code{mem} should never be used.
To support this, the combine and recog passes have explicit code to
inhibit the creation of @code{subreg}s of @code{mem} when
@code{INSN_SCHEDULING} is defined.
---

It would be very nice if we got rid of subreg-of-mem completely, once and
for all.

The code following the comment

  /* In the general case, we expect virtual registers to appear only in
     operands, and then only as either bare registers or inside memories.  */

in function.c:instantiate_virtual_regs_in_insn does not handle the subreg
in this example instruction.


Segher
Why is REG_ALLOC_ORDER not defined on Aarch64
I was curious if there was any reason that REG_ALLOC_ORDER is not defined
for Aarch64. Has anyone tried defining it to see if it could help
performance? It is defined for many other platforms.

Steve Ellcey
sell...@cavium.com
Re: Why is REG_ALLOC_ORDER not defined on Aarch64
On Fri, May 25, 2018 at 3:35 PM, Steve Ellcey wrote:
> I was curious if there was any reason that REG_ALLOC_ORDER is not
> defined for Aarch64. Has anyone tried this to see if it could help
> performance? It is defined for many other platforms.

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01815.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01822.html

> Steve Ellcey
> sell...@cavium.com
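For anyone unfamiliar with the macro, the definition itself is just an
ordered initializer of hard register numbers in a target's .h file; a
hypothetical sketch (the numbers are invented for illustration and are
not a proposed aarch64 ordering):

  /* Hypothetical illustration only -- not a proposed aarch64 value.
     REG_ALLOC_ORDER lists every hard register number in the order the
     register allocator should prefer them, e.g. for an imaginary target
     with eight registers where r4-r7 are call-clobbered and therefore
     preferred first:  */
  #define REG_ALLOC_ORDER { 4, 5, 6, 7, 0, 1, 2, 3 }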
gcc-8-20180525 is now available
Snapshot gcc-8-20180525 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/8-20180525/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch revision 260785

You'll find:

 gcc-8-20180525.tar.xz                Complete GCC

  SHA256=96f117eaacacd8b31f527fcb5133bbf6b9efb4773c040e324664361a75ed6ebb
  SHA1=ebfd27eeffadb79da92386bc57fb9496996a18e0

Diffs from 8-20180518 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
Re: PR80155: Code hoisting and register pressure
On May 25, 2018 9:25:51 PM GMT+02:00, Jeff Law wrote:
> [...]