[Google] Refine hot caller heuristic

2013-08-20 Thread Easwaran Raman
The current hot caller heuristic simply promotes edges whose caller is
hot. This patch does the following:
* Turn it off for applications with a large footprint, since the size
increase hurts them.
* Be more selective by considering the arguments to the callee when the
heuristic is enabled.

This performs well on internal benchmarks. Ok for google/4_8 branch if
all tests pass?

- Easwaran


hot_caller.patch
Description: Binary data


Re: [Google] Refine hot caller heuristic

2013-08-29 Thread Easwaran Raman
On Tue, Aug 20, 2013 at 9:35 PM, Xinliang David Li  wrote:
> Do you need to guard the jump function access with check if
> (ipa_node_params_vector.exists ())?
I believe it is not necessary since, for example, ipa_analyze_node
calls ipa_check_create_node_params, which calls create. But I see the
check is used everywhere in ipa-inline-analysis.c, so I have added it
and conservatively return false when the vector does not exist.

>
> Ideally, useful_cold_callee should be folded into the inline hints
> estimation.  Question about the heuristic: why filtering out
> PASS_THROUGH parameter cases completely? Passing 'this' parameter in
> many cases can result in good PRE opportunities.  Why not skip the
> unknown type?

The rationale is that it is useful to inline bar into foo in the snippet below:

void foo ()
{
  A a;
  bar(&a);
  ...
}

Capturing this case requires accepting UNKNOWN and KNOWN_TYPE jump
functions. I have changed the check accordingly and attached the new patch.

- Easwaran

> David
>
> On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman  wrote:
>> The current hot caller heuristic simply promotes edges whose caller is
>> hot. This patch does the following:
>> * Turn it off for applications with large footprint since the size
>> increase hurts them
>> * Be more selective by considering arguments to callee when the
>> heuristic is enabled.
>>
>> This performs well on internal benchmarks. Ok for google/4_8 branch if
>> all tests pass?
>>
>> - Easwaran


hot_caller.patch
Description: Binary data


Re: [Google] Refine hot caller heuristic

2013-08-29 Thread Easwaran Raman
 enable_hot_caller_heuristic ();
>>if (cgraph_maybe_hot_edge_p (edge))
>>  return true;
>>
>> @@ -543,9 +591,17 @@ edge_hot_enough_p (struct cgraph_edge *edge)
>>if (flag_auto_profile && edge->callee->count == 0
>>&& edge->callee->max_bb_count > 0)
>>  return false;
>> -  if (PARAM_VALUE (PARAM_INLINE_HOT_CALLER)
>> -  && maybe_hot_count_p (NULL, edge->caller->max_bb_count))
>> -return true;
>> +  if (use_hot_caller_heuristic)
>> +{
>> +  struct cgraph_node *where = edge->caller;
>> +  if (maybe_hot_count_p (NULL, where->max_bb_count))
>> +{
>> +  if (PARAM_VALUE (PARAM_INLINE_USEFUL_COLD_CALLEE))
>> +return useful_cold_callee (edge);
>> +  else
>> +return true;
>> +}
>> +}
>>return false;
>>  }
>
> On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman  wrote:
>> The current hot caller heuristic simply promotes edges whose caller is
>> hot. This patch does the following:
>> * Turn it off for applications with large footprint since the size
>> increase hurts them
>> * Be more selective by considering arguments to callee when the
>> heuristic is enabled.
>>
>> This performs well on internal benchmarks. Ok for google/4_8 branch if
>> all tests pass?
>>
>> - Easwaran
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Fix PR middle-end/57370

2013-08-29 Thread Easwaran Raman
Richard,
Do you want me to commit everything but the insert_stmt_after hunk
now? There are more similar failures reported in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
the updated patch there. Shall I send that for review? Apart from the
debug statement issue, almost all the bugs are due to dependence
violations caused by newly inserted statements that do not have the
right UID. Instead of trying to catch all of them, would it be better
to check whether a stmt has a proper UID (non-zero if it is not the
first stmt) and assign a sensible value at the point where it is used
(find_insert_point and appears_later_in_bb) instead of where the stmt
is created? I think that would be less brittle.

- Easwaran



On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
 wrote:
> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
>> I have a new patch that supersedes this. The new patch also fixes PR
>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>> no test regression on x86_64/linux. Ok for trunk?
>>
>> 2013-07-31  Easwaran Raman  
>>
>> PR middle-end/57370
>> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>> of debug statements that cause inconsistent IR.
>
> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
> at all.  The other hunks are ok, but we need to work harder to preserve
> debug stmts - simply removing all is not going to fly.
>
> Richard.
>
>>
>> testsuite/ChangeLog:
>> 2013-07-31  Easwaran Raman  
>>
>> PR middle-end/57370
>> PR tree-optimization/57393
>> PR tree-optimization/58011
>> * gfortran.dg/reassoc_12.f90: New testcase.
>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>
>>
>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
>>> Ping.
>>>
>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  wrote:
>>>> A newly generated statement in build_and_add_sum  function of
>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>> earlier phi statement) which messed up the IR and caused the test
>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Easwaran
>>>>
>>>>
>>>> 2013-06-27  Easwaran Raman  
>>>>
>>>> PR middle-end/57370
>>>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a 
>>>> phi
>>>> node for a non-phi gimple statement.
>>>>
>>>> testsuite/ChangeLog:
>>>> 2013-06-27  Easwaran Raman  
>>>>
>>>> PR middle-end/57370
>>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>>>
>>>>
>>>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>>>> ===
>>>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>> @@ -0,0 +1,74 @@
>>>> +! { dg-do compile }
>>>> +! { dg-options "-O2 -ffast-math" }
>>>> +! PR middle-end/57370
>>>> +
>>>> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>>>> +   grad_deriv,npoints, sx)
>>>> +IMPLICIT REAL*8 (t)
>>>> +INTEGER, PARAMETER :: dp=8
>>>> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
>>>> +   e_ndrho_ndrho_rho
>>>> +  DO ii=1,npoints
>>>> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
>>>> +t1425 = t233 * t557
>>>> +t1429 = beta * t225
>>>> +t1622 = t327 * t1621
>>>> +t1626 = t327 * t1625
>>>> +t1632 = t327 * t1631
>>>> +t1685 = t105 * t1684
>>>> +t2057 = t1636 + t8 * (t2635 + t3288)
>>>> +  END IF
>>>> +  IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
>>>> +t5469 = t5440 - t5443 - t5446 - t5449 - &
>>>> +t5451 - t5454 - t5456 + t5459  - &
>>>> +t5462 + t5466 - t5468
>>>> +t5478 = 0.240e2_dp * t1616 *

Re: Fix PR middle-end/57370

2013-08-30 Thread Easwaran Raman
On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener
 wrote:
> On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman  wrote:
>> Richard,
>> Do you want me to commit everything but the  insert_stmt_after hunk
>> now?
>
> Yes.
>
>> There are more similar failures reported in
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
>> the updated patch there. Shall I send that for review? Apart from the
>> debug statement issue, almost all the bugs are due to dependence
>> violation because certain newly inserted statements do not have the
>> right UID. Instead of trying to catch all of them, will it be better
>> if I check if the stmt has a proper uid (non-zero if it is not the
>> first stmt) and assign a sensible value at the point where it is used
>> (find_insert_point and appears_later_in_bb) instead of where the stmt
>> is created? I think that would be less brittle.
>
> In the end all this placement stuff should be reverted and done as
> post-processing after reassoc is finished.  You can remember the
> inserted stmts that are candidates for moving (just set a pass-local
> flag on them) and assign UIDs for the whole function after all stmts
> are inserted.

The problem is that we need sane UID values while reassoc is happening, not
after it is finished. As it processes each stmt in reassoc_bb, it might
generate new stmts that have to be compared in
find_insert_point or appears_later_in_bb.

- Easwaran

> Richard.
>
>
>> - Easwaran
>>
>>
>>
>> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
>>  wrote:
>>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
>>>> I have a new patch that supersedes this. The new patch also fixes PR
>>>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>>>> no test regression on x86_64/linux. Ok for trunk?
>>>>
>>>> 2013-07-31  Easwaran Raman  
>>>>
>>>> PR middle-end/57370
>>>> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>>> of debug statements that cause inconsistent IR.
>>>
>>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>>> at all.  The other hunks are ok, but we need to work harder to preserve
>>> debug stmts - simply removing all is not going to fly.
>>>
>>> Richard.
>>>
>>>>
>>>> testsuite/ChangeLog:
>>>> 2013-07-31  Easwaran Raman  
>>>>
>>>> PR middle-end/57370
>>>> PR tree-optimization/57393
>>>> PR tree-optimization/58011
>>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>
>>>>
>>>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
>>>>> Ping.
>>>>>
>>>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  
>>>>> wrote:
>>>>>> A newly generated statement in build_and_add_sum  function of
>>>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>>>> earlier phi statement) which messed up the IR and caused the test
>>>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>>>> Ok for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> Easwaran
>>>>>>
>>>>>>
>>>>>> 2013-06-27  Easwaran Raman  
>>>>>>
>>>>>> PR middle-end/57370
>>>>>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of 
>>>>>> a phi
>>>>>> node for a non-phi gimple statement.
>>>>>>
>>>>>> testsuite/ChangeLog:
>>>>>> 2013-06-27  Easwaran Raman  
>>>>>>
>>>>>> PR middle-end/57370
>>>>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>>>>>
>>>>>>
>>>>>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>>>>>> ===
>>>>>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>>>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>>>> @@ -0,0 +1,74 @@
>>>>>> +! { dg-do comp

Re: Fix PR middle-end/57370

2013-08-30 Thread Easwaran Raman
On Fri, Aug 30, 2013 at 9:26 AM, Jakub Jelinek  wrote:
> On Fri, Aug 30, 2013 at 09:13:34AM -0700, Easwaran Raman wrote:
>> >> There are more similar failures reported in
>> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
>> >> the updated patch there. Shall I send that for review? Apart from the
>> >> debug statement issue, almost all the bugs are due to dependence
>> >> violation because certain newly inserted statements do not have the
>> >> right UID. Instead of trying to catch all of them, will it be better
>> >> if I check if the stmt has a proper uid (non-zero if it is not the
>> >> first stmt) and assign a sensible value at the point where it is used
>> >> (find_insert_point and appears_later_in_bb) instead of where the stmt
>> >> is created? I think that would be less brittle.
>> >
>> > In the end all this placement stuff should be reverted and done as
>> > post-processing after reassoc is finished.  You can remember the
>> > inserted stmts that are candidates for moving (just set a pass-local
>> > flag on them) and assign UIDs for the whole function after all stmts
>> > are inserted.
>>
>> The problem is we need sane UID values as reassoc is happening - not
>> after it is finished. As it process each stmt in reassoc_bb, it might
>> generate new stmts which might have to be compared in
>> find_insert_point or appears_later_in_bb.
>
> A new stmt will be created with UID 0 and in case there is stmt movement,
> you could zero the UID during movement.  Then, you can just special case
> UID zero when you are looking at UIDs.  As on trunk/4.8 gsi_for_stmt is
> O(1), you can easily walk a couple of previous and/or later stmts and look
> for the first non-zero UID around it, and treat it as if the UID was
> previous non-zero UID + 0.5 or next non-zero UID - 0.5.  And, if you see
> too many consecutive statements with UID 0 (more than some constant, 32?),
> just reassign UIDs to the whole bb.  Or you could initially assign UIDs
> with some gaps, then keep filling those gaps and once you fill them,
> renumber again with new gaps.
> Jakub

Yes, this is pretty much what I was proposing. The current
implementation doesn't rely on UIDs being unique - they only have to
be monotonically non-decreasing. So, when a UID of 0 is encountered,
go up until a non-zero UID is found, then go back down assigning that
non-zero UID. This effectively means each UID-0 stmt is visited
at most twice, so I don't think we need to count consecutive UID-0
stmts and renumber the entire BB past some threshold.

- Easwaran


Re: Fix PR middle-end/57370

2013-09-04 Thread Easwaran Raman
Submitted the patch (r202262) without the insert_stmt_after hunk. Also
fixed nits pointed out by Jakub.

- Easwaran

On Mon, Sep 2, 2013 at 2:31 AM, Richard Biener
 wrote:
> On Fri, Aug 30, 2013 at 6:13 PM, Easwaran Raman  wrote:
>> On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener
>>  wrote:
>>> On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman  wrote:
>>>> Richard,
>>>> Do you want me to commit everything but the  insert_stmt_after hunk
>>>> now?
>>>
>>> Yes.
>>>
>>>> There are more similar failures reported in
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
>>>> the updated patch there. Shall I send that for review? Apart from the
>>>> debug statement issue, almost all the bugs are due to dependence
>>>> violation because certain newly inserted statements do not have the
>>>> right UID. Instead of trying to catch all of them, will it be better
>>>> if I check if the stmt has a proper uid (non-zero if it is not the
>>>> first stmt) and assign a sensible value at the point where it is used
>>>> (find_insert_point and appears_later_in_bb) instead of where the stmt
>>>> is created? I think that would be less brittle.
>>>
>>> In the end all this placement stuff should be reverted and done as
>>> post-processing after reassoc is finished.  You can remember the
>>> inserted stmts that are candidates for moving (just set a pass-local
>>> flag on them) and assign UIDs for the whole function after all stmts
>>> are inserted.
>>
>> The problem is we need sane UID values as reassoc is happening - not
>> after it is finished. As it process each stmt in reassoc_bb, it might
>> generate new stmts which might have to be compared in
>> find_insert_point or appears_later_in_bb.
>
> But if you no longer find_insert_point during reassoc but instead do
> a "scheduling" pass after it finished you won't need sane UIDs during
> reassoc.
>
> Richard.
>
>> - Easwaran
>>
>>> Richard.
>>>
>>>
>>>> - Easwaran
>>>>
>>>>
>>>>
>>>> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
>>>>  wrote:
>>>>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
>>>>>> I have a new patch that supersedes this. The new patch also fixes PR
>>>>>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>>>>>> no test regression on x86_64/linux. Ok for trunk?
>>>>>>
>>>>>> 2013-07-31  Easwaran Raman  
>>>>>>
>>>>>> PR middle-end/57370
>>>>>> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and 
>>>>>> reset
>>>>>> of debug statements that cause inconsistent IR.
>>>>>
>>>>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>>>>> at all.  The other hunks are ok, but we need to work harder to preserve
>>>>> debug stmts - simply removing all is not going to fly.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> testsuite/ChangeLog:
>>>>>> 2013-07-31  Easwaran Raman  
>>>>>>
>>>>>> PR middle-end/57370
>>>>>> PR tree-optimization/57393
>>>>>>     PR tree-optimization/58011
>>>>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>>>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  
>>>>>> wrote:
>>>>>>> Ping.
>>>>>>>
>>>>>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  
>>>>>>> wrote:
>>>>>>>> A newly generated statement in build_and_add_sum  function of
>>>>>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>>>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>>>>>> earlier phi statement) which messed up the IR and caused the test
>>>>>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>>>>>> Ok for trunk?
>>>>>>>>
>>>>>>&g

Re: Fix PR middle-end/57370

2013-09-05 Thread Easwaran Raman
On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
 wrote:
> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
>> I have a new patch that supersedes this. The new patch also fixes PR
>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>> no test regression on x86_64/linux. Ok for trunk?
>>
>> 2013-07-31  Easwaran Raman  
>>
>> PR middle-end/57370
>> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>> of debug statements that cause inconsistent IR.
>
> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
> at all.  The other hunks are ok, but we need to work harder to preserve
> debug stmts - simply removing all is not going to fly.
>
> Richard.

I looked into the problem with the debug stmts in this failing
test case in detail. Initially, the code sequence looks like this:


s$n_13 = MEM[(struct S *)&s];
# DEBUG s$n => s$n_13
_2 = (double) s$n_13;
_4 = _2 * a_3(D);
_6 = (double) i_5(D);
_7 = _4 * _6;
_9 = (double) j_8(D);
_10 = _7 * _9;
bar (_10);
# DEBUG D#2 => (double) k_12(D)
# DEBUG D#1 => _7 * D#2
# DEBUG t$n => (int) D#1

After reassociation

s$n_13 = MEM[(struct S *)&s];
# DEBUG s$n => s$n_13
_2 = (double) s$n_13;
_6 = (double) i_5(D);
# DEBUG D#3 => _4 * _6
_9 = (double) j_8(D);
_4 = _9 * _2;
_7 = _4 * a_3(D);
_10 = _7 * _6;
bar (_10);
# DEBUG D#2 => (double) k_12(D)
# DEBUG D#1 => D#3 * D#2
# DEBUG t$n => (int) D#1

In the above, # DEBUG D#3 => _4 * _6  appears above the statement that
defines _4. But even moving the def of D#3 below the statement
defining _4 is not sufficient. Before reassociation, t$n refers to
the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after
reassociation it would refer to (j_8(D) * s$n_13 * i_5(D) *
k_12(D)). It seems the correct fix is to discard the debug temps whose
RHS contains a variable involved in reassociation and then
reconstruct them. Is that the expected fix here?

- Easwaran

>>
>> testsuite/ChangeLog:
>> 2013-07-31  Easwaran Raman  
>>
>> PR middle-end/57370
>> PR tree-optimization/57393
>> PR tree-optimization/58011
>> * gfortran.dg/reassoc_12.f90: New testcase.
>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>     * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>
>>
>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
>>> Ping.
>>>
>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  wrote:
>>>> A newly generated statement in build_and_add_sum  function of
>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>> earlier phi statement) which messed up the IR and caused the test
>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Easwaran
>>>>
>>>>
>>>> 2013-06-27  Easwaran Raman  
>>>>
>>>> PR middle-end/57370
>>>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a 
>>>> phi
>>>> node for a non-phi gimple statement.
>>>>
>>>> testsuite/ChangeLog:
>>>> 2013-06-27  Easwaran Raman  
>>>>
>>>> PR middle-end/57370
>>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>>>
>>>>
>>>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>>>> ===
>>>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>> @@ -0,0 +1,74 @@
>>>> +! { dg-do compile }
>>>> +! { dg-options "-O2 -ffast-math" }
>>>> +! PR middle-end/57370
>>>> +
>>>> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>>>> +   grad_deriv,npoints, sx)
>>>> +IMPLICIT REAL*8 (t)
>>>> +INTEGER, PARAMETER :: dp=8
>>>> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
>>>> +   e_ndrho_ndrho_rho
>>>> +  DO ii=1,npoints
>>>> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
>>>> +t1425 = t233 * t557
>>>> +t1429 = beta * t225
>>>> +t1622 = t327 * t1621
>>>> +t1626 = t327 * t1625
>>>> +t16

Re: Fix PR middle-end/57370

2013-09-06 Thread Easwaran Raman
On Fri, Sep 6, 2013 at 12:05 AM, Richard Biener
 wrote:
> On Thu, Sep 5, 2013 at 9:23 PM, Easwaran Raman  wrote:
>> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
>>  wrote:
>>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
>>>> I have a new patch that supersedes this. The new patch also fixes PR
>>>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>>>> no test regression on x86_64/linux. Ok for trunk?
>>>>
>>>> 2013-07-31  Easwaran Raman  
>>>>
>>>> PR middle-end/57370
>>>> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>>> of debug statements that cause inconsistent IR.
>>>
>>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>>> at all.  The other hunks are ok, but we need to work harder to preserve
>>> debug stmts - simply removing all is not going to fly.
>>>
>>> Richard.
>>
>> I looked into the problem related to the debug stmts in this failing
>> test case in detail. Initially, the code sequence looks
>>
>>
>> s$n_13 = MEM[(struct S *)&s];
>> # DEBUG s$n => s$n_13
>> _2 = (double) s$n_13;
>> _4 = _2 * a_3(D);
>> _6 = (double) i_5(D);
>> _7 = _4 * _6;
>> _9 = (double) j_8(D);
>> _10 = _7 * _9;
>> bar (_10);
>> # DEBUG D#2 => (double) k_12(D)
>> # DEBUG D#1 => _7 * D#2
>> # DEBUG t$n => (int) D#1
>>
>> After reassociation
>>
>> s$n_13 = MEM[(struct S *)&s];
>> # DEBUG s$n => s$n_13
>> _2 = (double) s$n_13;
>> _6 = (double) i_5(D);
>> # DEBUG D#3 => _4 * _6
>> _9 = (double) j_8(D);
>> _4 = _9 * _2;
>> _7 = _4 * a_3(D);
>> _10 = _7 * _6;
>> bar (_10);
>> # DEBUG D#2 => (double) k_12(D)
>> # DEBUG D#1 => D#3 * D#2
>> # DEBUG t$n => (int) D#1
>>
>> In the above, # DEBUG D#3 => _4 * _6  appears above the statement that
>> defines _4. But even if I move the def of D#3 below the statement
>> defining _4, it is not sufficient. Before reassociation, t$n refers to
>> the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after
>> reassociation it would refer to ( j_8(D) * s$n_13 *  i_5(D) *
>> k_12(D)). It seems the correct fix is to discard the debug temps whose
>> RHS contains a variable that is involved in reassociation and then
>> reconstruct it. Is that the expected fix here?
>
> The value of the DEBUG expression changes because the value
> that _4 computes changes - that is a no-no that may not happen.
> We cannot re-use _4 (without previously releasing it and allocating
> it newly) with a new value.  releasing _4 should have fixed up the
> debug stmts.
>
> So - can you verify whether we are indeed just replacing the
> RHS of _4 = _2 * a_3(D) to _9 * _2 without changing the SSA name of
> that expression?

I have confirmed that the SSA name of the LHS remains the same. The
reassociation code simply calls gimple_assign_set_rhs2 to modify RHS2
and then calls update_stmt.

- Easwaran

> Another reason why re-using the same LHS for another value is
> wrong is that annotated information like points-to sets, alignment
> and soon value-range information will be wrong.
>
> Thanks,
> Richard.
>
>> - Easwaran
>>
>>>>
>>>> testsuite/ChangeLog:
>>>> 2013-07-31  Easwaran Raman  
>>>>
>>>> PR middle-end/57370
>>>> PR tree-optimization/57393
>>>> PR tree-optimization/58011
>>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>     * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>
>>>>
>>>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
>>>>> Ping.
>>>>>
>>>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  
>>>>> wrote:
>>>>>> A newly generated statement in build_and_add_sum  function of
>>>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>>>> earlier phi statement) which messed up the IR and caused the test
>>>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>>>> Ok for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> Easwaran
>>>>>>
>>>>>>
>>>>>> 2013-06-27  Easwaran Raman  
>>

Fix PR middle-end/57393

2013-09-16 Thread Easwaran Raman
There are two separate root causes for the problem reported in PR
57393. This patch attempts to fix both.

The first is newly created stmts carrying the default UID of 0
being compared with statements with valid UIDs, leading to broken
dependences. As discussed in an earlier thread, I check the UIDs
before using them and ensure stmts have a valid UID. In the worst
case, this reassigns UIDs for the entire BB containing the stmts in
question.

The second is debug stmts going out of sync with the IR after
reassociation. I think the right fix is to create debug temps before
the actual reassociation, to decouple them from the SSA variables
involved in it.

This bootstraps on x86_64 and I am running the tests. Ok for trunk?

Thanks,
Easwaran

2013-09-16  Easwaran Raman  

PR middle-end/57393
* tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
(build_and_add_sum): Do not set UID of newly created statements.
(ensure_valid_uid): New function.
(find_insert_point): Call it before UIDs are compared.
(insert_stmt_after): Do not reset debug statements.
(regenerate_debug_stmts): New function.
(reassociate_bb): Use it.

testsuite/ChangeLog:
2013-09-16  Easwaran Raman  

PR middle-end/57393
* gcc.dg/tree-ssa/reassoc-32.c: New testcase.
* gcc.dg/tree-ssa/reassoc-33.c: New testcase.
* gcc.dg/tree-ssa/reassoc-34.c: New testcase.


pr57393.patch
Description: Binary data


Re: Fix PR middle-end/57393

2013-09-23 Thread Easwaran Raman
Ping.


On Mon, Sep 16, 2013 at 4:42 PM, Easwaran Raman  wrote:
> There are two separate root causes for the problem reported in PR
> 57393. This patch attempts to fix both.
>
> First is due to newly created stmts that have the default UID of 0
> which are compared with statements with valid UIDs leading to broken
> dependences. As discussed in an earlier thread, I check the UIDs
> before using them and ensure stmts have a valid UID. In the worst
> case, this reassigns UIDs for the entire BB containing the stmts in
> question.
>
> The second is due to debug stmts being out of sync with the IR after
> reassociation. I think the right fix is to create debug temps before
> actual reassociation to decouple them from the SSA variables involved
> in reassociation.
>
> This bootstraps in x86_64 and I am running the tests. Ok for trunk?
>
> Thanks,
> Easwaran
>
> 2013-09-16  Easwaran Raman  
>
> PR middle-end/57393
> * tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
> (build_and_add_sum): Do not set UID of newly created statements.
> (ensure_valid_uid): New function,
> (find_insert_point): called here before UIDs are compared.
> (insert_stmt_after): Do not reset debug statements.
> (regenerate_debug_stmts): New function,
> (reassociate_bb): use here.
>
> testsuite/ChangeLog:
> 2013-09-16  Easwaran Raman  
>
> PR middle-end/57393
> * gcc.dg/tree-ssa/reassoc-32.c: New testcase.
> * gcc.dg/tree-ssa/reassoc-33.c: New testcase.
> * gcc.dg/tree-ssa/reassoc-34.c: New testcase.


[Google] Adjust comdat-sharing-probability param for -Os

2013-09-27 Thread Easwaran Raman
This patch increases comdat-sharing-probability to 80 under -Os. This
reduces the amount of inlining and helps internal benchmarks.
Unfortunately, it causes a slight regression on SPEC 2006. Ok for
google branches if all tests pass?

- Easwaran


comdat_sharing.patch
Description: Binary data


Re: [Google] Adjust comdat-sharing-probability param for -Os

2013-09-27 Thread Easwaran Raman
On Fri, Sep 27, 2013 at 4:08 PM, Xinliang David Li  wrote:
> d.growth -= (info->size
> * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY))
> + 50) / 100;
>
> What is the purpose of '50' here?

It rounds the result of the division by 100 to the nearest integer
instead of truncating.

> The patch is fine for Google branch.
>
> Other tunings to think about -- I think the sharing probability should
> not be a fixed value -- but depending on the function's charateristics
> -- such as size, number of callsites etc. For instance, for small leaf
> comdat functions, the sharing probability will be small.
> David
>
>
> On Fri, Sep 27, 2013 at 2:57 PM, Easwaran Raman  wrote:
>> This patch increases comdat-sharing-probability to 80 under -Os. This
>> reduces the amount of inlining and helps internal benchmarks.
>> Unfortunately, this causes slight regression on spec 2006. Ok for
>> google branches if all tests pass?
>>
>> - Easwaran


Propagate profile counts during switch expansion

2012-10-02 Thread Easwaran Raman
Hi,
 This patch propagates the profile counts during RTL expansion. In
many cases, there is no way to determine the exact count of an edge
generated during the expansion. So this patch uses some simple
heuristics to estimate the edge counts but ensures that the counts of
the basic blocks corresponding to the cases are (nearly the) same as
at the gimple level.

Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for trunk?

- Easwaran

--
2012-10-02   Easwaran Raman  

* cfgbuild.c (gen_probabilities_from_existing_counts): New function.
(compute_outgoing_frequencies): If at least one successor of a
BB has non-zero profile count, use the counts to compute
probabilities.
* expr.c (do_tablejump): Add a REG_BR_PROB note on the
jump to default label.
(try_tablejump): Add a parameter to specify the probability
of jumping to the default label.
* expr.h (try_tablejump): Add a new parameter.
* stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
(do_jump_if_equal): Pass probability for REG_BR_PROB note.
(add_case_node): Pass execution count of the case node and use
it to initialize COUNT field.
(emit_case_decision_tree): Pass default_count to emit_case_nodes.
(get_outgoing_edge_counts): New function.
(add_prob_note_to_last_insn): Likewise.
(case_probability): New macro.
(emit_case_dispatch_table): Compute probability of jumping to default
label and apply note to the jump.
(expand_case): Compute and propagate default edge count to
emit_case_dispatch_table.
(expand_sjlj_dispatch_table): Update calls to add_case_node and
emit_case_dispatch_table.
(balance_case_nodes): Update subtree_counts.
(emit_case_nodes): Compute edge probabilities and add note.

gcc/testsuite/ChangeLog:
2012-10-02   Easwaran Raman  
* gcc.dg/tree-prof/switch-case-1.c: New test case.
* gcc.dg/tree-prof/switch-case-2.c: New test case.

Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c
===
--- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0)
@@ -0,0 +1,40 @@
+/* { dg-options "-O2 -fdump-rtl-expand-all" } */
+int g;
+
+__attribute__((noinline)) void foo (int  n)
+{
+  switch (n)
+{
+case 1:
+  g++; break;
+case 2:
+  g += 2; break;
+case 3:
+  g += 1; break;
+case 4:
+  g += 3; break;
+case 5:
+  g += 4; break;
+case 6:
+  g += 5; break;
+case 7:
+  g += 6; break;
+case 8:
+  g += 7; break;
+case 9:
+  g += 8; break;
+default:
+  g += 8; break;
+   }
+}
+
+int main ()
+{
+ int i;
+ for (i = 0; i < 1; i++)
+   foo ((i * i) % 5);
+ return 0;
+}
+/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count 4000" 2 "expand"} } */
+/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count 2000" 1 "expand"} } */
+/* { dg-final-use { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
===
--- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0)
@@ -0,0 +1,40 @@
+/* { dg-options "-O2 -fdump-rtl-expand-all" } */
+int g;
+
+__attribute__((noinline)) void foo (int  n)
+{
+  switch (n)
+{
+case 99:
+  g += 2; break;
+case 1:
+  g++; break;
+case 100:
+  g += 1; break;
+case 4:
+  g += 3; break;
+case 5:
+  g += 4; break;
+case 6:
+  g += 5; break;
+case 7:
+  g += 6; break;
+case 8:
+  g += 7; break;
+case 9:
+  g += 8; break;
+default:
+  g += 8; break;
+   }
+}
+
+int main ()
+{
+ int i;
+ for (i = 0; i < 1; i++)
+   foo ((i * i) % 5);
+ return 0;
+}
+/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count 4000" 2 "expand"} } */
+/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count 2000" 1 "expand"} } */
+/* { dg-final-use { cleanup-rtl-dump "expand" } } */
Index: gcc/expr.c
===
--- gcc/expr.c (revision 191879)
+++ gcc/expr.c (working copy)
@@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin
 #ifdef PUSH_ROUNDING
 static void emit_single_push_insn (enum machine_mode, rtx, tree);
 #endif
-static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
+static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
 static rtx const_vector_from_tree (tree);
 static void write_complex_part (rtx, rtx, bool);

@@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree

 static void
 do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
-  rtx default_label)
+  rtx default_label, int default_probability)
 {
   rtx temp, vector;

@@ -10910,9 +

Re: Propagate profile counts during switch expansion

2012-10-04 Thread Easwaran Raman
Hi Honza,
 I am addressing some of the questions you raise here. Will send an
updated patch later.


On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka  wrote:

> > @@ -560,7 +577,6 @@ compute_outgoing_frequencies (basic_block b)
> >return;
> >   }
> >  }
> > -
> >if (single_succ_p (b))
> >  {
> >e = single_succ_edge (b);
> > @@ -568,7 +584,10 @@ compute_outgoing_frequencies (basic_block b)
> >e->count = b->count;
> >return;
> >  }
> > -  guess_outgoing_edge_probabilities (b);
> > +  else if (!gen_probabilities_from_existing_counts (b->succs)){
> > +/* All outgoing edges of B have zero count. Guess probabilities.  */
> > +guess_outgoing_edge_probabilities (b);
> > +  }
>
> Hmm, I do not quite follow logic here.
> basic block B is one of many basic blocks that the original BB was split from.
> It is possible that B may have some of original edges, but there may be new 
> ones.
> How you can guess the outgoing probabilitie shere.  Do you have an example?

The code reaches here only when the BB has more than two successors. I
assume that this happens only when the BB is terminated by a switch
statement. During conversion to RTL, the outgoing edges of this BB and
their counts are left untouched, so I just use those counts to compute
the edge probabilities.
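The contract described here can be sketched roughly as follows. This is a hypothetical stand-in for `gen_probabilities_from_existing_counts`, not the patch's actual GCC code; the names, the `std::vector` representation of edges, and the use of 10000 as `REG_BR_PROB_BASE` are illustrative assumptions:

```cpp
#include <cassert>
#include <vector>

// Illustrative base for fixed-point branch probabilities (assumption:
// GCC's REG_BR_PROB_BASE plays this role).
const long REG_BR_PROB_BASE = 10000;

// Derive outgoing edge probabilities from the counts that survived
// expansion; report failure when every count is zero so the caller
// can fall back to guessing (guess_outgoing_edge_probabilities).
bool probs_from_counts (const std::vector<long> &counts,
                        std::vector<int> &probs)
{
  long sum = 0;
  for (long c : counts)
    sum += c;
  if (sum == 0)
    return false;               // all edges cold: guess instead
  for (long c : counts)
    probs.push_back ((int) (c * REG_BR_PROB_BASE / sum));
  return true;
}
```

Normalizing by the sum of the edge counts, rather than the block's own count, sidesteps the mismatches mentioned later in this thread where the outgoing counts exceed the BB count.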

> Also gen_probabilities_from_existing_counts could probably also work based
> on original edge frequencies.

Just to be sure I understand it right: you want me to use the
frequency instead of the count, since the frequency is derived from
the counts in profile-use builds?

>
> > +
> > +  default_edge->count = default_count;
> > +  if (count)
> > +{
> > +  edge e;
> > +  edge_iterator ei;
> > +  FOR_EACH_EDGE (e, ei, stmt_bb->succs)
> > +e->probability = e->count * REG_BR_PROB_BASE / count;
> > +}
>
> Hmm, this updates origina BB containing the switch statement?
The out_edges of the original BB.
> Of course,
> modulo roundoff errors, this should hold.  I wonder where did you got the
> diferences and why do you need this?

Originally, I obtained e->probability as e->count * REG_BR_PROB_BASE /
bb->count. During profile bootstrap, I noticed BBs whose sum of
outgoing edge counts exceeded the BB's count, and hence I normalized
by the sum of the edge counts.

> You are going to output new control flow
> and find_many_sub_basic_blocks will recompute all the counts/frequencies 
> inside
> anyway?
Sorry, I am lost here. I am just updating the probabilities of
stmt_bb->succs, right?


>
> > @@ -2430,7 +2523,11 @@ emit_case_nodes (rtx index, case_node_ptr node, rt
> >   unsignedp),
> > GT, NULL_RTX, mode, unsignedp,
> > label_rtx (node->right->code_label));
> > -  emit_case_nodes (index, node->left, default_label, index_type);
> > +  probability = case_probability (node->right->count,
> > +  subtree_count + 
> > default_count);
> > +  add_prob_note_to_last_insn (probability);
> > +  emit_case_nodes (index, node->left, default_label, default_count,
> > +   index_type);
>
> Hmm, here you seem to be distributing the probabilities of the counts reached.
> What happens in the case when the edge probability needs to be distributed 
> across
> nodes of the decision tree. I.e. in
> t(int a)
> {
>   switch (a)
>{
>  case 100:
>  case 200:
>  case 300:
>  case 400:
>  case 500:
>  case 600:
>  case 700:
>  case 800:
>  case 900:
>t();
>  case 101:
>  case 202:
>  case 303:
>  case 404:
>  case 505:
>  case 606:
>  case 707:
>  case 808:
>  case 909:
>q();
>}
> }
>
Ok, that's a bug in my patch. In the above example, there are two
outgoing edges from the BB containing the switch, but many case nodes.
I would end up multiplying the counts by the number of cases that lead
to a label. If I divide the count of each case_node by the number of
case nodes that jump to the same label, will that be reasonable? In
the above case, if t() is called 900 times, I will assume that each of
the 9 cases contributes a count of 100.
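The proposed fix can be sketched like this, in the spirit of the `compute_cases_per_edge` helper in the revised patch. The function name, the string labels, and the map-based edge representation are hypothetical simplifications, not GCC's data structures:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// First count how many case labels share each outgoing edge, then
// attribute the edge's profile count evenly among them, so nine
// cases reaching a label taken 900 times contribute 100 each
// instead of 900 each.
std::map<std::string, long>
per_case_counts (const std::vector<std::string> &case_to_label,
                 const std::map<std::string, long> &edge_counts)
{
  std::map<std::string, int> cases_per_edge;
  for (const std::string &lab : case_to_label)
    ++cases_per_edge[lab];

  std::map<std::string, long> result;
  for (const auto &e : edge_counts)
    result[e.first] = e.second / cases_per_edge[e.first];
  return result;
}
```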

Thanks,
Easwaran


Re: Propagate profile counts during switch expansion

2012-10-05 Thread Easwaran Raman
>
>>
>> > +
>> > +  default_edge->count = default_count;
>> > +  if (count)
>> > +{
>> > +  edge e;
>> > +  edge_iterator ei;
>> > +  FOR_EACH_EDGE (e, ei, stmt_bb->succs)
>> > +e->probability = e->count * REG_BR_PROB_BASE / count;
>> > +}
>>
>> Hmm, this updates origina BB containing the switch statement?
> The out_edges of the original BB.
>> Of course,
>> modulo roundoff errors, this should hold.  I wonder where did you got the
>> diferences and why do you need this?

The count of the outgoing edge of the BB that jumps to the default
label needs to be updated (to 0 or original_default_count/2, depending
on whether the default label is a jump-table target), so I need to
redistribute the probabilities of the remaining edges. That's why I do
it here.

> Originally, I obtained e->probability as e->count * REG_BR_PROB_BASE /
> bb->count. During profile bootstrap, I noticed BBs whose sum of
> outgoing edge counts exceeded the BB's count and hence I normalized
> with the sum of edge counts.
>
>> You are going to output new control flow
>> and find_many_sub_basic_blocks will recompute all the counts/frequencies 
>> inside
>> anyway?
> Sorry, I am lost here. I am just updating the probability of
> stmt_bb->succs right?
>
>


Move statements upwards after reassociation

2012-10-10 Thread Easwaran Raman
Hi,
 In the expression reassociation pass, statements might get moved
downwards to ensure that dependences are not violated after
reassociation. This can increase the live range and, in a tight loop,
result in spills.  This patch simply does a code motion of those
statements involved in reassociation and pushes them upwards in the BB
as far as possible without violating dependences. Bootstraps with no
test regressions on an x86_64 machine running Linux. Ok for trunk?


- Easwaran

---
2012-10-10  Easwaran Raman  

   * tree-ssa-reassoc.c (move_stmt_upwards): New function.
 (rewrite_expr_tree): Record statements to be moved.
 (reassociate_bb): Move statements affected by reassociation
 as early as possible.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 191879)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -2250,13 +2250,51 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea
 }
 }

+/* Move STMT up within its BB until it can not be moved any further.  */
+
+static void move_stmt_upwards (gimple stmt)
+{
+  gimple_stmt_iterator gsi, gsistmt;
+  tree rhs1, rhs2;
+  gimple rhs1def = NULL, rhs2def = NULL;
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+  gcc_assert (rhs1);
+  if (TREE_CODE (rhs1) == SSA_NAME)
+rhs1def = SSA_NAME_DEF_STMT (rhs1);
+  else if (TREE_CODE (rhs1) != REAL_CST
+   && TREE_CODE (rhs1) != INTEGER_CST)
+return;
+  if (rhs2)
+{
+  if (TREE_CODE (rhs2) == SSA_NAME)
+rhs2def = SSA_NAME_DEF_STMT (rhs2);
+  else if (TREE_CODE (rhs1) != REAL_CST
+   && TREE_CODE (rhs1) != INTEGER_CST)
+return;
+}
+  gsi = gsi_for_stmt (stmt);
+  gsistmt = gsi;
+  gsi_prev (&gsi);
+  for (; !gsi_end_p (gsi); gsi_prev (&gsi))
+{
+  gimple curr_stmt = gsi_stmt (gsi);
+  if (curr_stmt == rhs1def || curr_stmt == rhs2def) {
+gsi_move_after (&gsistmt, &gsi);
+return;
+  }
+}
+
+}
+
 /* Recursively rewrite our linearized statements so that the operators
match those in OPS[OPINDEX], putting the computation in rank
order.  */

 static void
 rewrite_expr_tree (gimple stmt, unsigned int opindex,
-   VEC(operand_entry_t, heap) * ops, bool moved)
+   VEC(operand_entry_t, heap) * ops, bool moved,
+   VEC(gimple, heap) **stmts_to_move)
 {
   tree rhs1 = gimple_assign_rhs1 (stmt);
   tree rhs2 = gimple_assign_rhs2 (stmt);
@@ -2299,6 +2337,7 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
   print_gimple_stmt (dump_file, stmt, 0, 0);
 }
  }
+  VEC_safe_push (gimple, heap, *stmts_to_move, stmt);
   return;
 }

@@ -2346,7 +2385,9 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
 }
   /* Recurse on the LHS of the binary operator, which is guaranteed to
  be the non-leaf side.  */
-  rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved);
+  rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved,
+ stmts_to_move);
+  VEC_safe_push (gimple, heap, *stmts_to_move, stmt);
 }

 /* Find out how many cycles we need to compute statements chain.
@@ -3427,6 +3468,9 @@ reassociate_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   basic_block son;
+  VEC(gimple, heap) *stmts_to_move = NULL;
+  gimple stmt;
+  int i;

   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
 {
@@ -3542,7 +3586,7 @@ reassociate_bb (basic_block bb)
   && VEC_length (operand_entry_t, ops) > 3)
 rewrite_expr_tree_parallel (stmt, width, ops);
   else
-rewrite_expr_tree (stmt, 0, ops, false);
+rewrite_expr_tree (stmt, 0, ops, false, &stmts_to_move);

   /* If we combined some repeated factors into a
  __builtin_powi call, multiply that result by the
@@ -3560,6 +3604,7 @@ reassociate_bb (basic_block bb)
target_ssa);
   gimple_set_location (mul_stmt, gimple_location (stmt));
   gsi_insert_after (&gsi, mul_stmt, GSI_NEW_STMT);
+  VEC_safe_push (gimple, heap, stmts_to_move, mul_stmt);
 }
  }

@@ -3567,6 +3612,11 @@ reassociate_bb (basic_block bb)
 }
  }
 }
+
+  FOR_EACH_VEC_ELT (gimple, stmts_to_move, i, stmt)
+move_stmt_upwards (stmt);
+  VEC_free (gimple, heap, stmts_to_move);
+
   for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
son;
son = next_dom_son (CDI_POST_DOMINATORS, son))


[google] Move delete with size to its own file (issue6655052)

2012-10-11 Thread Easwaran Raman
This patch moves the two-argument operator delete into its own file.
When a program provides its own definition of operator delete (void
*), but not operator delete (void *, size_t), we could end up linking
two files that define _ZdlPv, resulting in a duplicate-symbol linker
error. Bootstraps with no test regressions on an x86_64 machine
running Linux. Ok for google/4_7 and google/main branches?
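The scenario motivating the split can be illustrated with a small program that replaces only the single-argument form. This is a sketch under the assumption of a malloc-based default operator new (true for libstdc++); `g_deletes` is an illustrative counter, not part of any real API:

```cpp
#include <cstdlib>
#include <new>

// A program replacing only operator delete(void*).  If the library
// defined both the sized and unsized forms in one object file
// (del_op.o), pulling in the sized form at link time would also
// bring a second definition of _ZdlPv and clash with this one;
// keeping the sized form in its own del_opsz.o avoids that.
int g_deletes;

void operator delete (void *p) noexcept
{
  ++g_deletes;               // record that our replacement ran
  std::free (p);
}
```

Any explicit call to `::operator delete(void*)` now goes through the replacement, while the library's sized variant stays unreferenced unless the program actually needs it.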

Google ref b/6982747

2012-10-11  Easwaran Raman  

* libsupc++/Makefile.am: Add del_opsz.cc to sources.
* libsupc++/Makefile.in: Regenerated.
* libsupc++/del_opsz.cc: New file.
* libsupc++/del_op.cc(operator delete(void*,std::size_t)):
  Remove.

Index: libstdc++-v3/libsupc++/Makefile.in
===
--- libstdc++-v3/libsupc++/Makefile.in  (revision 192373)
+++ libstdc++-v3/libsupc++/Makefile.in  (working copy)
@@ -92,8 +92,8 @@ LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_
 libsupc___la_LIBADD =
 am__objects_1 = array_type_info.lo atexit_arm.lo bad_alloc.lo \
bad_cast.lo bad_typeid.lo class_type_info.lo del_op.lo \
-   del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo eh_alloc.lo \
-   eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \
+   del_opsz.lo del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo \
+   eh_alloc.lo eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \
eh_exception.lo eh_globals.lo eh_personality.lo eh_ptr.lo \
eh_term_handler.lo eh_terminate.lo eh_tm.lo eh_throw.lo \
eh_type.lo eh_unex_handler.lo enum_type_info.lo \
@@ -362,6 +362,7 @@ sources = \
bad_typeid.cc \
class_type_info.cc \
del_op.cc \
+   del_opsz.cc \
del_opnt.cc \
del_opv.cc \
del_opvnt.cc \
Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc(revision 192373)
+++ libstdc++-v3/libsupc++/del_op.cc(working copy)
@@ -47,11 +47,3 @@ operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT
   if (ptr)
 std::free(ptr);
 }
-
-_GLIBCXX_WEAK_DEFINITION void
-operator delete(void* ptr,
-std::size_t bytes __attribute__((__unused__))) throw ()
-{
-  if (ptr)
-std::free(ptr);
-}
Index: libstdc++-v3/libsupc++/del_opsz.cc
===
--- libstdc++-v3/libsupc++/del_opsz.cc  (revision 0)
+++ libstdc++-v3/libsupc++/del_opsz.cc  (revision 0)
@@ -0,0 +1,50 @@
+// Boilerplate support routines for -*- C++ -*- dynamic memory management.
+
+// Copyright (C) 2012
+// Free Software Foundation
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#include <bits/c++config.h>
+
+#if !_GLIBCXX_HOSTED
+// A freestanding C runtime may not provide "free" -- but there is no
+// other reasonable way to implement "operator delete".
+namespace std
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+  extern "C" void free(void*);
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace
+#else
+# include <cstdlib>
+#endif
+
+#include "new"
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/Makefile.am
===
--- libstdc++-v3/libsupc++/Makefile.am  (revision 192373)
+++ libstdc++-v3/libsupc++/Makefile.am  (working copy)
@@ -53,6 +53,7 @@ sources = \
bad_typeid.cc \
class_type_info.cc \
del_op.cc \
+   del_opsz.cc \
del_opnt.cc \
del_opv.cc \
del_opvnt.cc \

--
This patch is available for review at http://codereview.appspot.com/6655052


Re: Move statements upwards after reassociation

2012-10-11 Thread Easwaran Raman
Thanks for the comments. As David wrote, the intent of the patch is
not to do general-purpose scheduling, but to compensate for the
possible live-range lengthening introduced by reassociation.
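The hoisting step itself can be modeled in a few lines. This is a toy model of the patch's `move_stmt_upwards` intent using hypothetical types, not GCC's `gimple_stmt_iterator`-based implementation: within one block, the last statement is hoisted to just after the nearest preceding definition of one of its operands.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each toy statement defines one name and uses up to two.
struct Stmt { std::string def, use1, use2; };

// Hoist the final statement of the block upward as far as its
// dependences allow: scan backward for the closest statement
// defining one of its operands and reinsert just after it.
void move_last_upwards (std::vector<Stmt> &bb)
{
  if (bb.empty ())
    return;
  Stmt s = bb.back ();
  for (int i = (int) bb.size () - 2; i >= 0; --i)
    if (bb[i].def == s.use1 || bb[i].def == s.use2)
      {
        bb.pop_back ();
        bb.insert (bb.begin () + i + 1, s);
        return;
      }
}
```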


On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener
 wrote:
> On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman  wrote:
>>
>> +/* Move STMT up within its BB until it can not be moved any further.  */
>> +
>> +static void move_stmt_upwards (gimple stmt)
>> +{
>> +  gimple_stmt_iterator gsi, gsistmt;
>> +  tree rhs1, rhs2;
>> +  gimple rhs1def = NULL, rhs2def = NULL;
>> +  rhs1 = gimple_assign_rhs1 (stmt);
>> +  rhs2 = gimple_assign_rhs2 (stmt);
>> +  gcc_assert (rhs1);
>
> Please no such senseless asserts.  The following line will segfault anyway
> if rhs1 is NULL and with a debugger an assert doesn't add any useful
> information.
Ok.
>
>> +  if (TREE_CODE (rhs1) == SSA_NAME)
>> +rhs1def = SSA_NAME_DEF_STMT (rhs1);
>> +  else if (TREE_CODE (rhs1) != REAL_CST
>> +   && TREE_CODE (rhs1) != INTEGER_CST)
>> +return;
>> +  if (rhs2)
>
> You may not access gimple_assign_rhs2 if it is not there.  So you have
> to check whether you have an unary, binary or ternary (yes) operation.

gimple_assign_rhs2 returns NULL_TREE if the RHS of an assignment
has fewer than two operands.  Regarding the check for ternary
operation, I believe it is not necessary. A statement is considered
for reassociation only if the RHS class is GIMPLE_BINARY_RHS.
Subsequently, for rhs1 and rhs2, it checks if the def statements also
have the same code and so it seems to me that a statement with a
ternary operator in the RHS will never be considered in
rewrite_expr_tree.


>
>> +{
>> +  if (TREE_CODE (rhs2) == SSA_NAME)
>> +rhs2def = SSA_NAME_DEF_STMT (rhs2);
>> +  else if (TREE_CODE (rhs1) != REAL_CST
>> +   && TREE_CODE (rhs1) != INTEGER_CST)
>> +return;
>> +}
>> +  gsi = gsi_for_stmt (stmt);
>> +  gsistmt = gsi;
>> +  gsi_prev (&gsi);
>> +  for (; !gsi_end_p (gsi); gsi_prev (&gsi))
>
> 
>
> This doesn't make much sense.  You can move a stmt to the nearest
> common post-dominator.  Assuming you only want to handle placing
> it after rhs1def or after rhs2def(?) you don't need any loop, just
> two dominator queries and an insertion after one of the definition
> stmts.

Within a BB, isn't that still O(size of BB)?

> But this code should consider BBs.
For reassociation to look across BBs, the code should look something like this:

L1 :
   a_2 = a_1 + 10
   jc L3
L2:
  a_3 = a_2 + 20

- L1 should dominate L2 (otherwise there will be a phi node at L2 and
the reassociation of a_3 will not consider the definition of a_2)
- There are no other uses of a_2 other than the one in L2.

After reassociation, the stmt defining a_2 would be moved to L2.  In
that case, the downward code motion of a_2 = a_1 + 10 to L2 is
beneficial (one less instruction if the branch is taken). It is not
obvious to me that moving it to L1 (or wherever a_1 is defined) is
beneficial.

>  And I don't see why more optimal
> placement cannot be done during rewrite_expr_tree itself.

I started with that idea, but my current approach seems simpler.


Thanks,
Easwaran
>
> NB, the whole reassoc code needs a re-write to avoid the excessive
> stmt modifications when it does nothing.  So I'd very much rather avoid
> adding anything to reassoc until that rewrite happened.
>
> Richard.
>
>> +{
>> +  gimple curr_stmt = gsi_stmt (gsi);
>> +  if (curr_stmt == rhs1def || curr_stmt == rhs2def) {
>> +gsi_move_after (&gsistmt, &gsi);
>> +return;
>> +  }
>> +}
>> +
>> +}
>> +
>>  /* Recursively rewrite our linearized statements so that the operators
>> match those in OPS[OPINDEX], putting the computation in rank
>> order.  */
>>
>>  static void
>>  rewrite_expr_tree (gimple stmt, unsigned int opindex,
>> -   VEC(operand_entry_t, heap) * ops, bool moved)
>> +   VEC(operand_entry_t, heap) * ops, bool moved,
>> +   VEC(gimple, heap) **stmts_to_move)
>>  {
>>tree rhs1 = gimple_assign_rhs1 (stmt);
>>tree rhs2 = gimple_assign_rhs2 (stmt);
>> @@ -2299,6 +2337,7 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
>>print_gimple_stmt (dump_file, stmt, 0, 0);
>>  }
>>   }
>> +  VEC_safe_push (gimple, heap, *stmts_to_move, stmt);
>>return;
>>  }
>>
>> @@ -2346,7 +2385,9 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
>>  }
>>/* Recurse on t

Re: Move statements upwards after reassociation

2012-10-12 Thread Easwaran Raman
On Fri, Oct 12, 2012 at 1:45 AM, Richard Biener
 wrote:
> On Fri, Oct 12, 2012 at 3:09 AM, Easwaran Raman  wrote:
>> Thanks for the comments. As David wrote, the intent of the patch is
>> not to do a general purpose scheduling, but to compensate for the
>> possible live range lengthening introduced by reassociation.
>>
>>
>> On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener
>>  wrote:
>>> On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman  wrote:
>>>>
>>>> +/* Move STMT up within its BB until it can not be moved any further.  */
>>>> +
>>>> +static void move_stmt_upwards (gimple stmt)
>>>> +{
>>>> +  gimple_stmt_iterator gsi, gsistmt;
>>>> +  tree rhs1, rhs2;
>>>> +  gimple rhs1def = NULL, rhs2def = NULL;
>>>> +  rhs1 = gimple_assign_rhs1 (stmt);
>>>> +  rhs2 = gimple_assign_rhs2 (stmt);
>>>> +  gcc_assert (rhs1);
>>>
>>> Please no such senseless asserts.  The following line will segfault anyway
>>> if rhs1 is NULL and with a debugger an assert doesn't add any useful
>>> information.
>> Ok.
>>>
>>>> +  if (TREE_CODE (rhs1) == SSA_NAME)
>>>> +rhs1def = SSA_NAME_DEF_STMT (rhs1);
>>>> +  else if (TREE_CODE (rhs1) != REAL_CST
>>>> +   && TREE_CODE (rhs1) != INTEGER_CST)
>>>> +return;
>>>> +  if (rhs2)
>>>
>>> You may not access gimple_assign_rhs2 if it is not there.  So you have
>>> to check whether you have an unary, binary or ternary (yes) operation.
>>
>> gimple_assign_rhs2 returns NULL_TREE if it the RHS of an assignment
>> has less than two operands.  Regarding the check for ternary
>> operation, I believe it is not necessary. A statement is considered
>> for reassociation only if the RHS class is GIMPLE_BINARY_RHS.
>> Subsequently, for rhs1 and rhs2, it checks if the def statements also
>> have the same code and so it seems to me that a statement with a
>> ternary operator in the RHS will never be considered in
>> rewrite_expr_tree.
>>
>>
>>>
>>>> +{
>>>> +  if (TREE_CODE (rhs2) == SSA_NAME)
>>>> +rhs2def = SSA_NAME_DEF_STMT (rhs2);
>>>> +  else if (TREE_CODE (rhs1) != REAL_CST
>>>> +   && TREE_CODE (rhs1) != INTEGER_CST)
>>>> +return;
>>>> +}
>>>> +  gsi = gsi_for_stmt (stmt);
>>>> +  gsistmt = gsi;
>>>> +  gsi_prev (&gsi);
>>>> +  for (; !gsi_end_p (gsi); gsi_prev (&gsi))
>>>
>>> 
>>>
>>> This doesn't make much sense.  You can move a stmt to the nearest
>>> common post-dominator.  Assuming you only want to handle placing
>>> it after rhs1def or after rhs2def(?) you don't need any loop, just
>>> two dominator queries and an insertion after one of the definition
>>> stmts.
>>
>> Within a BB isn't that still O(size of BB)?
>
> Please document the fact that both stmts are in the same BB.
Ok.
> And no, it isn't, it is O (size of BB ^ 2).  You don't need a loop.
> operand rank should reflect the dominance relation inside the BB.

The rank of phi definitions would mess this up.

> If that doesn't work simply assign UIDs to the stmts first.
Ok.
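The UID suggestion boils down to something like the following sketch (hypothetical structures, not GCC's `gimple_set_uid` machinery): one pass numbers the statements of the block, after which the insertion point for a hoisted statement is simply whichever operand definition has the larger UID, replacing the backward scan with two O(1) lookups.

```cpp
#include <cassert>
#include <vector>

// Toy statement carrying only its position number within the block.
struct S { int uid; };

// Number the statements of a block in one linear pass.
void assign_uids (std::vector<S> &bb)
{
  for (unsigned i = 0; i < bb.size (); ++i)
    bb[i].uid = (int) i;
}

// Pick the later of two (possibly absent) operand definitions; the
// hoisted statement would be inserted right after the one returned.
const S *later_def (const S *def1, const S *def2)
{
  if (!def1) return def2;
  if (!def2) return def1;
  return def1->uid > def2->uid ? def1 : def2;
}
```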

>
>>> But this code should consider BBs.
>> For reassociation to look across BBs, the code should look something like 
>> this:
>>
>> L1 :
>>a_2 = a_1 + 10
>>jc L3
>> L2:
>>   a_3 = a_2 + 20
>>
>> - L1 should dominate L2 (otherwise there will be a phi node at L2 and
>> the reassociation of a_3 will not consider the definition of a_2)
>> - There are no other uses of a_2 other than the one in L3.
>>
>> After reassociation, the stmt defining a_2 would be moved to L2.  In
>> that case, the downward code motion of a_2 = a_1 + 10 to L2 is
>> beneficial (one less instruction if the branch is taken). It is not
>> obvious to me that moving  it to L1 (or whereever a_1 is defined) is
>> beneficial.
>
> In this case it doesn't matter whether a1 lives through a3 or if a2 does.
> But moving the stmt is not necessary, so why not simply avoid it.
I used that example to show that the current downward motion has a
useful side effect and this patch preserves it. Yes, in this example
the downward motion can be avoided but in general it may not be
possible. I agree with you that there is unnecessary code motion in
many cases.

> You cannot undo it with yout patch anyway.
>
>>>  And I do

Re: Propagate profile counts during switch expansion

2012-10-13 Thread Easwaran Raman
Ping.


On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman  wrote:
> I have attached a revised patch. The updated ChangeLog is given below
> and I have responded to your comments inline:
>
> 2012-10-08   Easwaran Raman  
> * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to
> specify the probability of taking the jump.
> (emit_cmp_and_jump_insns): Likewise.
> (expand_compare_and_swap_loop): Make the jump predicted not taken.
> * dojump.c (do_compare_rtx_and_jump): Remove the code attaching
> REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns.
> * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing
> probabilities for branches with more than two successors.
> * expr.c (emit_block_move_via_loop): Predict the loop backedge
> to be highly taken.
> (try_casesi): Pass the probability of jumping to the default label.
> (try_tablejump): Likewise.
> (do_tablejump): Likewise.
> * expr.h (try_tablejump): Add a new parameter.
> (try_casesi): Likewise.
> (emit_cmp_and_jump_insns): Add probability as default parameter with a
> default value of -1.
> * except.c (sjlj_emit_function_enter): Pass probability to
> emit_cmp_and_jump_insns.
> * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB.
> (do_jump_if_equal): Pass probability for REG_BR_PROB note.
> (add_case_node): Pass estimated probability of jumping to the case
> label.
> (emit_case_decision_tree): Pass default_prob to emit_case_nodes.
> (get_outgoing_edge_probs): New function.
> (conditional_probability): Likewise.
> (reset_out_edges_aux): Likewise.
> (compute_cases_per_edge): Likewise.
> (emit_case_dispatch_table): Update probabilities of edges coming out
> of the switch statement.
> (expand_case): Compute and propagate default edge probability to
> emit_case_dispatch_table.
> (expand_sjlj_dispatch_table): Update calls to add_case_node and
> emit_case_dispatch_table.
> (balance_case_nodes): Update subtree_prob values.
> (emit_case_nodes): Compute edge probabilities and pass them to
> emit_cmp_and_jump_insns.
>
> gcc/testsuite/ChangeLog:
> 2012-10-02   Easwaran Raman  
> * gcc.dg/tree-prof/switch-case-1.c: New test case.
> * gcc.dg/tree-prof/switch-case-2.c: New test case.
>
>
>
>
> On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka  wrote:
>>> Hi,
>>>  This patch propagates the profile counts during RTL expansion. In
>>> many cases, there is no way to determine the exact count of an edge
>>> generated during the expansion. So this patch uses some simple
>>> heuristics to estimate the edge counts but ensures that the counts of
>>> the basic blocks corresponding to the cases are (nearly the) same as
>>> at the gimple level.
>>>
>>> Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for 
>>> trunk?
>>> Index: gcc/expr.c
>>> ===
>>> --- gcc/expr.c (revision 191879)
>>> +++ gcc/expr.c (working copy)
>>> @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin
>>>  #ifdef PUSH_ROUNDING
>>>  static void emit_single_push_insn (enum machine_mode, rtx, tree);
>>>  #endif
>>> -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
>>> +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
>>>  static rtx const_vector_from_tree (tree);
>>>  static void write_complex_part (rtx, rtx, bool);
>>>
>>> @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree
>>>
>>>  static void
>>>  do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx 
>>> table_label,
>>> -  rtx default_label)
>>> +  rtx default_label, int default_probability)
>>
>> Please document default_probability.
> Done.
>
>>>  {
>>>rtx temp, vector;
>>>
>>> @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r
>>>   the maximum value of the range.  */
>>>
>>>if (default_label)
>>> -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
>>> - default_label);
>>> +{
>>> +  emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
>>> +   default_label);
>>> +  if (default_probability != -1)
>>> +{
>>> +  rtx jump_insn = get_last_insn();
>>> +  add_reg_note (jump_insn, REG_BR_PROB, GEN_INT 
>>> (default_probability));
>>> +}
>>> +}
>>
>> dojump already does this kind of logic, but it is

Re: Propagate profile counts during switch expansion

2012-10-15 Thread Easwaran Raman
On Sun, Oct 14, 2012 at 8:09 AM, Jan Hubicka  wrote:
> Hi,
>
> Index: optabs.c
> ===
> --- optabs.c(revision 191879)
> +++ optabs.c(working copy)
> @@ -4249,7 +4249,7 @@ prepare_operand (enum insn_code icode, rtx x, int
> we can do the branch.  */
>
>  static void
> -emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label)
> +emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label, int 
> prob)
>  {
>enum machine_mode optab_mode;
>enum mode_class mclass;
> @@ -4261,7 +4261,16 @@ static void
>
>gcc_assert (icode != CODE_FOR_nothing);
>gcc_assert (insn_operand_matches (icode, 0, test));
> -  emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), 
> label));
> +  rtx insn = emit_insn (
> +  GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label));
>
> I think we did not change to style of mixing declaration and code yet.  So
> please put declaration ahead.
Ok.

>
> I think you want to keep emit_jump_insn.  Also do nothing when profile_status
> == PROFILE_ABSENT.

Why should this depend on profile_status? The PROB passed in could
also come from static prediction, right?

> Index: cfgbuild.c
> ===
> --- cfgbuild.c  (revision 191879)
> +++ cfgbuild.c  (working copy)
> @@ -559,8 +559,11 @@ compute_outgoing_frequencies (basic_block b)
>   f->count = b->count - e->count;
>   return;
> }
> +  else
> +{
> +  guess_outgoing_edge_probabilities (b);
> +}
>
> Add comment here that we rely on multiway BBs having sane probabilities 
> already.
> You still want to do guessing when the edges out are EH. Those also can be 
> many.
I think this should work:

-  if (single_succ_p (b))
+  else if (single_succ_p (b))
 {
   e = single_succ_edge (b);
   e->probability = REG_BR_PROB_BASE;
   e->count = b->count;
   return;
 }
-  guess_outgoing_edge_probabilities (b);
+  else
+{
+  /* We rely on BBs with more than two successors to have sane
probabilities
+ and do not guess them here. For BBs terminated by switch statements
+ expanded to jump-table jump, we have done the right thing during
+ expansion. For EH edges, we still guess the probabilities here.  */
+  bool complex_edge = false;
+  FOR_EACH_EDGE (e, ei, b->succs)
+if (e->flags & EDGE_COMPLEX)
+  {
+complex_edge = true;
+break;
+  }
+  if (complex_edge)
+guess_outgoing_edge_probabilities (b);
+}
+


> Index: expr.h
> ===
> --- expr.h  (revision 191879)
> +++ expr.h  (working copy)
> @@ -190,7 +190,7 @@ extern int have_sub2_insn (rtx, rtx);
>  /* Emit a pair of rtl insns to compare two rtx's and to jump
> to a label if the comparison is true.  */
>  extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx,
> -enum machine_mode, int, rtx);
> +enum machine_mode, int, rtx, int 
> prob=-1);
>
> Hmm, probably first appreance of this C++ construct. I suppose it is OK.
http://gcc.gnu.org/codingconventions.html#Default says it is ok for POD values.

>
> +static inline void
> +reset_out_edges_aux (basic_block bb)
> +{
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE(e, ei, bb->succs)
> +e->aux = (void *)0;
> +}
> +static inline void
> +compute_cases_per_edge (gimple stmt)
> +{
> +  basic_block bb = gimple_bb (stmt);
> +  reset_out_edges_aux (bb);
> +  int ncases = gimple_switch_num_labels (stmt);
> +  for (int i = ncases - 1; i >= 1; --i)
> +{
> +  tree elt = gimple_switch_label (stmt, i);
> +  tree lab = CASE_LABEL (elt);
> +  basic_block case_bb = label_to_block_fn (cfun, lab);
> +  edge case_edge = find_edge (bb, case_bb);
> +  case_edge->aux = (void *)((long)(case_edge->aux) + 1);
> +}
> +}
>
> Comments and newlines per coding standard.
Ok.

> With the these changes, the patch is OK

Thanks,
Easwaran
>
> Thanks,
> Honza


Fix bugs introduced by switch-case profile propagation

2012-10-17 Thread Easwaran Raman
Hi,
 This patch fixes bugs introduced by my previous patch to propagate
profiles during switch expansion. Bootstrap and profiledbootstrap
successful on x86_64. Confirmed that it fixes the crashes reported in
PR middle-end/54957. OK for trunk?

- Easwaran

2012-10-17   Easwaran Raman  

PR target/54938
PR middle-end/54957
* optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note
only if it doesn't already exist.
* except.c (sjlj_emit_function_enter): Remove unused variable.
* stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.
(emit_case_dispatch_table): Handle the case where STMT_BB is
NULL.
(expand_sjlj_dispatch_table): Pass BB containing before_case
to emit_case_dispatch_table.

Index: gcc/optabs.c
===
--- gcc/optabs.c (revision 192488)
+++ gcc/optabs.c (working copy)
@@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m
   && profile_status != PROFILE_ABSENT
   && insn
   && JUMP_P (insn)
-  && any_condjump_p (insn))
-{
-  gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0));
-  add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
-}
+  && any_condjump_p (insn)
+  && !find_reg_note (insn, REG_BR_PROB, 0))
+add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
 }

 /* Generate code to compare X with Y so that the condition codes are
Index: gcc/except.c
===
--- gcc/except.c (revision 192488)
+++ gcc/except.c (working copy)
@@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label)
   if (dispatch_label)
 {
 #ifdef DONT_USE_BUILTIN_SETJMP
-  rtx x, last;
+  rtx x;
   x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE,
TYPE_MODE (integer_type_node), 1,
plus_constant (Pmode, XEXP (fc, 0),
Index: gcc/stmt.c
===
--- gcc/stmt.c (revision 192488)
+++ gcc/stmt.c (working copy)
@@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb)
   edge e;
   edge_iterator ei;
   int prob_sum = 0;
+  if (!bb)
+return 0;
   FOR_EACH_EDGE(e, ei, bb->succs)
 prob_sum += e->probability;
   return prob_sum;
@@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in
   rtx fallback_label = label_rtx (case_list->code_label);
   rtx table_label = gen_label_rtx ();
   bool has_gaps = false;
-  edge default_edge = EDGE_SUCC(stmt_bb, 0);
-  int default_prob = default_edge->probability;
+  edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL;
+  int default_prob = default_edge ? default_edge->probability : 0;
   int base = get_outgoing_edge_probs (stmt_bb);
   bool try_with_tablejump = false;

@@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in
   default_prob = 0;
 }

-  default_edge->probability = default_prob;
+  if (default_edge)
+default_edge->probability = default_prob;

   /* We have altered the probability of the default edge. So the probabilities
  of all other edges need to be adjusted so that it sums up to
@@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index,

   emit_case_dispatch_table (index_expr, index_type,
  case_list, default_label,
- minval, maxval, range, NULL);
+ minval, maxval, range,
+BLOCK_FOR_INSN (before_case));
   emit_label (default_label);
   free_alloc_pool (case_node_pool);
 }


Minimize downward code motion during reassociation

2012-10-18 Thread Easwaran Raman
Hi,

During expression reassociation, statements are conservatively moved
downwards to ensure that dependences are correctly satisfied after
reassociation. This could lead to lengthening of live ranges. This
patch moves statements only to the extent necessary. Bootstraps and no
test regression on x86_64/linux. OK for trunk?

Thanks,
Easwaran

2012-10-18   Easwaran Raman  
* tree-ssa-reassoc.c (assign_uids): New function.
(assign_uids_in_relevant_bbs): Likewise.
(ensure_ops_are_available): Likewise.
(rewrite_expr_tree): Do not move statements beyond what is
necessary. Remove call to swap_ops_for_binary_stmt...
(reassociate_bb): ... and move it here.



Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 192487)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea
 }
 }

+/* Assign UIDs to statements in basic block BB.  */
+
+static void
+assign_uids (basic_block bb)
+{
+  unsigned uid = 0;
+  gimple_stmt_iterator gsi;
+  /* First assign uids to phis.  */
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple stmt = gsi_stmt (gsi);
+  gimple_set_uid (stmt, uid++);
+}
+
+  /* Then assign uids to stmts.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple stmt = gsi_stmt (gsi);
+  gimple_set_uid (stmt, uid++);
+}
+}
+
+/* For each operand in OPS, find the basic block that contains the statement
+   which defines the operand. For all such basic blocks, assign UIDs.  */
+
+static void
+assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)
+{
+  operand_entry_t oe;
+  int i;
+  struct pointer_set_t *seen_bbs = pointer_set_create ();
+
+  for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++)
+{
+  gimple def_stmt;
+  basic_block bb;
+  if (TREE_CODE (oe->op) != SSA_NAME)
+continue;
+  def_stmt = SSA_NAME_DEF_STMT (oe->op);
+  bb = gimple_bb (def_stmt);
+  if (!pointer_set_contains (seen_bbs, bb))
+{
+  assign_uids (bb);
+  pointer_set_insert (seen_bbs, bb);
+}
+}
+  pointer_set_destroy (seen_bbs);
+}
+
+/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live
+   at STMT.  This is accomplished by moving STMT if needed.  */
+
+static void
+ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) *ops, int opindex)
+{
+  int i;
+  int len = VEC_length (operand_entry_t, ops);
+  gimple insert_stmt = stmt;
+  basic_block insert_bb = gimple_bb (stmt);
+  gimple_stmt_iterator gsi_insert, gsistmt;
+  for (i = opindex; i < len; i++)
+{
+  operand_entry_t oe = VEC_index (operand_entry_t, ops, i);
+  gimple def_stmt;
+  basic_block def_bb;
+  /* Ignore constants and operands with default definitions.  */
+  if (TREE_CODE (oe->op) != SSA_NAME
+  || SSA_NAME_IS_DEFAULT_DEF (oe->op))
+continue;
+  def_stmt = SSA_NAME_DEF_STMT (oe->op);
+  def_bb = gimple_bb (def_stmt);
+  if (def_bb != insert_bb
+  && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb))
+{
+  insert_bb = def_bb;
+  insert_stmt = def_stmt;
+}
+  else if (def_bb == insert_bb
+   && gimple_uid (insert_stmt) < gimple_uid (def_stmt))
+insert_stmt = def_stmt;
+}
+  if (insert_stmt == stmt)
+return;
+  gsistmt = gsi_for_stmt (stmt);
+  /* If GSI_STMT is a phi node, then do not insert just after that statement.
+ Instead, find the first non-label gimple statement in BB and insert before
+ that.  */
+  if (gimple_code (insert_stmt) == GIMPLE_PHI)
+{
+  gsi_insert = gsi_after_labels (insert_bb);
+  gsi_move_before (&gsistmt, &gsi_insert);
+}
+  /* Statements marked for throw can not be in the middle of a basic block. So
+ we can not insert a statement (not marked for throw) immediately after.  */
+  else if (lookup_stmt_eh_lp (insert_stmt) > 0
+   && stmt_can_throw_internal (insert_stmt))
+{
+  edge e, succ_edge = NULL;
+  edge_iterator ei;
+
+  /* There should be exactly one normal edge out of the basic block.  */
+  FOR_EACH_EDGE (e, ei, insert_bb->succs)
+{
+  if (!(e->flags & EDGE_COMPLEX))
+{
+  gcc_assert (succ_edge == NULL);
+  succ_edge = e;
+}
+}
+  /* Insert STMT at the beginning of the successor basic block.  */
+  insert_bb = succ_edge->dest;
+  gsi_insert = gsi_after_labels (insert_bb);
+  gsi_move_before (&gsistmt, &gsi_insert);
+}
+  else
+{
+  gsi_insert = gsi_for_stmt (insert_stmt);
+  gsi_move_after (&gsistmt, &gsi_insert);
+}
+
+}
+
 /* Recursively rewrite our linearized statements so that the operators
ma

Re: Minimize downward code motion during reassociation

2012-10-19 Thread Easwaran Raman
On Fri, Oct 19, 2012 at 5:13 PM, Xinliang David Li  wrote:
> On Thu, Oct 18, 2012 at 3:36 PM, Easwaran Raman  wrote:
>> Hi,
>>
>> During expression reassociation, statements are conservatively moved
>> downwards to ensure that dependences are correctly satisfied after
>> reassociation. This could lead to lengthening of live ranges. This
>> patch moves statements only to the extent necessary. Bootstraps and no
>> test regression on x86_64/linux. OK for trunk?
>>
>> Thanks,
>> Easwaran
>>
>> 2012-10-18   Easwaran Raman  
>> * tree-ssa-reassoc.c (assign_uids): New function.
>> (assign_uids_in_relevant_bbs): Likewise.
>> (ensure_ops_are_available): Likewise.
>> (rewrite_expr_tree): Do not move statements beyond what is
>> necessary. Remove call to swap_ops_for_binary_stmt...
>> (reassociate_bb): ... and move it here.
>>
>>
>>
>> Index: gcc/tree-ssa-reassoc.c
>> ===
>> --- gcc/tree-ssa-reassoc.c (revision 192487)
>> +++ gcc/tree-ssa-reassoc.c (working copy)
>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea
>>  }
>>  }
>>
>> +/* Assign UIDs to statements in basic block BB.  */
>> +
>> +static void
>> +assign_uids (basic_block bb)
>> +{
>> +  unsigned uid = 0;
>> +  gimple_stmt_iterator gsi;
>> +  /* First assign uids to phis.  */
>> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +{
>> +  gimple stmt = gsi_stmt (gsi);
>> +  gimple_set_uid (stmt, uid++);
>> +}
>> +
>> +  /* Then assign uids to stmts.  */
>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +{
>> +  gimple stmt = gsi_stmt (gsi);
>> +  gimple_set_uid (stmt, uid++);
>> +}
>> +}
>> +
>> +/* For each operand in OPS, find the basic block that contains the statement
>> +   which defines the operand. For all such basic blocks, assign UIDs.  */
>> +
>> +static void
>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)
>> +{
>> +  operand_entry_t oe;
>> +  int i;
>> +  struct pointer_set_t *seen_bbs = pointer_set_create ();
>> +
>> +  for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++)
>> +{
>> +  gimple def_stmt;
>> +  basic_block bb;
>> +  if (TREE_CODE (oe->op) != SSA_NAME)
>> +continue;
>> +  def_stmt = SSA_NAME_DEF_STMT (oe->op);
>> +  bb = gimple_bb (def_stmt);
>> +  if (!pointer_set_contains (seen_bbs, bb))
>> +{
>> +  assign_uids (bb);
>> +  pointer_set_insert (seen_bbs, bb);
>> +}
>> +}
>> +  pointer_set_destroy (seen_bbs);
>> +}
>> +
>> +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live
>> +   at STMT.  This is accomplished by moving STMT if needed.  */
>> +
>> +static void
>> +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) *ops, int opindex)
>
> format issue?
>
>> +{
>> +  int i;
>> +  int len = VEC_length (operand_entry_t, ops);
>> +  gimple insert_stmt = stmt;
>> +  basic_block insert_bb = gimple_bb (stmt);
>> +  gimple_stmt_iterator gsi_insert, gsistmt;
>> +  for (i = opindex; i < len; i++)
>> +{
>> +  operand_entry_t oe = VEC_index (operand_entry_t, ops, i);
>> +  gimple def_stmt;
>> +  basic_block def_bb;
>> +  /* Ignore constants and operands with default definitions.  */
>> +  if (TREE_CODE (oe->op) != SSA_NAME
>> +  || SSA_NAME_IS_DEFAULT_DEF (oe->op))
>> +continue;
>> +  def_stmt = SSA_NAME_DEF_STMT (oe->op);
>> +  def_bb = gimple_bb (def_stmt);
>> +  if (def_bb != insert_bb
>> +  && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb))
>> +{
>> +  insert_bb = def_bb;
>> +  insert_stmt = def_stmt;
>> +}
>> +  else if (def_bb == insert_bb
>> +   && gimple_uid (insert_stmt) < gimple_uid (def_stmt))
>> +insert_stmt = def_stmt;
>> +}
>> +  if (insert_stmt == stmt)
>> +return;
>> +  gsistmt = gsi_for_stmt (stmt);
>> +  /* If GSI_STMT is a phi node, then do not insert just after that 
>> statement.
>
> GSI_STMT --> INSERT_STMT?

Ok.
>> + Instead, find the first non-label gimple statement in BB and insert 
>> before
>> + that. 

Re: Minimize downward code motion during reassociation

2012-10-22 Thread Easwaran Raman
On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener
 wrote:
> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman  wrote:
>> Hi,
>>
>> During expression reassociation, statements are conservatively moved
>> downwards to ensure that dependences are correctly satisfied after
>> reassociation. This could lead to lengthening of live ranges. This
>> patch moves statements only to the extent necessary. Bootstraps and no
>> test regression on x86_64/linux. OK for trunk?
>>
>> Thanks,
>> Easwaran
>>
>> 2012-10-18   Easwaran Raman  
>> * tree-ssa-reassoc.c (assign_uids): New function.
>> (assign_uids_in_relevant_bbs): Likewise.
>> (ensure_ops_are_available): Likewise.
>> (rewrite_expr_tree): Do not move statements beyond what is
>> necessary. Remove call to swap_ops_for_binary_stmt...
>> (reassociate_bb): ... and move it here.
>>
>>
>>
>> Index: gcc/tree-ssa-reassoc.c
>> ===
>> --- gcc/tree-ssa-reassoc.c (revision 192487)
>> +++ gcc/tree-ssa-reassoc.c (working copy)
>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea
>>  }
>>  }
>>
>> +/* Assign UIDs to statements in basic block BB.  */
>> +
>> +static void
>> +assign_uids (basic_block bb)
>> +{
>> +  unsigned uid = 0;
>> +  gimple_stmt_iterator gsi;
>> +  /* First assign uids to phis.  */
>> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +{
>> +  gimple stmt = gsi_stmt (gsi);
>> +  gimple_set_uid (stmt, uid++);
>> +}
>> +
>> +  /* Then assign uids to stmts.  */
>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +{
>> +  gimple stmt = gsi_stmt (gsi);
>> +  gimple_set_uid (stmt, uid++);
>> +}
>> +}
>> +
>> +/* For each operand in OPS, find the basic block that contains the statement
>> +   which defines the operand. For all such basic blocks, assign UIDs.  */
>> +
>> +static void
>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)
>> +{
>> +  operand_entry_t oe;
>> +  int i;
>> +  struct pointer_set_t *seen_bbs = pointer_set_create ();
>> +
>> +  for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++)
>> +{
>> +  gimple def_stmt;
>> +  basic_block bb;
>> +  if (TREE_CODE (oe->op) != SSA_NAME)
>> +continue;
>> +  def_stmt = SSA_NAME_DEF_STMT (oe->op);
>> +  bb = gimple_bb (def_stmt);
>> +  if (!pointer_set_contains (seen_bbs, bb))
>> +{
>> +  assign_uids (bb);
>> +  pointer_set_insert (seen_bbs, bb);
>> +}
>> +}
>> +  pointer_set_destroy (seen_bbs);
>> +}
>
> Please assign UIDs once using the existing renumber_gimple_stmt_uids ().
> You seem to call the above multiple times and thus do work bigger than
> O(number of basic blocks).

The reason I call the above multiple times is that gsi_move_before
might get called between two calls to the above. For instance, after
rewrite_expr_tree is called once, the following sequence of calls
could happen:
reassociate_bb -> linearize_expr_tree -> linearize_expr ->
gsi_move_before.  So it is not sufficient to call
renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to
use renumber_gimple_stmt_uids_in_blocks instead of
assign_uids_in_relevant_bbs?



>> +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live
>> +   at STMT.  This is accomplished by moving STMT if needed.  */
>> +
>> +static void
>> +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) *ops, int opindex)
>> +{
>> +  int i;
>> +  int len = VEC_length (operand_entry_t, ops);
>> +  gimple insert_stmt = stmt;
>> +  basic_block insert_bb = gimple_bb (stmt);
>> +  gimple_stmt_iterator gsi_insert, gsistmt;
>> +  for (i = opindex; i < len; i++)
>> +{
>
> Likewise you call this for each call to rewrite_expr_tree, so it seems to me
> this is quadratic in the number of ops in the op vector.

The call to ensure_ops_are_available inside rewrite_expr_tree is
guarded by if (!moved) and I am setting moved = true there to ensure
that ensure_ops_are_available inside is called once per reassociation
of a expression tree.

>
> Why make this all so complicated?  It seems to me that we should
> fixup stmt order only after the whole ops vector has been materialized.


>
>> +  operand_entry_t oe = VEC_index (operand_entry_t, ops, i);
>> +  gim

Re: Fix bugs introduced by switch-case profile propagation

2012-10-22 Thread Easwaran Raman
Ping.


On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman  wrote:
> Hi,
>  This patch fixes bugs introduced by my previous patch to propagate
> profiles during switch expansion. Bootstrap and profiledbootstrap
> successful on x86_64. Confirmed that it fixes the crashes reported in
> PR middle-end/54957. OK for trunk?
>
> - Easwaran
>
> 2012-10-17   Easwaran Raman  
>
> PR target/54938
> PR middle-end/54957
> * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note
> only if it doesn't already exist.
> * except.c (sjlj_emit_function_enter): Remove unused variable.
> * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.
> (emit_case_dispatch_table): Handle the case where STMT_BB is
> NULL.
> (expand_sjlj_dispatch_table): Pass BB containing before_case
> to emit_case_dispatch_table.
>
> Index: gcc/optabs.c
> ===
> --- gcc/optabs.c (revision 192488)
> +++ gcc/optabs.c (working copy)
> @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m
>&& profile_status != PROFILE_ABSENT
>&& insn
>&& JUMP_P (insn)
> -  && any_condjump_p (insn))
> -{
> -  gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0));
> -  add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
> -}
> +  && any_condjump_p (insn)
> +  && !find_reg_note (insn, REG_BR_PROB, 0))
> +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
>  }
>
>  /* Generate code to compare X with Y so that the condition codes are
> Index: gcc/except.c
> ===
> --- gcc/except.c (revision 192488)
> +++ gcc/except.c (working copy)
> @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label)
>if (dispatch_label)
>  {
>  #ifdef DONT_USE_BUILTIN_SETJMP
> -  rtx x, last;
> +  rtx x;
>x = emit_library_call_value (setjmp_libfunc, NULL_RTX, 
> LCT_RETURNS_TWICE,
> TYPE_MODE (integer_type_node), 1,
> plus_constant (Pmode, XEXP (fc, 0),
> Index: gcc/stmt.c
> ===
> --- gcc/stmt.c (revision 192488)
> +++ gcc/stmt.c (working copy)
> @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb)
>edge e;
>edge_iterator ei;
>int prob_sum = 0;
> +  if (!bb)
> +return 0;
>FOR_EACH_EDGE(e, ei, bb->succs)
>  prob_sum += e->probability;
>return prob_sum;
> @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in
>rtx fallback_label = label_rtx (case_list->code_label);
>rtx table_label = gen_label_rtx ();
>bool has_gaps = false;
> -  edge default_edge = EDGE_SUCC(stmt_bb, 0);
> -  int default_prob = default_edge->probability;
> +  edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL;
> +  int default_prob = default_edge ? default_edge->probability : 0;
>int base = get_outgoing_edge_probs (stmt_bb);
>bool try_with_tablejump = false;
>
> @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in
>default_prob = 0;
>  }
>
> -  default_edge->probability = default_prob;
> +  if (default_edge)
> +default_edge->probability = default_prob;
>
>/* We have altered the probability of the default edge. So the 
> probabilities
>   of all other edges need to be adjusted so that it sums up to
> @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index,
>
>emit_case_dispatch_table (index_expr, index_type,
>   case_list, default_label,
> - minval, maxval, range, NULL);
> + minval, maxval, range,
> +BLOCK_FOR_INSN (before_case));
>emit_label (default_label);
>free_alloc_pool (case_node_pool);
>  }


Re: Fix bugs introduced by switch-case profile propagation

2012-10-23 Thread Easwaran Raman
On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka  wrote:
>> Ping.
>>
>>
>> On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman  wrote:
>> > Hi,
>> >  This patch fixes bugs introduced by my previous patch to propagate
>> > profiles during switch expansion. Bootstrap and profiledbootstrap
>> > successful on x86_64. Confirmed that it fixes the crashes reported in
>> > PR middle-end/54957. OK for trunk?
>> >
>> > - Easwaran
>> >
>> > 2012-10-17   Easwaran Raman  
>> >
>> > PR target/54938
>> > PR middle-end/54957
>> > * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note
>> > only if it doesn't already exist.
>> > * except.c (sjlj_emit_function_enter): Remove unused variable.
>> > * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.
>
> Seems fine, but under what conditions you get NULL here?

When expand_sjlj_dispatch_table calls emit_case_dispatch_table,
stmt_bb is NULL.

- Easwaran

> Honza
>> > (emit_case_dispatch_table): Handle the case where STMT_BB is
>> > NULL.
>> > (expand_sjlj_dispatch_table): Pass BB containing before_case
>> > to emit_case_dispatch_table.
>> >
>> > Index: gcc/optabs.c
>> > ===
>> > --- gcc/optabs.c (revision 192488)
>> > +++ gcc/optabs.c (working copy)
>> > @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m
>> >&& profile_status != PROFILE_ABSENT
>> >&& insn
>> >&& JUMP_P (insn)
>> > -  && any_condjump_p (insn))
>> > -{
>> > -  gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0));
>> > -  add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
>> > -}
>> > +  && any_condjump_p (insn)
>> > +  && !find_reg_note (insn, REG_BR_PROB, 0))
>> > +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
>> >  }
>> >
>> >  /* Generate code to compare X with Y so that the condition codes are
>> > Index: gcc/except.c
>> > ===
>> > --- gcc/except.c (revision 192488)
>> > +++ gcc/except.c (working copy)
>> > @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label)
>> >if (dispatch_label)
>> >  {
>> >  #ifdef DONT_USE_BUILTIN_SETJMP
>> > -  rtx x, last;
>> > +  rtx x;
>> >x = emit_library_call_value (setjmp_libfunc, NULL_RTX, 
>> > LCT_RETURNS_TWICE,
>> > TYPE_MODE (integer_type_node), 1,
>> > plus_constant (Pmode, XEXP (fc, 0),
>> > Index: gcc/stmt.c
>> > ===
>> > --- gcc/stmt.c (revision 192488)
>> > +++ gcc/stmt.c (working copy)
>> > @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb)
>> >edge e;
>> >edge_iterator ei;
>> >int prob_sum = 0;
>> > +  if (!bb)
>> > +return 0;
>> >FOR_EACH_EDGE(e, ei, bb->succs)
>> >  prob_sum += e->probability;
>> >return prob_sum;
>> > @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in
>> >rtx fallback_label = label_rtx (case_list->code_label);
>> >rtx table_label = gen_label_rtx ();
>> >bool has_gaps = false;
>> > -  edge default_edge = EDGE_SUCC(stmt_bb, 0);
>> > -  int default_prob = default_edge->probability;
>> > +  edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL;
>> > +  int default_prob = default_edge ? default_edge->probability : 0;
>> >int base = get_outgoing_edge_probs (stmt_bb);
>> >bool try_with_tablejump = false;
>> >
>> > @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in
>> >default_prob = 0;
>> >  }
>> >
>> > -  default_edge->probability = default_prob;
>> > +  if (default_edge)
>> > +default_edge->probability = default_prob;
>> >
>> >/* We have altered the probability of the default edge. So the 
>> > probabilities
>> >   of all other edges need to be adjusted so that it sums up to
>> > @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index,
>> >
>> >emit_case_dispatch_table (index_expr, index_type,
>> >   case_list, default_label,
>> > - minval, maxval, range, NULL);
>> > + minval, maxval, range,
>> > +BLOCK_FOR_INSN (before_case));
>> >emit_label (default_label);
>> >free_alloc_pool (case_node_pool);
>> >  }


Re: Minimize downward code motion during reassociation

2012-10-23 Thread Easwaran Raman
On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener
 wrote:
> On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman  wrote:
>> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener
>>  wrote:
>>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman  wrote:
>>>> Hi,
>>>>
>>>> During expression reassociation, statements are conservatively moved
>>>> downwards to ensure that dependences are correctly satisfied after
>>>> reassociation. This could lead to lengthening of live ranges. This
>>>> patch moves statements only to the extent necessary. Bootstraps and no
>>>> test regression on x86_64/linux. OK for trunk?
>>>>
>>>> Thanks,
>>>> Easwaran
>>>>
>>>> 2012-10-18   Easwaran Raman  
>>>> * tree-ssa-reassoc.c (assign_uids): New function.
>>>> (assign_uids_in_relevant_bbs): Likewise.
>>>> (ensure_ops_are_available): Likewise.
>>>> (rewrite_expr_tree): Do not move statements beyond what is
>>>> necessary. Remove call to swap_ops_for_binary_stmt...
>>>> (reassociate_bb): ... and move it here.
>>>>
>>>>
>>>>
>>>> Index: gcc/tree-ssa-reassoc.c
>>>> ===
>>>> --- gcc/tree-ssa-reassoc.c (revision 192487)
>>>> +++ gcc/tree-ssa-reassoc.c (working copy)
>>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea
>>>>  }
>>>>  }
>>>>
>>>> +/* Assign UIDs to statements in basic block BB.  */
>>>> +
>>>> +static void
>>>> +assign_uids (basic_block bb)
>>>> +{
>>>> +  unsigned uid = 0;
>>>> +  gimple_stmt_iterator gsi;
>>>> +  /* First assign uids to phis.  */
>>>> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>> +{
>>>> +  gimple stmt = gsi_stmt (gsi);
>>>> +  gimple_set_uid (stmt, uid++);
>>>> +}
>>>> +
>>>> +  /* Then assign uids to stmts.  */
>>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>> +{
>>>> +  gimple stmt = gsi_stmt (gsi);
>>>> +  gimple_set_uid (stmt, uid++);
>>>> +}
>>>> +}
>>>> +
>>>> +/* For each operand in OPS, find the basic block that contains the 
>>>> statement
>>>> +   which defines the operand. For all such basic blocks, assign UIDs.  */
>>>> +
>>>> +static void
>>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)
>>>> +{
>>>> +  operand_entry_t oe;
>>>> +  int i;
>>>> +  struct pointer_set_t *seen_bbs = pointer_set_create ();
>>>> +
>>>> +  for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++)
>>>> +{
>>>> +  gimple def_stmt;
>>>> +  basic_block bb;
>>>> +  if (TREE_CODE (oe->op) != SSA_NAME)
>>>> +continue;
>>>> +  def_stmt = SSA_NAME_DEF_STMT (oe->op);
>>>> +  bb = gimple_bb (def_stmt);
>>>> +  if (!pointer_set_contains (seen_bbs, bb))
>>>> +{
>>>> +  assign_uids (bb);
>>>> +  pointer_set_insert (seen_bbs, bb);
>>>> +}
>>>> +}
>>>> +  pointer_set_destroy (seen_bbs);
>>>> +}
>>>
>>> Please assign UIDs once using the existing renumber_gimple_stmt_uids ().
>>> You seem to call the above multiple times and thus do work bigger than
>>> O(number of basic blocks).
>>
>> The reason I call the above multiple times is that gsi_move_before
>> might get called between two calls to the above. For instance, after
>> rewrite_expr_tree is called once, the following sequence of calls
>> could happen:
>> reassociate_bb -> linearize_expr_tree -> linearize_expr ->
>> gsi_move_before.  So it is not sufficient to call
>> renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to
>> use renumber_gimple_stmt_uids_in_blocks instead of
>> assign_uids_in_relevant_bbs?
>
> It's sufficient to call it once if you conservatively update UIDs at stmt
> move / insert time (assign the same UID as the stmt before/after).

I was worried that that would make the code brittle, but looking at
the places where a new statement is added or moved, I think it is
fine.  I have made the change in the new patch.

>

Re: Fix bugs introduced by switch-case profile propagation

2012-10-25 Thread Easwaran Raman
Hi,

On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka  wrote:
>> Ping.
>>
>>
>> On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman  wrote:
>> > Hi,
>> >  This patch fixes bugs introduced by my previous patch to propagate
>> > profiles during switch expansion. Bootstrap and profiledbootstrap
>> > successful on x86_64. Confirmed that it fixes the crashes reported in
>> > PR middle-end/54957. OK for trunk?
>> >
>> > - Easwaran
>> >
>> > 2012-10-17   Easwaran Raman  
>> >
>> > PR target/54938
>> > PR middle-end/54957
>> > * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note
>> > only if it doesn't already exist.
>> > * except.c (sjlj_emit_function_enter): Remove unused variable.
>> > * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.
>
> Seems fine, but under what conditions you get NULL here?
Wasn't sure if this is an OK for the patch or if I need to address
anything else.

Thanks,
Easwaran

>
> Honza
>> > (emit_case_dispatch_table): Handle the case where STMT_BB is
>> > NULL.
>> > (expand_sjlj_dispatch_table): Pass BB containing before_case
>> > to emit_case_dispatch_table.
>> >
>> > Index: gcc/optabs.c
>> > ===
>> > --- gcc/optabs.c (revision 192488)
>> > +++ gcc/optabs.c (working copy)
>> > @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m
>> >&& profile_status != PROFILE_ABSENT
>> >&& insn
>> >&& JUMP_P (insn)
>> > -  && any_condjump_p (insn))
>> > -{
>> > -  gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0));
>> > -  add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
>> > -}
>> > +  && any_condjump_p (insn)
>> > +  && !find_reg_note (insn, REG_BR_PROB, 0))
>> > +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
>> >  }
>> >
>> >  /* Generate code to compare X with Y so that the condition codes are
>> > Index: gcc/except.c
>> > ===
>> > --- gcc/except.c (revision 192488)
>> > +++ gcc/except.c (working copy)
>> > @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label)
>> >if (dispatch_label)
>> >  {
>> >  #ifdef DONT_USE_BUILTIN_SETJMP
>> > -  rtx x, last;
>> > +  rtx x;
>> >x = emit_library_call_value (setjmp_libfunc, NULL_RTX, 
>> > LCT_RETURNS_TWICE,
>> > TYPE_MODE (integer_type_node), 1,
>> > plus_constant (Pmode, XEXP (fc, 0),
>> > Index: gcc/stmt.c
>> > ===
>> > --- gcc/stmt.c (revision 192488)
>> > +++ gcc/stmt.c (working copy)
>> > @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb)
>> >edge e;
>> >edge_iterator ei;
>> >int prob_sum = 0;
>> > +  if (!bb)
>> > +return 0;
>> >FOR_EACH_EDGE(e, ei, bb->succs)
>> >  prob_sum += e->probability;
>> >return prob_sum;
>> > @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in
>> >rtx fallback_label = label_rtx (case_list->code_label);
>> >rtx table_label = gen_label_rtx ();
>> >bool has_gaps = false;
>> > -  edge default_edge = EDGE_SUCC(stmt_bb, 0);
>> > -  int default_prob = default_edge->probability;
>> > +  edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL;
>> > +  int default_prob = default_edge ? default_edge->probability : 0;
>> >int base = get_outgoing_edge_probs (stmt_bb);
>> >bool try_with_tablejump = false;
>> >
>> > @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in
>> >default_prob = 0;
>> >  }
>> >
>> > -  default_edge->probability = default_prob;
>> > +  if (default_edge)
>> > +default_edge->probability = default_prob;
>> >
>> >/* We have altered the probability of the default edge. So the 
>> > probabilities
>> >   of all other edges need to be adjusted so that it sums up to
>> > @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index,
>> >
>> >emit_case_dispatch_table (index_expr, index_type,
>> >   case_list, default_label,
>> > - minval, maxval, range, NULL);
>> > + minval, maxval, range,
>> > +BLOCK_FOR_INSN (before_case));
>> >emit_label (default_label);
>> >free_alloc_pool (case_node_pool);
>> >  }


Re: Fix bugs introduced by switch-case profile propagation

2012-10-31 Thread Easwaran Raman
On Fri, Oct 26, 2012 at 8:05 AM, Jan Hubicka  wrote:
>> Hi,
>>
>> On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka  wrote:
>> >> Ping.
>> >>
>> >>
>> >> On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman  wrote:
>> >> > Hi,
>> >> >  This patch fixes bugs introduced by my previous patch to propagate
>> >> > profiles during switch expansion. Bootstrap and profiledbootstrap
>> >> > successful on x86_64. Confirmed that it fixes the crashes reported in
>> >> > PR middle-end/54957. OK for trunk?
>> >> >
>> >> > - Easwaran
>> >> >
>> >> > 2012-10-17   Easwaran Raman  
>> >> >
>> >> > PR target/54938
>> >> > PR middle-end/54957
>> >> > * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note
>> >> > only if it doesn't already exist.
>> >> > * except.c (sjlj_emit_function_enter): Remove unused variable.
>> >> > * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.
>> >
>> > Seems fine, but under what conditions you get NULL here?
>> Wasn't sure if this is an OK for the patch or if I need to address
>> anything else.
>
> Actually I think you should make except.c set current_bb when expanding
> the switch instead.

In the current code, in sjlj_emit_dispatch_table (except.c), a
sequence of instructions including the dispatch table jump is first
generated and emitted to a BB before the first of the jump table
targets. I think you are asking me to first emit the instructions
before the indirect jump into a new bb so that BLOCK_FOR_INSN
(before_case) in stmt.c is non-NULL. This would need some refactoring
inside except.c, but more importantly this BB won't have the correct
control flow, right? So, while I can avoid the check for BB being NULL
in get_outgoing_edge_probs, I still need to guard against default_edge
being NULL, and default_prob will still be 0. Would it be ok if I
remove the check for NULL inside get_outgoing_edge_probs and instead
check it when I initialize BASE?

Thanks,
Easwaran

> OK with this change.
>
> Honza


Don't drop attributes on template member functions

2013-10-02 Thread Easwaran Raman
This attempts to address one issue reported in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33911 . Attributes at the
end of a declaration of a template member function are discarded in
cp_parser_init_declarator.  The suggested workaround is to place the
attributes in the declaration-specifiers. The code in
cp_parser_init_declarator actually parses the attributes at the end,
but only passes the prefix_attributes (from the decl-specifier-seq) to
grokfield(). Appending the parsed attributes to prefix_attributes
seems to work, but I don't know if this is sufficient in all cases.
Bootstraps and all tests pass with the patch on x86_64/linux. Is this
ok for trunk?

2013-10-02  Easwaran Raman  

PR c++/33911
* parser.c (cp_parser_init_declarator): Do not drop attributes
of template member functions.

2013-10-02  Easwaran Raman  

PR c++/33911
* g++.dg/ext/attribute47.C: New.


parser_attribute.patch
Description: Binary data


Add a param to decide stack slot sharing at -O0

2013-10-08 Thread Easwaran Raman
In cfgexpand.c, variables in non-overlapping lexical scopes are
assigned the same stack locations at -O1 and above. At -O0, this is
attempted only if the size of the stack objects is above a threshold
(32). The rationale is that at -O0, more variables are going to be on
the stack and the O(n^2) stack slot sharing algorithm will increase
compilation time. This patch replaces the constant with a param which
is set to 32 by default. We ran into a case where the presence of the
always_inline attribute triggered -Wframe-larger-than warnings at -O0
but not at -O2, since the different inlined copies share the stack. We
are ok with a slight increase in compilation time to get smaller stack
frames even at -O0, and this patch would allow us to do that easily.

Bootstraps on x86_64/linux. Is this ok for trunk?

Thanks,
Easwaran


2013-10-08  Easwaran Raman 

* params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param...
* cfgexpand.c (defer_stack_allocation): ...use here


cfgexpand.patch
Description: Binary data


Re: Add a param to decide stack slot sharing at -O0

2013-10-09 Thread Easwaran Raman
On Wed, Oct 9, 2013 at 4:11 AM, Richard Biener
 wrote:
> On Tue, Oct 8, 2013 at 11:04 PM, Easwaran Raman  wrote:
>> In cfgexpand.c, variables in non-overlapping lexical scopes are
>> assigned same stack locations at -O1 and above. At -O0, this is
>> attempted only if the size of the stack objects is above a threshold
>> (32). The rationale is at -O0, more variables are going to be in the
>> stack and the O(n^2) stack slot sharing algorithm will increase the
>> compilation time. This patch replaces the constant with a param which
>> is set to 32 by default. We ran into a case where the presence of
>> always_inline attribute triggered Wframe-larger-than warnings at -O0
>> but not at -O2 since the different inlined copies share the stack. We
>> are ok with a slight increase in compilation time to get smaller stack
>> frames even at -O0 and this patch would allow us do that easily.
>>
>> Bootstraps on x86_64/linux. Is this ok for trunk?
>
> Ok with
>
> +DEFPARAM (PARAM_MIN_SIZE_FOR_STACK_SHARING,
> + "min-size-for-stack-sharing",
> + "Attempt to share stack slots among variables in different
> lexical blocks "
> + "at O0 only if their sizes exceed this value",
> + 32, 0, 0)
>
> changed to
>
>"The minimum size of variables taking part in stack slot sharing "
>"when not optimizing"
>32, 0, 0)
>
> And with adding documentation for that param in doc/invoke.texi.
>
> Btw, I'm not sure the sharing algorithm is still quadratic - can you
> investigate on that?

partition_stack_vars is still quadratic in the worst case (when all
live ranges overlap), but the expensive part is likely to be the
add_scope_conflicts function.

- Easwaran

> Thanks,
> Richard.
>
>> Thanks,
>> Easwaran
>>
>>
>> 2013-10-08  Easwaran Raman 
>>
>> * params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param...
>> * cfgexpand.c (defer_stack_allocation): ...use here


[4.9] PR 62146

2014-08-25 Thread Easwaran Raman
This patch deletes the REG_EQUAL note when a source register is
replaced by a constant in an assignment. This prevents spurious
equivalences between the constant and the expression in the REG_EQUAL
note. In the bug reported in PR 62146, an assignment in one branch
(which is actually dead) of an IF statement has a REG_EQUAL note
equating a register with an expression. Conditional copy propagation
replaces the register with 0. The instruction is subsequently hoisted
above the branch, and then the value 0 is equated with the expression
in the REG_EQUAL note. Is this ok for the 4.9 branch if all tests pass?

This patch looks applicable to trunk as well, but I don't have a test
case that reproduces the issue on trunk.


ChangeLog:

2014-08-25  Easwaran Raman  

PR rtl-optimization/62146
* cprop.c (try_replace_reg): Remove REG_EQUAL note when a constant is
propagated into the src of an assignment.

testsuite/ChangeLog

2014-08-25  Easwaran Raman  

PR rtl-optimization/62146
* testsuite/g++.dg/opt/pr62146.C: New.
Index: gcc/testsuite/g++.dg/opt/pr62146.C
===
--- gcc/testsuite/g++.dg/opt/pr62146.C	(revision 0)
+++ gcc/testsuite/g++.dg/opt/pr62146.C	(revision 0)
@@ -0,0 +1,51 @@
+/* PR rtl-optimization/62146 */
+/* { dg-do compile } */
+/* { dg-options "-O2 " } */
+class F
+{
+public:
+virtual ~ F ();
+};
+template < class CL > class G:public F
+{
+int *member_;
+public:
+G ( int *b): member_ (0)
+{
+}
+};
+
+class D
+{
+public:
+template < class CL > void RegisterNonTagCallback (int,
+void (CL::
+  *p3) ())
+{
+InternalRegisterNonTag (p3 ? new G < CL > ( 0) : 0);
+} void InternalRegisterNonTag (F *);
+};
+
+void fn1 ();
+class C1
+{
+void  foo();
+class TokenType
+{
+public:
+void AddToken ()
+{
+}
+};
+C1::TokenType bar_t;
+};
+D a;
+void C1::foo()
+{
+if (&bar_t)
+fn1 ();
+for (int i = 0; i < sizeof 0; ++i)
+a.RegisterNonTagCallback (0, &TokenType::AddToken);
+}
+
+/* { dg-final { scan-assembler-not "mov.*_ZN2C19TokenType8AddTokenEv, .\\\(" } } */
Index: gcc/cprop.c
===
--- gcc/cprop.c	(revision 214472)
+++ gcc/cprop.c	(working copy)
@@ -790,8 +790,11 @@ try_replace_reg (rtx from, rtx to, rtx insn)
   /* REG_EQUAL may get simplified into register.
  We don't allow that. Remove that note. This code ought
  not to happen, because previous code ought to synthesize
- reg-reg move, but be on the safe side.  */
-  if (note && REG_NOTE_KIND (note) == REG_EQUAL && REG_P (XEXP (note, 0)))
+ reg-reg move, but be on the safe side. The REG_EQUAL note is
+ also removed when the source is a constant.  */
+  if (note && REG_NOTE_KIND (note) == REG_EQUAL
+  && (REG_P (XEXP (note, 0))
+  || (set && CONSTANT_P (SET_SRC (set)
 remove_note (insn, note);
 
   return success;


Re: [4.9] PR 62146

2014-08-27 Thread Easwaran Raman
On Mon, Aug 25, 2014 at 3:42 PM, Easwaran Raman  wrote:
> This patch deletes REG_EQUAL note when a src register is replaced by a
> constant in an assignment. This is to prevent spurious equivalences
> between the constant and the expression in the REG_EQUAL note. In the
> bug reported in PR 62146, an assignment in one branch (which is
> actually dead) of an IF statement has a REG_EQUAL note equating a
> register with an expression. Conditional copy propagation replaces the
> register with 0. The instruction is hoisted above the branch
> subsequently and then the value 0 is equated with the expression in
> the REG_EQUAL. Is this ok for 4.9 branch if all tests pass?
>
> This patch looks applicable to trunk as well, but I don't have a test
> case to reproduce the issue in trunk.
>
>
> ChangeLog:
>
> 2014-08-25  Easwaran Raman  
>
> PR rtl-optimization/62146
> * cprop.c (try_replace_reg): Remove REG_EQUAL note when a constant is
> propagated into the src of an assignment.
>
> testsuite/ChangeLog
>
> 2014-08-25  Easwaran Raman  
>
> PR rtl-optimization/62146
> * testsuite/g++.dg/opt/pr62146.C: New.


Re: [4.9] PR 62146

2014-09-02 Thread Easwaran Raman
It turns out that the REG_EQUAL note is removed on a hoisted
instruction (the relevant code is in dead_or_predicable in ifcvt.c) if
the source of the move instruction is not a function invariant. In
this case, the source is a function invariant (a constant), so that
doesn't kick in. I don't understand why this exemption for function
invariants is there, and the original thread at
https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain it
either. Should I just remove the REG_EQUAL notes of all hoisted
instructions, or are there cases where it is safe to leave the note?

Thanks,
Easwaran



On Fri, Aug 29, 2014 at 1:06 PM, Jeff Law  wrote:
> On 08/25/14 16:42, Easwaran Raman wrote:
>>
>> This patch deletes REG_EQUAL note when a src register is replaced by a
>> constant in an assignment. This is to prevent spurious equivalences
>> between the constant and the expression in the REG_EQUAL note. In the
>> bug reported in PR 62146, an assignment in one branch (which is
>> actually dead) of an IF statement has a REG_EQUAL note equating a
>> register with an expression. Conditional copy propagation replaces the
>> register with 0. The instruction is hoisted above the branch
>> subsequently and then the value 0 is equated with the expression in
>> the REG_EQUAL. Is this ok for 4.9 branch if all tests pass?
>>
>> This patch looks applicable to trunk as well, but I don't have a test
>> case to reproduce the issue in trunk.
>
> Something doesn't feel right with this patch.  It seems to me the real
> problem is when when hoist the insn with the note.  If the equivalence
> implied by the note is no longer valid at the insn's new location, then the
> note needs to be removed.
>
> Now determining if the note is no longer valid at the new location may prove
> difficult ;-)  You'd probably have to know why the note was created, how it
> was changed, etc.  So I suspect the right thing to do is just remove
> REG_EQUAL notes on any insns we hoist in this manner.
>
> Jeff


Re: [4.9] PR 62146

2014-09-04 Thread Easwaran Raman
I've attached the revised patch. It bootstraps with no test
regressions on x86_64/linux on the 4.9 branch. Ok for the 4.9 branch?
While the bug doesn't show up on trunk, it seems obvious that this
should go to trunk as well. Is it ok for trunk if tests pass?

Btw, is g++.dg/opt the right directory for the test to go in?

-

ChangeLog

2014-09-04  Easwaran Raman  

PR rtl-optimization/62146
* ifcvt.c (dead_or_predicable): Make removal of REG_EQUAL note of
hoisted instruction unconditional.

2014-09-04  Easwaran Raman  

PR rtl-optimization/62146
* testsuite/g++.dg/opt/pr62146.C: New.


On Wed, Sep 3, 2014 at 11:36 AM, Jeff Law  wrote:
> On 09/02/14 12:52, Easwaran Raman wrote:
>>
>> It turns out that the REG_EQUAL note is removed on a hoisted
>> instruction (relevant code is in dead_or_predicable in ifcvt.c) if the
>> source of the move instruction is not a function invariant. In this
>> case, the source is a function invariant (constant) and so that
>> doesn't kick in. I don't understand why this exemption for function
>> invariant is there and the original thread in
>> https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain
>> either. Should I just remove the REG_EQUAL notes of all hoisted
>> instructions or are there cases where it is safe to leave the note?
>
> I suspect it was left in in an attempt to keep as many REG_EQUAL notes as
> possible as the notes can help later optimizations.  But even those
> equivalences are not necessarily safe.
>
> I'm pretty sure the right thing to do here is just drop the note regardless
> of whether or not its an invariant.
>
> jeff
>
Index: testsuite/g++.dg/opt/pr62146.C
===
--- testsuite/g++.dg/opt/pr62146.C	(revision 0)
+++ testsuite/g++.dg/opt/pr62146.C	(revision 0)
@@ -0,0 +1,51 @@
+/* PR rtl-optimization/62146 */
+/* { dg-do compile } */
+/* { dg-options "-O2 " } */
+class F
+{
+public:
+virtual ~ F ();
+};
+template < class CL > class G:public F
+{
+int *member_;
+public:
+G ( int *b): member_ (0)
+{
+}
+};
+
+class D
+{
+public:
+template < class CL > void RegisterNonTagCallback (int,
+void (CL::
+  *p3) ())
+{
+InternalRegisterNonTag (p3 ? new G < CL > ( 0) : 0);
+} void InternalRegisterNonTag (F *);
+};
+
+void fn1 ();
+class C1
+{
+void  foo();
+class TokenType
+{
+public:
+void AddToken ()
+{
+}
+};
+C1::TokenType bar_t;
+};
+D a;
+void C1::foo()
+{
+if (&bar_t)
+fn1 ();
+for (int i = 0; i < sizeof 0; ++i)
+a.RegisterNonTagCallback (0, &TokenType::AddToken);
+}
+
+/* { dg-final { scan-assembler-not "mov.*_ZN2C19TokenType8AddTokenEv, .\\\(" } } */
Index: ifcvt.c
===
--- ifcvt.c	(revision 214472)
+++ ifcvt.c	(working copy)
@@ -4387,17 +4387,14 @@ dead_or_predicable (basic_block test_bb, basic_blo
   insn = head;
   do
 	{
-	  rtx note, set;
+	  rtx note;
 
 	  if (! INSN_P (insn))
 	continue;
 	  note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
 	  if (! note)
 	continue;
-	  set = single_set (insn);
-	  if (!set || !function_invariant_p (SET_SRC (set))
-	  || !function_invariant_p (XEXP (note, 0)))
-	remove_note (insn, note);
+	  remove_note (insn, note);
 	} while (insn != end && (insn = NEXT_INSN (insn)));
 
   /* PR46315: when moving insns above a conditional branch, the REG_EQUAL


Fix for PR c++/59031

2013-11-07 Thread Easwaran Raman
Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL
is set and the call is direct. The changes in r193504 (to fix PR
c++/11750) regressed this behavior. This patch attempts to fix that.
Bootstraps with no test regressions on x86_64/linux. Is this a
correct fix, and is it ok for trunk?

Thanks,
Easwaran

2013-11-07  Easwaran Raman  

PR c++/59031
* call.c (build_new_method_call_1): Compare function context
with BASELINK_BINFO type rather than instance type before
marking the call with LOOKUP_NONVIRTUAL.

testsuite/ChangeLog:

2013-11-07  Easwaran Raman  

PR c++/59031
* g++.dg/inherit/virtual11.C: New test.
Index: testsuite/g++.dg/inherit/virtual11.C
===
--- testsuite/g++.dg/inherit/virtual11.C	(revision 0)
+++ testsuite/g++.dg/inherit/virtual11.C	(revision 0)
@@ -0,0 +1,17 @@
+// PR c++/59031 
+// { dg-do compile }
+// { dg-options "-fdump-tree-gimple " }
+class B {
+ public:
+  virtual int add (int a, int b) {return a+ b;}
+};
+
+class D : public B {
+};
+
+int foo (int a, int b) {
+  D d;
+  return d.add(a, b);
+}
+// { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "gimple" } }
+// { dg-final { cleanup-tree-dump "gimple" } }
Index: cp/call.c
===
--- cp/call.c	(revision 204466)
+++ cp/call.c	(working copy)
@@ -7511,7 +7511,7 @@ build_new_method_call_1 (tree instance, tree fns,
   struct z_candidate *candidates = 0, *cand;
   tree explicit_targs = NULL_TREE;
   tree basetype = NULL_TREE;
-  tree access_binfo;
+  tree access_binfo, binfo;
   tree optype;
   tree first_mem_arg = NULL_TREE;
   tree name;
@@ -7550,6 +7550,7 @@ build_new_method_call_1 (tree instance, tree fns,
   if (!conversion_path)
 conversion_path = BASELINK_BINFO (fns);
   access_binfo = BASELINK_ACCESS_BINFO (fns);
+  binfo = BASELINK_BINFO (fns);
   optype = BASELINK_OPTYPE (fns);
   fns = BASELINK_FUNCTIONS (fns);
   if (TREE_CODE (fns) == TEMPLATE_ID_EXPR)
@@ -7802,7 +7803,7 @@ build_new_method_call_1 (tree instance, tree fns,
 		 do not want to perform a non-virtual call.  */
 	  if (DECL_VINDEX (fn) && ! (flags & LOOKUP_NONVIRTUAL)
 		  && same_type_ignoring_top_level_qualifiers_p
-		  (DECL_CONTEXT (fn), TREE_TYPE (instance))
+		  (DECL_CONTEXT (fn), TREE_TYPE (binfo))
 		  && resolves_to_fixed_type_p (instance, 0))
 		flags |= LOOKUP_NONVIRTUAL;
   if (explicit_targs)


Re: Fix for PR c++/59031

2013-11-11 Thread Easwaran Raman
Ping.

On Thu, Nov 7, 2013 at 11:14 AM, Easwaran Raman  wrote:
> Before  r193504, if a method can not be overridden, LOOKUP_NONVIRTUAL
> is set and the call is direct. The changes at r193504 (to fix PR
> c++/11750) caused a regression to this behavior. This patch attempts
> to fix that. Bootstraps and no test regressions on x86_64/linux. Is
> this a correct fix and is this ok for trunk?
>
> Thanks,
> Easwaran
>
> 2013-11-07  Easwaran Raman  
>
> PR c++/59031
> * call.c (build_new_method_call_1): Comnpare function context
> with BASELINK_BINFO type rather than instance type before
> marking the call with LOOKUP_NONVIRTUAL.
>
> testsuite/ChangeLog:
>
> 2013-11-07  Easwaran Raman  
>
> PR c++/59031
> * g++.dg/inherit/virtual11.C: New test.


Re: Fix for PR c++/59031

2013-11-15 Thread Easwaran Raman
Ping.


On Mon, Nov 11, 2013 at 9:46 AM, Easwaran Raman  wrote:
> Ping.
>
> On Thu, Nov 7, 2013 at 11:14 AM, Easwaran Raman  wrote:
>> Before  r193504, if a method can not be overridden, LOOKUP_NONVIRTUAL
>> is set and the call is direct. The changes at r193504 (to fix PR
>> c++/11750) caused a regression to this behavior. This patch attempts
>> to fix that. Bootstraps and no test regressions on x86_64/linux. Is
>> this a correct fix and is this ok for trunk?
>>
>> Thanks,
>> Easwaran
>>
>> 2013-11-07  Easwaran Raman  
>>
>> PR c++/59031
>> * call.c (build_new_method_call_1): Comnpare function context
>> with BASELINK_BINFO type rather than instance type before
>> marking the call with LOOKUP_NONVIRTUAL.
>>
>> testsuite/ChangeLog:
>>
>> 2013-11-07  Easwaran Raman  
>>
>> PR c++/59031
>> * g++.dg/inherit/virtual11.C: New test.


Re: Minimize downward code motion during reassociation

2012-10-31 Thread Easwaran Raman
On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener
 wrote:
> On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman  wrote:
>> On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener
>>  wrote:
>>> On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman  wrote:
>>>> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener
>>>>  wrote:
>>>>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman  
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> During expression reassociation, statements are conservatively moved
>>>>>> downwards to ensure that dependences are correctly satisfied after
>>>>>> reassocation. This could lead to lengthening of live ranges. This
>>>>>> patch moves statements only to the extent necessary. Bootstraps and no
>>>>>> test regression on x86_64/linux. OK for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> Easwaran
>>>>>>
>>>>>> 2012-10-18   Easwaran Raman  
>>>>>> * tree-ssa-reassoc.c(assign_uids): New function.
>>>>>> (assign_uids_in_relevant_bbs): Likewise.
>>>>>> (ensure_ops_are_available): Likewise.
>>>>>> (rewrite_expr_tree): Do not move statements beyond what is
>>>>>> necessary. Remove call to swap_ops_for_binary_stmt...
>>>>>> (reassociate_bb): ... and move it here.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Index: gcc/tree-ssa-reassoc.c
>>>>>> ===
>>>>>> --- gcc/tree-ssa-reassoc.c (revision 192487)
>>>>>> +++ gcc/tree-ssa-reassoc.c (working copy)
>>>>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, 
>>>>>> hea
>>>>>>  }
>>>>>>  }
>>>>>>
>>>>>> +/* Assign UIDs to statements in basic block BB.  */
>>>>>> +
>>>>>> +static void
>>>>>> +assign_uids (basic_block bb)
>>>>>> +{
>>>>>> +  unsigned uid = 0;
>>>>>> +  gimple_stmt_iterator gsi;
>>>>>> +  /* First assign uids to phis.  */
>>>>>> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>> +{
>>>>>> +  gimple stmt = gsi_stmt (gsi);
>>>>>> +  gimple_set_uid (stmt, uid++);
>>>>>> +}
>>>>>> +
>>>>>> +  /* Then assign uids to stmts.  */
>>>>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>> +{
>>>>>> +  gimple stmt = gsi_stmt (gsi);
>>>>>> +  gimple_set_uid (stmt, uid++);
>>>>>> +}
>>>>>> +}
>>>>>> +
>>>>>> +/* For each operand in OPS, find the basic block that contains the 
>>>>>> statement
>>>>>> +   which defines the operand. For all such basic blocks, assign UIDs.  
>>>>>> */
>>>>>> +
>>>>>> +static void
>>>>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)
>>>>>> +{
>>>>>> +  operand_entry_t oe;
>>>>>> +  int i;
>>>>>> +  struct pointer_set_t *seen_bbs = pointer_set_create ();
>>>>>> +
>>>>>> +  for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++)
>>>>>> +{
>>>>>> +  gimple def_stmt;
>>>>>> +  basic_block bb;
>>>>>> +  if (TREE_CODE (oe->op) != SSA_NAME)
>>>>>> +continue;
>>>>>> +  def_stmt = SSA_NAME_DEF_STMT (oe->op);
>>>>>> +  bb = gimple_bb (def_stmt);
>>>>>> +  if (!pointer_set_contains (seen_bbs, bb))
>>>>>> +{
>>>>>> +  assign_uids (bb);
>>>>>> +  pointer_set_insert (seen_bbs, bb);
>>>>>> +}
>>>>>> +}
>>>>>> +  pointer_set_destroy (seen_bbs);
>>>>>> +}
>>>>>
>>>>> Please assign UIDs once using the existing renumber_gimple_stmt_uids ().
>>>>> You seem to call the above multiple times and thus do work bigger than
>>>>> O(number of basic blocks).
>>>>

Re: Minimize downward code motion during reassociation

2012-11-02 Thread Easwaran Raman
Ping.

- Easwaran

On Wed, Oct 31, 2012 at 12:16 PM, Easwaran Raman  wrote:
> On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener
>  wrote:
>> On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman  wrote:
>>> On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener
>>>  wrote:
>>>> On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman  wrote:
>>>>> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener
>>>>>  wrote:
>>>>>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman  
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> During expression reassociation, statements are conservatively moved
>>>>>>> downwards to ensure that dependences are correctly satisfied after
>>>>>>> reassocation. This could lead to lengthening of live ranges. This
>>>>>>> patch moves statements only to the extent necessary. Bootstraps and no
>>>>>>> test regression on x86_64/linux. OK for trunk?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Easwaran
>>>>>>>
>>>>>>> 2012-10-18   Easwaran Raman  
>>>>>>> * tree-ssa-reassoc.c(assign_uids): New function.
>>>>>>> (assign_uids_in_relevant_bbs): Likewise.
>>>>>>> (ensure_ops_are_available): Likewise.
>>>>>>> (rewrite_expr_tree): Do not move statements beyond what is
>>>>>>> necessary. Remove call to swap_ops_for_binary_stmt...
>>>>>>> (reassociate_bb): ... and move it here.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Index: gcc/tree-ssa-reassoc.c
>>>>>>> ===
>>>>>>> --- gcc/tree-ssa-reassoc.c (revision 192487)
>>>>>>> +++ gcc/tree-ssa-reassoc.c (working copy)
>>>>>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, 
>>>>>>> hea
>>>>>>>  }
>>>>>>>  }
>>>>>>>
>>>>>>> +/* Assign UIDs to statements in basic block BB.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +assign_uids (basic_block bb)
>>>>>>> +{
>>>>>>> +  unsigned uid = 0;
>>>>>>> +  gimple_stmt_iterator gsi;
>>>>>>> +  /* First assign uids to phis.  */
>>>>>>> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>> +{
>>>>>>> +  gimple stmt = gsi_stmt (gsi);
>>>>>>> +  gimple_set_uid (stmt, uid++);
>>>>>>> +}
>>>>>>> +
>>>>>>> +  /* Then assign uids to stmts.  */
>>>>>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>> +{
>>>>>>> +  gimple stmt = gsi_stmt (gsi);
>>>>>>> +  gimple_set_uid (stmt, uid++);
>>>>>>> +}
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* For each operand in OPS, find the basic block that contains the 
>>>>>>> statement
>>>>>>> +   which defines the operand. For all such basic blocks, assign UIDs.  
>>>>>>> */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)
>>>>>>> +{
>>>>>>> +  operand_entry_t oe;
>>>>>>> +  int i;
>>>>>>> +  struct pointer_set_t *seen_bbs = pointer_set_create ();
>>>>>>> +
>>>>>>> +  for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++)
>>>>>>> +{
>>>>>>> +  gimple def_stmt;
>>>>>>> +  basic_block bb;
>>>>>>> +  if (TREE_CODE (oe->op) != SSA_NAME)
>>>>>>> +continue;
>>>>>>> +  def_stmt = SSA_NAME_DEF_STMT (oe->op);
>>>>>>> +  bb = gimple_bb (def_stmt);
>>>>>>> +  if (!pointer_set_contains (seen_bbs, bb))
>>>>>>> +{
>>>>>>> +  assign_uids (bb);
>>>>>>> +  pointer_set_insert (seen_bbs, bb);

Re: Minimize downward code motion during reassociation

2012-11-05 Thread Easwaran Raman
I am unable to figure out the right way to handle the debug
statements. What I tried was to find debug statements that use the SSA
name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE)
and then move them as well to the right place. Thus, if I have to
move t1 = a + b down (after the definition of 'd'), I also moved all
debug statements that use t1 after the new position of t1. That still
caused use-before-def problems in verify_ssa. I noticed that the debug
statements got modified behind the scenes, causing these issues. Any
hints on the right way to handle the debug statements would be
very helpful.

Thanks,
Easwaran

On Sun, Nov 4, 2012 at 9:10 AM, Richard Biener
 wrote:
> On Wed, Oct 31, 2012 at 8:16 PM, Easwaran Raman  wrote:
>> On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener
>>  wrote:
>>> On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman  wrote:
>>>> On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener
>>>>  wrote:
>>>>> On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman  wrote:
>>>>>> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener
>>>>>>  wrote:
>>>>>>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman  
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> During expression reassociation, statements are conservatively moved
>>>>>>>> downwards to ensure that dependences are correctly satisfied after
>>>>>>>> reassocation. This could lead to lengthening of live ranges. This
>>>>>>>> patch moves statements only to the extent necessary. Bootstraps and no
>>>>>>>> test regression on x86_64/linux. OK for trunk?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Easwaran
>>>>>>>>
>>>>>>>> 2012-10-18   Easwaran Raman  
>>>>>>>> * tree-ssa-reassoc.c(assign_uids): New function.
>>>>>>>> (assign_uids_in_relevant_bbs): Likewise.
>>>>>>>> (ensure_ops_are_available): Likewise.
>>>>>>>> (rewrite_expr_tree): Do not move statements beyond what is
>>>>>>>> necessary. Remove call to swap_ops_for_binary_stmt...
>>>>>>>> (reassociate_bb): ... and move it here.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Index: gcc/tree-ssa-reassoc.c
>>>>>>>> ===
>>>>>>>> --- gcc/tree-ssa-reassoc.c (revision 192487)
>>>>>>>> +++ gcc/tree-ssa-reassoc.c (working copy)
>>>>>>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, 
>>>>>>>> hea
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/* Assign UIDs to statements in basic block BB.  */
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +assign_uids (basic_block bb)
>>>>>>>> +{
>>>>>>>> +  unsigned uid = 0;
>>>>>>>> +  gimple_stmt_iterator gsi;
>>>>>>>> +  /* First assign uids to phis.  */
>>>>>>>> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>>> +{
>>>>>>>> +  gimple stmt = gsi_stmt (gsi);
>>>>>>>> +  gimple_set_uid (stmt, uid++);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +  /* Then assign uids to stmts.  */
>>>>>>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>>> +{
>>>>>>>> +  gimple stmt = gsi_stmt (gsi);
>>>>>>>> +  gimple_set_uid (stmt, uid++);
>>>>>>>> +}
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* For each operand in OPS, find the basic block that contains the 
>>>>>>>> statement
>>>>>>>> +   which defines the operand. For all such basic blocks, assign UIDs. 
>>>>>>>>  */
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)

Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-05 Thread Easwaran Raman
I'd like to get a small patch to tree reassociation (
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01761.html ) in.

Thanks,
Easwaran

On Mon, Oct 29, 2012 at 10:56 AM, Jakub Jelinek  wrote:
> Status
> ==
>
> I'd like to close the stage 1 phase of GCC 4.8 development
> on Monday, November 5th.  If you have still patches for new features you'd
> like to see in GCC 4.8, please post them for review soon.  Patches
> posted before the freeze, but reviewed shortly after the freeze, may
> still go in, further changes should be just bugfixes and documentation
> fixes.
>
>
> Quality Data
> 
>
> Priority  #   Change from Last Report
> ---   ---
> P1   23   + 23
> P2   77   +  8
> P3   85   + 84
> ---   ---
> Total   185   +115
>
>
> Previous Report
> ===
>
> http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html
>
> The next report will be sent by me again, announcing end of stage 1.


Re: Minimize downward code motion during reassociation

2012-12-07 Thread Easwaran Raman
It seems I need to reset the debug uses of a statement before moving
the statement itself. The attached patch starts from the leaf to root
of the tree to be reassociated and places them at the point where
their dependences will be met after reassociation. This bootstraps and
I am running the tests. Ok if there are no test failures?

Thanks,
Easwaran

2012-12-07   Easwaran Raman  
* tree-ssa-reassoc.c (find_insert_point): New function.
(insert_stmt_after): Likewise.
(get_def_stmt): Likewise.
(ensure_ops_are_available): Likewise.
(rewrite_expr_tree): Do not move statements beyond what is
necessary. Remove call to swap_ops_for_binary_stmt...
(reassociate_bb): ... and move it here.
(build_and_add_sum): Assign UIDs for new statements.
(linearize_expr): Likewise.
(do_reassoc): Renumber gimple statement UIDs.



On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener
 wrote:
> On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman  wrote:
>> I am unable to figure out the right way to handle the debug
>> statements. What I tried was to find debug statements that use the SSA
>> name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE)
>> and then moved them as well at the right place. Thus, if I have to
>> move t1 = a + b down (after the definition of 'd'), I also moved all
>> debug statements that use t1 after the new position of t1. That still
>> caused use-before-def problems in ssa_verify. I noticed that the debug
>> statements got modified behind the scenes causing these issues. Any
>> hints on what is the right way to handle the debug statements would be
>> very helpful.
>
> I think you cannot (and should not) move debug statements.  Instead you
> have to invalidate them.  Otherwise you'll introduce confusion as debug
> info cannot handle overlapping live ranges.
>
> But maybe Alex can clarify.
>
> Richard.


reassoc_revised.diff
Description: Binary data


Print column information in dump_loc

2013-05-14 Thread Easwaran Raman
This patch makes dump_loc print the column number, making its output
similar to inform()'s. This allows these messages to be pattern-matched
with dg-message. Bootstraps with this change. Ok for trunk?

- Easwaran

2013-05-14  Easwaran Raman  

* dumpfile.c (dump_loc): Print column number.

Index: gcc/dumpfile.c
===
--- gcc/dumpfile.c (revision 198852)
+++ gcc/dumpfile.c (working copy)
@@ -261,12 +261,13 @@ dump_loc (int dump_kind, FILE *dfile, source_locat
   if (dump_kind)
 {
   if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
-fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
- LOCATION_LINE (loc));
+fprintf (dfile, "\n%s:%d:%d: note: ", LOCATION_FILE (loc),
+ LOCATION_LINE (loc), LOCATION_COLUMN (loc));
   else if (current_function_decl)
-fprintf (dfile, "\n%s:%d: note: ",
+fprintf (dfile, "\n%s:%d:%d: note: ",
  DECL_SOURCE_FILE (current_function_decl),
- DECL_SOURCE_LINE (current_function_decl));
+ DECL_SOURCE_LINE (current_function_decl),
+ DECL_SOURCE_COLUMN (current_function_decl));
 }
 }


Re: Minimize downward code motion during reassociation

2013-05-17 Thread Easwaran Raman
On Thu, Apr 25, 2013 at 4:42 AM, Richard Biener
 wrote:
> On Fri, Dec 7, 2012 at 9:01 PM, Easwaran Raman  wrote:
>> It seems I need to reset the debug uses of a statement before moving
>> the statement itself. The attached patch starts from the leaf to root
>> of the tree to be reassociated and places them at the point where
>> their dependences will be met after reassociation. This bootstraps and
>> I am running the tests. Ok if there are no test failures?
>
> +  if ((dep_bb != insert_bb
> +   && !dominated_by_p (CDI_DOMINATORS, insert_bb, dep_bb))
> +  || (dep_bb == insert_bb
> +  && gimple_uid (insert_stmt) < gimple_uid (dep_stmt)))
> +{
> +  insert_stmt = dep_stmt;
> +}
>
> superfluous {}
Removed.
>
> +  /* If there are any debug uses of LHS, reset them.  */
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
> +{
> +  if (is_gimple_debug (use_stmt))
> +{
> +  gimple_debug_bind_reset_value (use_stmt);
>
> that's only needed for debug stmts that are not dominated by insert_point, no?
You are right. I have added the check.

>
> +  /* Statements marked for throw can not be in the middle of a basic block. 
> So
> + we can not insert a statement (not marked for throw) immediately
> after.  */
> +  else if (stmt_can_throw_internal (insert_point))
> +{
>
> use stmt_ends_bb_p (insert_point) instead
>
> +  edge succ_edge = find_fallthru_edge (insert_bb->succs);
> +  insert_bb = succ_edge->dest;
> +  /* Insert STMT at the beginning of the successor basic block.  */
> +  gsi_insert = gsi_after_labels (insert_bb);
> +  gsi_move_before (&gsistmt, &gsi_insert);
>
> are you sure this is a proper insert location?  How would you even arrive in
> this situation given find_insert_point dominance checks?

Let's say the last statement in a BB can throw and feeds into a
statement that is to be reassociated. Then the insert point would be
this statement. Instead, you could always insert at the beginning of
its (lone) successor. This is independent of the find_insert_point
checks.
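The rule being discussed can be sketched as a toy model (not GCC code; `can_throw` stands in for `stmt_can_throw_internal`/`stmt_ends_bb_p`, and the returned pair is a hypothetical (block, index) slot):

```python
# A statement that can throw internally must end its basic block, so
# nothing can be placed after it in the same block; instead, insert at
# the start of the fall-through successor block.
def choose_insert_slot(bb_stmts, insert_after, can_throw, fallthru_succ):
    if can_throw(insert_after):
        return (fallthru_succ, 0)                    # head of successor BB
    return (None, bb_stmts.index(insert_after) + 1)  # same BB, right after
```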

>
> Otherwise the patch looks ok - can you add one or two testcases please?
I have added a test case and submitted at r199048.

Thanks,
Easwaran


>
> Thanks,
> Richard.
>
>> Thanks,
>> Easwaran
>>
>> 2012-12-07   Easwaran Raman  
>> * tree-ssa-reassoc.c(find_insert_point): New function.
>> (insert_stmt_after): Likewise.
>> (get_def_stmt): Likewise.
>> (ensure_ops_are_available): Likewise.
>> (rewrite_expr_tree): Do not move statements beyond what is
>> necessary. Remove call to swap_ops_for_binary_stmt...
>> (reassociate_bb): ... and move it here.
>> (build_and_add_sum): Assign UIDs for new statements.
>> (linearize_expr): Likewise.
>> (do_reassoc): Renumber gimple statement UIDs.
>>
>>
>>
>> On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener
>>  wrote:
>>> On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman  wrote:
>>>> I am unable to figure out the right way to handle the debug
>>>> statements. What I tried was to find debug statements that use the SSA
>>>> name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE)
>>>> and then moved them as well at the right place. Thus, if I have to
>>>> move t1 = a + b down (after the definition of 'd'), I also moved all
>>>> debug statements that use t1 after the new position of t1. That still
>>>> caused use-before-def problems in ssa_verify. I noticed that the debug
>>>> statements got modified behind the scenes causing these issues. Any
>>>> hints on what is the right way to handle the debug statements would be
>>>> very helpful.
>>>
>>> I think you cannot (and should not) move debug statements.  Instead you
>>> have to invalidate them.  Otherwise you'll introduce confusion as debug
>>> info cannot handle overlapping live ranges.
>>>
>>> But maybe Alex can clarify.
>>>
>>> Richard.


Fix PR tree-optimization/57322

2013-05-19 Thread Easwaran Raman
The UID of a newly generated statement in build_and_add_sum is set to
that of an adjacent statement in the BB. This is a problem in the case
where the BB is empty. This fix sets the UID to 1 if the BB is
empty. Bootstraps and no test regressions on x86_64. OK for trunk?

Thanks,
Easwaran

---

2013-05-19   Easwaran Raman  

PR tree-optimization/57322
* tree-ssa-reassoc.c (build_and_add_sum): If a BB is empty, set the UID
of the statement added to the BB to 1.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 199048)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -1165,8 +1165,12 @@ build_and_add_sum (tree type, tree op1, tree op2,
   if ((!op1def || gimple_nop_p (op1def))
   && (!op2def || gimple_nop_p (op2def)))
 {
+  gimple first_stmt;
+  unsigned uid;
   gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR));
-  gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi)));
+  first_stmt = gsi_stmt (gsi);
+  uid = first_stmt ? gimple_uid (first_stmt) : 1;
+  gimple_set_uid (sum, uid);
   gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
 }
   else if ((!op1def || gimple_nop_p (op1def))


Re: Fix PR tree-optimization/57322

2013-05-20 Thread Easwaran Raman
If you are suggesting using inc_gimple_stmt_max_uid (cfun) for all
statements, that won't work because we want to use the UIDs to
determine dominance within a BB. If your suggestion is to use that
instead of 1 when BB == NULL, that would work (even though setting it
to 1 is sufficient).
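The reason a fresh maximal UID is unsafe here can be shown with a toy model (not GCC code; statements are just (name, uid) pairs):

```python
# Intra-BB dominance is approximated by "uid(A) < uid(B) implies A comes
# before B", so a statement's UID must track its position, not its
# creation order.
bb = [("t1", 10), ("t2", 20), ("t3", 30)]  # (stmt, uid) in program order
bb.insert(0, ("new", 31))                  # fresh max UID, but placed first
uids_monotonic = all(a[1] < b[1] for a, b in zip(bb, bb[1:]))
# uids_monotonic is False: UID comparison now mis-orders "new" relative
# to every other statement, which is why the fix copies a neighboring
# statement's UID instead of a fresh maximum.
```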

Thanks,
Easwaran


On Mon, May 20, 2013 at 4:43 AM, Steven Bosscher  wrote:
> On Mon, May 20, 2013 at 1:41 AM, Easwaran Raman  wrote:
>> PR tree-optimization/57322
>> * (build_and_add_sum): If a BB is empty, set the UID of the statement
>> added to the BB to be 1.
>
> Maybe  "gimple_set_uid (stmt, inc_gimple_stmt_max_uid (cfun));"? That
> would keep the UIDs unique except for the copied UIDs.
>
> Ciao!
> Steven


PR tree-optimization/57337

2013-05-23 Thread Easwaran Raman
This addresses the case where the UID alone is not sufficient to figure
out which statement appears earlier in a BB. Bootstraps and no test
regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran


2013-05-23  Easwaran Raman  

PR tree-optimization/57337
* tree-ssa-reassoc.c (appears_later_in_bb): New function.
(find_insert_point): Correctly identify the insertion point
when two statements with the same UID are compared.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 199211)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b)

 }

+/* Among STMT1 and STMT2, return the statement that appears later. Both
+   statements are in same BB and have the same UID.  */
+
+static gimple
+appears_later_in_bb (gimple stmt1, gimple stmt2)
+{
+  unsigned uid = gimple_uid (stmt1);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt1);
+  gsi_next (&gsi);
+  if (gsi_end_p (gsi))
+return stmt1;
+  for (; !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple stmt = gsi_stmt (gsi);
+
+  /* If STMT has a different UID than STMT1 and we haven't seen
+ STMT2 during traversal, we know STMT1 appears later.  */
+  if (gimple_uid (stmt) != uid)
+return stmt1;
+  else if (stmt == stmt2)
+return stmt2;
+}
+  gcc_unreachable ();
+}
+
 /* Find the statement after which STMT must be moved so that the
dependency from DEP_STMT to STMT is maintained.  */

@@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt)
   gimple insert_stmt = stmt;
   if (dep_stmt == NULL)
 return stmt;
-  if (not_dominated_by (insert_stmt, dep_stmt))
+  if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt)
+  && gimple_bb (insert_stmt) == gimple_bb (dep_stmt)
+  && insert_stmt != dep_stmt)
+insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt);
+  else if (not_dominated_by (insert_stmt, dep_stmt))
 insert_stmt = dep_stmt;
   return insert_stmt;
 }


Re: PR tree-optimization/57337

2013-05-24 Thread Easwaran Raman
In that case, if my insert_stmt immediately follows dep_stmt and both
have the same UID, not_dominated_by would return true and I will end
up updating insert_stmt to dep_stmt, which is wrong.

- Easwaran

On Fri, May 24, 2013 at 1:07 AM, Richard Biener
 wrote:
> On Thu, May 23, 2013 at 7:26 PM, Easwaran Raman  wrote:
>> This addresses the case where UID alone is not sufficient to figure
>> out which statement appears earlier in  a BB. Bootstraps and no test
>> regressions in x86_64 on linux. Ok for trunk?
>
> Why not simply conservatively use gimple_uid (a) <= gimple_uid (b)
> in not_dominated_by?
>
> Richard.
>
>
>
>> Thanks,
>> Easwaran
>>
>>
>> 2013-05-23  Easwaran Raman  
>>
>> PR tree-optimization/57337
>> * tree-ssa-reassoc.c (appears_later_in_bb): New function.
>> (find_insert_point): Correctly identify the insertion point
>> when two statements with the same UID is compared.
>>
>> Index: gcc/tree-ssa-reassoc.c
>> ===
>> --- gcc/tree-ssa-reassoc.c  (revision 199211)
>> +++ gcc/tree-ssa-reassoc.c  (working copy)
>> @@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b)
>>
>>  }
>>
>> +/* Among STMT1 and STMT2, return the statement that appears later. Both
>> +   statements are in same BB and have the same UID.  */
>> +
>> +static gimple
>> +appears_later_in_bb (gimple stmt1, gimple stmt2)
>> +{
>> +  unsigned uid = gimple_uid (stmt1);
>> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt1);
>> +  gsi_next (&gsi);
>> +  if (gsi_end_p (gsi))
>> +return stmt1;
>> +  for (; !gsi_end_p (gsi); gsi_next (&gsi))
>> +{
>> +  gimple stmt = gsi_stmt (gsi);
>> +
>> +  /* If STMT has a different UID than STMT1 and we haven't seen
>> + STMT2 during traversal, we know STMT1 appears later.  */
>> +  if (gimple_uid (stmt) != uid)
>> +return stmt1;
>> +  else if (stmt == stmt2)
>> +return stmt2;
>> +}
>> +  gcc_unreachable ();
>> +}
>> +
>>  /* Find the statement after which STMT must be moved so that the
>> dependency from DEP_STMT to STMT is maintained.  */
>>
>> @@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt)
>>gimple insert_stmt = stmt;
>>if (dep_stmt == NULL)
>>  return stmt;
>> -  if (not_dominated_by (insert_stmt, dep_stmt))
>> +  if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt)
>> +  && gimple_bb (insert_stmt) == gimple_bb (dep_stmt)
>> +  && insert_stmt != dep_stmt)
>> +insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt);
>> +  else if (not_dominated_by (insert_stmt, dep_stmt))
>>  insert_stmt = dep_stmt;
>>return insert_stmt;
>>  }


Re: PR tree-optimization/57337

2013-05-25 Thread Easwaran Raman
On Sat, May 25, 2013 at 4:46 AM, Richard Biener
 wrote:
> Easwaran Raman  wrote:
>
>>In that case, if my insert_stmt immediately follows dep_stmt and both
>>have the same UID, not_dominated_by would return true and I will end
>>up updating insert_stmt to dep_stmt which is wrong.
>
> But there should be a safe default answer for
> Equal uids. Unless we are asking different questions in different places.
> Thus, I do not like the stmt walking but rather have a safe fallback.

I am lost here. I don't see how we could avoid doing the stmt walking
to resolve the equal-UID case. How can we ensure that not_dominated_by (a,
b) returns true and not_dominated_by (b, a) returns false if A and B
have the same UID and A appears before B, without doing the statement
walk? And I don't see why the statement walk is bad. It is not likely
that there is a long sequence of statements with the same UID.

Thanks,
Easwaran

>
> Richard.
>
>>- Easwaran
>>
>>On Fri, May 24, 2013 at 1:07 AM, Richard Biener
>> wrote:
>>> On Thu, May 23, 2013 at 7:26 PM, Easwaran Raman 
>>wrote:
>>>> This addresses the case where UID alone is not sufficient to figure
>>>> out which statement appears earlier in  a BB. Bootstraps and no test
>>>> regressions in x86_64 on linux. Ok for trunk?
>>>
>>> Why not simply conservatively use gimple_uid (a) <= gimple_uid (b)
>>> in not_dominated_by?
>>>
>>> Richard.
>>>
>>>
>>>
>>>> Thanks,
>>>> Easwaran
>>>>
>>>>
>>>> 2013-05-23  Easwaran Raman  
>>>>
>>>> PR tree-optimization/57337
>>>> * tree-ssa-reassoc.c (appears_later_in_bb): New function.
>>>> (find_insert_point): Correctly identify the insertion point
>>>> when two statements with the same UID is compared.
>>>>
>>>> Index: gcc/tree-ssa-reassoc.c
>>>> ===
>>>> --- gcc/tree-ssa-reassoc.c  (revision 199211)
>>>> +++ gcc/tree-ssa-reassoc.c  (working copy)
>>>> @@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b)
>>>>
>>>>  }
>>>>
>>>> +/* Among STMT1 and STMT2, return the statement that appears later.
>>Both
>>>> +   statements are in same BB and have the same UID.  */
>>>> +
>>>> +static gimple
>>>> +appears_later_in_bb (gimple stmt1, gimple stmt2)
>>>> +{
>>>> +  unsigned uid = gimple_uid (stmt1);
>>>> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt1);
>>>> +  gsi_next (&gsi);
>>>> +  if (gsi_end_p (gsi))
>>>> +return stmt1;
>>>> +  for (; !gsi_end_p (gsi); gsi_next (&gsi))
>>>> +{
>>>> +  gimple stmt = gsi_stmt (gsi);
>>>> +
>>>> +  /* If STMT has a different UID than STMT1 and we haven't seen
>>>> + STMT2 during traversal, we know STMT1 appears later.  */
>>>> +  if (gimple_uid (stmt) != uid)
>>>> +return stmt1;
>>>> +  else if (stmt == stmt2)
>>>> +return stmt2;
>>>> +}
>>>> +  gcc_unreachable ();
>>>> +}
>>>> +
>>>>  /* Find the statement after which STMT must be moved so that the
>>>> dependency from DEP_STMT to STMT is maintained.  */
>>>>
>>>> @@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple
>>dep_stmt)
>>>>gimple insert_stmt = stmt;
>>>>if (dep_stmt == NULL)
>>>>  return stmt;
>>>> -  if (not_dominated_by (insert_stmt, dep_stmt))
>>>> +  if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt)
>>>> +  && gimple_bb (insert_stmt) == gimple_bb (dep_stmt)
>>>> +  && insert_stmt != dep_stmt)
>>>> +insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt);
>>>> +  else if (not_dominated_by (insert_stmt, dep_stmt))
>>>>  insert_stmt = dep_stmt;
>>>>return insert_stmt;
>>>>  }
>
>


Re: PR tree-optimization/57337

2013-05-28 Thread Easwaran Raman
On Tue, May 28, 2013 at 4:11 AM, Richard Biener
 wrote:
> On Mon, May 27, 2013 at 10:20 AM, Richard Biener
>  wrote:
>> On Sun, May 26, 2013 at 5:53 AM, Easwaran Raman  wrote:
>>> On Sat, May 25, 2013 at 4:46 AM, Richard Biener
>>>  wrote:
>>>> Easwaran Raman  wrote:
>>>>
>>>>>In that case, if my insert_stmt immediately follows dep_stmt and both
>>>>>have the same UID, not_dominated_by would return true and I will end
>>>>>up updating insert_stmt to dep_stmt which is wrong.
>>>>
>>>> But there should be a safe default answer for
>>>> Equal uids. Unless we are asking different questions in different places.
>>>> Thus, I do not like the stmt walking but rather have a safe fallback.
>>>
>>> I am lost here. I don't see how we could avoid doing the stmt walking
>>> to resolve the equal uid case. How to ensure that not_dominated_by (a,
>>> b) returns true and not_dominated_by (b, a) returns false if A and B
>>> have the same UID and A appears before B without doing the statement
>>> walk. And, I don't see why the statement walk is bad. It is not likely
>>> that there is a long sequence of statements with the same UID.
>>
>> Sure, but if you are always asking a question like "is placing X before Y 
>> ok?"
>> then you can conservatively answer "no" and code should handle that ok.
>> If you are asking questions both way then of course no conservative answer is
>> possible.  Both current uses of not_dominated_by are of the same kind,
>> if the placement is not ok then either the insert point needs adjustment or
>> the debug stmt reset.
>
> Ok, thinking about this more I come to the conclusion that a safe default for
> equal UIDs is not possible as we may be not able to order two dep_stmts.
> I still dislike walking though, but to get the quite frequent regressions 
> fixed
> the patch is ok as-is.
>
> Please install.
>
> Thanks,
> Richard.

Thanks. Submitted the patch at r199385.

- Easwaran


Fix PR 57442

2013-05-28 Thread Easwaran Raman
I made a thinko when I asserted gcc_unreachable outside the main loop
of appears_later_in_bb in my previous fix for PR 57337. It is quite
possible for STMT1 to be followed by one or more statements with the
same UID until the end of the BB without encountering STMT2. The right
fix is to return STMT1 outside the loop. Ok for trunk?
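The corrected walk can be sketched as a toy model (not GCC code; a BB is a list of statement names and `uid_of` maps each to its UID, mirroring the C logic and its invariant that both arguments share a UID):

```python
def appears_later_in_bb(bb, stmt1, stmt2, uid_of):
    """Walk forward from STMT1: hitting a different UID means STMT1 is
    later, hitting STMT2 means STMT2 is later, and running off the end
    of the BB (the case the gcc_unreachable wrongly asserted impossible)
    also means STMT1 is later."""
    uid = uid_of[stmt1]
    for stmt in bb[bb.index(stmt1) + 1:]:
        if uid_of[stmt] != uid:
            return stmt1
        if stmt == stmt2:
            return stmt2
    return stmt1  # reached end of BB without seeing STMT2
```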

Thanks,
Easwaran

2013-05-28  Easwaran Raman  

PR tree-optimization/57442
* tree-ssa-reassoc.c (appears_later_in_bb): Return correct value
when control exits the main loop.

2013-05-28  Easwaran Raman  

PR tree-optimization/57442
* gcc.dg/tree-ssa/reassoc-30.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0)
@@ -0,0 +1,13 @@
+/* PR tree-optimization/57442 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+short a;
+unsigned b;
+long c;
+int d;
+
+void f(void)
+{
+b = a ? : (a = b) - c + (d - (b + b));
+}
+
Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 199385)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -2888,7 +2888,7 @@ appears_later_in_bb (gimple stmt1, gimple stmt2)
   else if (stmt == stmt2)
 return stmt2;
 }
-  gcc_unreachable ();
+  return stmt1;
 }

 /* Find the statement after which STMT must be moved so that the


[google gcc-4_8] Change size accounting during inlining in lipo mode

2013-06-20 Thread Easwaran Raman
In LIPO mode, this patch updates the overall unit size only when the
eventual function into which the callee is inlined is in the primary
module. This avoids the situation where the module growth budget is used
up by inlines into auxiliary-module functions that never get inlined
into a primary-module function. Ok for google/gcc-4_8 branch?
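Since the patch itself is attached as binary data, here is a hypothetical sketch of the accounting rule described above (not the actual patch; `caller_to_node` and `primary_module` are invented stand-ins for the call-graph inline chain and LIPO module info):

```python
def account_inline_growth(unit_size, growth, caller_to_node, edge_caller,
                          primary_module):
    """Charge inlining GROWTH against the unit size only when the
    outermost function the callee ends up inlined into lives in the
    primary module."""
    # Follow inline chains to the function the code finally lands in.
    node = edge_caller
    while caller_to_node.get(node) is not None:
        node = caller_to_node[node]
    if primary_module.get(node, False):
        return unit_size + growth  # counts against the module budget
    return unit_size               # auxiliary-module inline: not charged
```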

Thanks,
Easwaran


lipo_inline.patch
Description: Binary data


Fix PR middle-end/57370

2013-06-27 Thread Easwaran Raman
A newly generated statement in the build_and_add_sum function of
tree-ssa-reassoc.c has to be assigned the UID of an adjacent
statement. In one instance, it was assigned the wrong UID (that of an
earlier phi statement), which corrupted the IR and caused the test
program to hang. Bootstraps and no test regressions on x86_64/linux.
Ok for trunk?

Thanks,
Easwaran


2013-06-27  Easwaran Raman  

PR middle-end/57370
* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi
node for a non-phi gimple statement.

testsuite/ChangeLog:
2013-06-27  Easwaran Raman  

PR middle-end/57370
* gfortran.dg/reassoc_12.f90: New testcase.


Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===
--- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+ SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+   grad_deriv,npoints, sx)
+IMPLICIT REAL*8 (t)
+INTEGER, PARAMETER :: dp=8
+REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+   e_ndrho_ndrho_rho
+  DO ii=1,npoints
+  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+t1425 = t233 * t557
+t1429 = beta * t225
+t1622 = t327 * t1621
+t1626 = t327 * t1625
+t1632 = t327 * t1631
+t1685 = t105 * t1684
+t2057 = t1636 + t8 * (t2635 + t3288)
+  END IF
+  IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+t5469 = t5440 - t5443 - t5446 - t5449 - &
+t5451 - t5454 - t5456 + t5459  - &
+t5462 + t5466 - t5468
+t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+t5489 = 0.16e2_dp * t1429 * t1658
+t5531 = 0.160e2_dp * t112 * t1626
+t5533 = 0.160e2_dp * t112 * t1632
+t5537 = 0.160e2_dp * t112 * t1622
+t5541 = t5472 - t5478 - t5523 + t5525 + &
+t5531 + t5533 + t5535 + t5537 + &
+t5540
+t5565 = t112 * t1685
+t5575 = t5545 - t5548 + t5551 + t5553 - &
+t5558 + t5560 - t5562 + t5564 - &
+0.80e1_dp * t5565 + t5568 + t5572 + &
+t5574
+t5611 = t5579 - t5585 + t5590 - t5595 + &
+t5597 - t5602 + t5604 + t5607 + &
+t5610
+t5613 = t5469 + t5541 + t5575 + t5611
+t6223 = t6189 - &
+0.36e0_dp  * t83 * t84 * t5613 + &
+t6222
+t6227 = - t8 * (t5305 + t6223)
+e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + &
+ t6227 * sx
+t6352 = t5440 - t5443 - t5446 - t5449 - &
+t5451 - t5454 + &
+0.40e1_dp * t102  * t327 * t2057 * t557 - &
+t5456 + t5459 - t5462 + t5466 - &
+t5468
+t6363 = t5480 - t5489 + &
+0.96e2_dp  * t1054 * t640 * t3679
+t6367 = t5472 - t5474 - t5478 - t5523 + &
+t5525 + t5531 + t5533 + t5535 + &
+t5537 - 0.20e1_dp * t102 * t105 * t6363 + &
+t5540
+t6370 = t5545 - t5548 + t5551 + t5553 - &
+t5558 + t5560 - t5562 + t5564  - &
+0.40e1_dp * t5565 + &
+t5568 + t5572 + t5574
+t6373 = t5579 - t5585 + t5590 - t5595 + &
+t5597 - t5602 + t5604 + t5607 + &
+t5610
+t6375 = t6352 + t6367 + t6370 + t6373
+t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701
+t6669 = -t4704 - t8 * (t6344 + t6380 + t6665)
+e_ndrho_ndrho_ndrho(ii) = e_ndrho_ndrho_ndrho(ii) + &
+ t6669 * sx
+  END IF
+  END DO
+  END SUBROUTINE xb88_lr_adiabatic_lda_calc
+
Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c(revision 200429)
+++ gcc/tree-ssa-reassoc.c(working copy)
@@ -1207,7 +1207,7 @@ build_and_add_sum (tree type, tree op1, tree op2,
   if (gimple_code (op1def) == GIMPLE_PHI)
 {
   gsi = gsi_after_labels (gimple_bb (op1def));
-   gimple_set_uid (sum, gimple_uid (op1def));
+  gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi)));
   gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
 }
   else


Re: Fix PR middle-end/57370

2013-07-12 Thread Easwaran Raman
Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  wrote:
> A newly generated statement in build_and_add_sum  function of
> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
> statement. In one instance, it was assigned the wrong uid (of an
> earlier phi statement) which messed up the IR and caused the test
> program to hang. Bootstraps and no test regressions on x86_64/linux.
> Ok for trunk?
>
> Thanks,
> Easwaran
>
>
> 2013-06-27  Easwaran Raman  
>
> PR middle-end/57370
> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi
> node for a non-phi gimple statement.
>
> testsuite/ChangeLog:
> 2013-06-27  Easwaran Raman  
>
> PR middle-end/57370
> * gfortran.dg/reassoc_12.f90: New testcase.
>
>
> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
> ===
> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
> @@ -0,0 +1,74 @@
> +! { dg-do compile }
> +! { dg-options "-O2 -ffast-math" }
> +! PR middle-end/57370
> +
> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
> +   grad_deriv,npoints, sx)
> +IMPLICIT REAL*8 (t)
> +INTEGER, PARAMETER :: dp=8
> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
> +   e_ndrho_ndrho_rho
> +  DO ii=1,npoints
> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
> +t1425 = t233 * t557
> +t1429 = beta * t225
> +t1622 = t327 * t1621
> +t1626 = t327 * t1625
> +t1632 = t327 * t1631
> +t1685 = t105 * t1684
> +t2057 = t1636 + t8 * (t2635 + t3288)
> +  END IF
> +  IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
> +t5469 = t5440 - t5443 - t5446 - t5449 - &
> +t5451 - t5454 - t5456 + t5459  - &
> +t5462 + t5466 - t5468
> +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
> +t5489 = 0.16e2_dp * t1429 * t1658
> +t5531 = 0.160e2_dp * t112 * t1626
> +t5533 = 0.160e2_dp * t112 * t1632
> +t5537 = 0.160e2_dp * t112 * t1622
> +t5541 = t5472 - t5478 - t5523 + t5525 + &
> +t5531 + t5533 + t5535 + t5537 + &
> +t5540
> +t5565 = t112 * t1685
> +t5575 = t5545 - t5548 + t5551 + t5553 - &
> +t5558 + t5560 - t5562 + t5564 - &
> +0.80e1_dp * t5565 + t5568 + t5572 + &
> +t5574
> +t5611 = t5579 - t5585 + t5590 - t5595 + &
> +t5597 - t5602 + t5604 + t5607 + &
> +t5610
> +t5613 = t5469 + t5541 + t5575 + t5611
> +t6223 = t6189 - &
> +0.36e0_dp  * t83 * t84 * t5613 + &
> +t6222
> +t6227 = - t8 * (t5305 + t6223)
> +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + &
> + t6227 * sx
> +t6352 = t5440 - t5443 - t5446 - t5449 - &
> +t5451 - t5454 + &
> +0.40e1_dp * t102  * t327 * t2057 * t557 - &
> +t5456 + t5459 - t5462 + t5466 - &
> +t5468
> +t6363 = t5480 - t5489 + &
> +0.96e2_dp  * t1054 * t640 * t3679
> +t6367 = t5472 - t5474 - t5478 - t5523 + &
> +t5525 + t5531 + t5533 + t5535 + &
> +t5537 - 0.20e1_dp * t102 * t105 * t6363 + &
> +t5540
> +t6370 = t5545 - t5548 + t5551 + t5553 - &
> +t5558 + t5560 - t5562 + t5564  - &
> +0.40e1_dp * t5565 + &
> +t5568 + t5572 + t5574
> +t6373 = t5579 - t5585 + t5590 - t5595 + &
> +t5597 - t5602 + t5604 + t5607 + &
> +t5610
> +t6375 = t6352 + t6367 + t6370 + t6373
> +t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701
> +t6669 = -t4704 - t8 * (t6344 + t6380 + t6665)
> +e_ndrho_ndrho_ndrho(ii) = e_ndrho_ndrho_ndrho(ii) + &
> + t6669 * sx
> +  END IF
> +  END DO
> +  END SUBROUTINE xb88_lr_adiabatic_lda_calc
> +
> Index: gcc/tree-ssa-reassoc.c
> 

Re: Fix PR middle-end/57370

2013-07-31 Thread Easwaran Raman
I have a new patch that supersedes this one. The new patch also fixes PR
tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  

PR middle-end/57370
* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
of debug statements that cause inconsistent IR.


testsuite/ChangeLog:
2013-07-31  Easwaran Raman  

PR middle-end/57370
PR tree-optimization/57393
PR tree-optimization/58011
* gfortran.dg/reassoc_12.f90: New testcase.
* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
* gcc.dg/tree-ssa/reassoc-31.c: New testcase.


On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
> Ping.
>
> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  wrote:
>> A newly generated statement in build_and_add_sum  function of
>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>> statement. In one instance, it was assigned the wrong uid (of an
>> earlier phi statement) which messed up the IR and caused the test
>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>> Ok for trunk?
>>
>> Thanks,
>> Easwaran
>>
>>
>> 2013-06-27  Easwaran Raman  
>>
>> PR middle-end/57370
>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi
>> node for a non-phi gimple statement.
>>
>> testsuite/ChangeLog:
>> 2013-06-27  Easwaran Raman  
>>
>> PR middle-end/57370
>> * gfortran.dg/reassoc_12.f90: New testcase.
>>
>>
>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>> ===
>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>> @@ -0,0 +1,74 @@
>> +! { dg-do compile }
>> +! { dg-options "-O2 -ffast-math" }
>> +! PR middle-end/57370
>> +
>> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>> +   grad_deriv,npoints, sx)
>> +IMPLICIT REAL*8 (t)
>> +INTEGER, PARAMETER :: dp=8
>> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
>> +   e_ndrho_ndrho_rho
>> +  DO ii=1,npoints
>> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
>> +t1425 = t233 * t557
>> +t1429 = beta * t225
>> +t1622 = t327 * t1621
>> +t1626 = t327 * t1625
>> +t1632 = t327 * t1631
>> +t1685 = t105 * t1684
>> +t2057 = t1636 + t8 * (t2635 + t3288)
>> +  END IF
>> +  IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
>> +t5469 = t5440 - t5443 - t5446 - t5449 - &
>> +t5451 - t5454 - t5456 + t5459  - &
>> +t5462 + t5466 - t5468
>> +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
>> +t5489 = 0.16e2_dp * t1429 * t1658
>> +t5531 = 0.160e2_dp * t112 * t1626
>> +t5533 = 0.160e2_dp * t112 * t1632
>> +t5537 = 0.160e2_dp * t112 * t1622
>> +t5541 = t5472 - t5478 - t5523 + t5525 + &
>> +t5531 + t5533 + t5535 + t5537 + &
>> +t5540
>> +t5565 = t112 * t1685
>> +t5575 = t5545 - t5548 + t5551 + t5553 - &
>> +t5558 + t5560 - t5562 + t5564 - &
>> +0.80e1_dp * t5565 + t5568 + t5572 + &
>> +t5574
>> +t5611 = t5579 - t5585 + t5590 - t5595 + &
>> +t5597 - t5602 + t5604 + t5607 + &
>> +t5610
>> +t5613 = t5469 + t5541 + t5575 + t5611
>> +t6223 = t6189 - &
>> +0.36e0_dp  * t83 * t84 * t5613 + &
>> +t6222
>> +t6227 = - t8 * (t5305 + t6223)
>> +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + &
>> + t6227 * sx
>> +t6352 = t5440 - t5443 - t5446 - t5449 - &
>> +t5451 - t5454 + &
>> +0.40e1_dp * t102  * t327 * t2057 * t557 - &
>> +t5456 + t5459 - t5462 + t5466 - &
>> +t5468
>> +t6363 = t5480 - t5489 + &
>> +0.96e2_dp  * t1054 * t640 * t3679
>> +t6367 = t5472 - t5474 - t5478 - t5523 + &

Re: Fix PR middle-end/57370

2013-08-07 Thread Easwaran Raman
Ping.

On Wed, Jul 31, 2013 at 4:34 PM, Easwaran Raman  wrote:
> I have a new patch that supersedes this. The new patch also fixes PR
> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
> no test regression on x86_64/linux. Ok for trunk?
>
> 2013-07-31  Easwaran Raman  
>
> PR middle-end/57370
> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
> of debug statements that cause inconsistent IR.
>
>
> testsuite/ChangeLog:
> 2013-07-31  Easwaran Raman  
>
> PR middle-end/57370
> PR tree-optimization/57393
> PR tree-optimization/58011
> * gfortran.dg/reassoc_12.f90: New testcase.
> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>
>
> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
>> Ping.
>>
>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  wrote:
>>> A newly generated statement in build_and_add_sum  function of
>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>> statement. In one instance, it was assigned the wrong uid (of an
>>> earlier phi statement) which messed up the IR and caused the test
>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Easwaran
>>>
>>>
>>> 2013-06-27  Easwaran Raman  
>>>
>>>     PR middle-end/57370
>>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a 
>>> phi
>>> node for a non-phi gimple statement.
>>>
>>> testsuite/ChangeLog:
>>> 2013-06-27  Easwaran Raman  
>>>
>>> PR middle-end/57370
>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>>
>>>
>>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>>> ===
>>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>> @@ -0,0 +1,74 @@
>>> +! { dg-do compile }
>>> +! { dg-options "-O2 -ffast-math" }
>>> +! PR middle-end/57370
>>> +
>>> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>>> +   grad_deriv,npoints, sx)
>>> +IMPLICIT REAL*8 (t)
>>> +INTEGER, PARAMETER :: dp=8
>>> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
>>> +   e_ndrho_ndrho_rho
>>> +  DO ii=1,npoints
>>> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
>>> +t1425 = t233 * t557
>>> +t1429 = beta * t225
>>> +t1622 = t327 * t1621
>>> +t1626 = t327 * t1625
>>> +t1632 = t327 * t1631
>>> +t1685 = t105 * t1684
>>> +t2057 = t1636 + t8 * (t2635 + t3288)
>>> +  END IF
>>> +  IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
>>> +t5469 = t5440 - t5443 - t5446 - t5449 - &
>>> +t5451 - t5454 - t5456 + t5459  - &
>>> +t5462 + t5466 - t5468
>>> +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
>>> +t5489 = 0.16e2_dp * t1429 * t1658
>>> +t5531 = 0.160e2_dp * t112 * t1626
>>> +t5533 = 0.160e2_dp * t112 * t1632
>>> +t5537 = 0.160e2_dp * t112 * t1622
>>> +t5541 = t5472 - t5478 - t5523 + t5525 + &
>>> +t5531 + t5533 + t5535 + t5537 + &
>>> +t5540
>>> +t5565 = t112 * t1685
>>> +t5575 = t5545 - t5548 + t5551 + t5553 - &
>>> +t5558 + t5560 - t5562 + t5564 - &
>>> +0.80e1_dp * t5565 + t5568 + t5572 + &
>>> +t5574
>>> +t5611 = t5579 - t5585 + t5590 - t5595 + &
>>> +t5597 - t5602 + t5604 + t5607 + &
>>> +t5610
>>> +t5613 = t5469 + t5541 + t5575 + t5611
>>> +t6223 = t6189 - &
>>> +0.36e0_dp  * t83 * t84 * t5613 + &
>>> +t6222
>>> +t6227 = - t8 * (t5305 + t6223)
>>> +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + &
>>> +

Re: Minimize downward code motion during reassociation

2013-04-24 Thread Easwaran Raman
I want to resend this patch for consideration. I applied the patch to
trunk and confirmed that it bootstraps and doesn't cause test
regressions. Is this ok for trunk?

Thanks,
Easwaran

On Fri, Dec 7, 2012 at 12:01 PM, Easwaran Raman  wrote:
> It seems I need to reset the debug uses of a statement before moving
> the statement itself. The attached patch starts from the leaf to root
> of the tree to be reassociated and places them at the point where
> their dependences will be met after reassociation. This bootstraps and
> I am running the tests. Ok if there are no test failures?
>
> Thanks,
> Easwaran
>
> 2012-12-07   Easwaran Raman  
> * tree-ssa-reassoc.c (find_insert_point): New function.
> (insert_stmt_after): Likewise.
> (get_def_stmt): Likewise.
> (ensure_ops_are_available): Likewise.
> (rewrite_expr_tree): Do not move statements beyond what is
> necessary. Remove call to swap_ops_for_binary_stmt...
> (reassociate_bb): ... and move it here.
> (build_and_add_sum): Assign UIDs for new statements.
> (linearize_expr): Likewise.
> (do_reassoc): Renumber gimple statement UIDs.
>
>
>
> On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener
>  wrote:
>> On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman  wrote:
>>> I am unable to figure out the right way to handle the debug
>>> statements. What I tried was to find debug statements that use the SSA
>>> name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE)
>>> and then moved them as well at the right place. Thus, if I have to
>>> move t1 = a + b down (after the definition of 'd'), I also moved all
>>> debug statements that use t1 after the new position of t1. That still
>>> caused use-before-def problems in ssa_verify. I noticed that the debug
>>> statements got modified behind the scenes causing these issues. Any
>>> hints on what is the right way to handle the debug statements would be
>>> very helpful.
>>
>> I think you cannot (and should not) move debug statements.  Instead you
>> have to invalidate them.  Otherwise you'll introduce confusion as debug
>> info cannot handle overlapping live ranges.
>>
>> But maybe Alex can clarify.
>>
>> Richard.


Change the ordering of cdce pass

2012-06-14 Thread Easwaran Raman
The conditional dead call elimination pass shrink-wraps certain dead
calls to math functions. It doesn't handle cases like this:

D.142420_139 = powD.549 (D.142421_138, D.142419_132);
 fooD.120935.barD.113815 = D.142420_139;
# foo.bar is dead here.

This code gets cleaned up by DCE and leaves only pow, which can then
be shrink-wrapped by cdce. So it seems reasonable to do this
reordering. Bootstraps on x86_64 on linux with no test regression. OK
for trunk?

- Easwaran
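For illustration, the shrink wrapping that cdce performs on a dead pow call amounts to guarding the call so it only executes when it could have observable side effects. The sketch below is a simplified standalone model: the counter and the domain test are illustrative, not the pass's actual checks.

```cpp
#include <cassert>
#include <cmath>

// Counts how often the (otherwise dead) pow call is actually issued;
// illustrative bookkeeping, not generated code.
static int dead_calls_kept = 0;

static void dead_pow(double x, double y)
{
  ++dead_calls_kept;
  (void)std::pow(x, y);  // kept only for its errno/exception side effects
}

// Sketch of what cdce does to a dead pow call: guard it with a cheap
// test so it runs only when it might have observable side effects.
// The domain test here is a simplification of the real pass's checks.
static void shrink_wrapped_pow(double x, double y)
{
  if (x < 0.0 || y > 700.0 || y < -700.0)
    dead_pow(x, y);
  // in the common case the dead call is skipped entirely
}
```

Running the pass after DCE means the store to foo.bar above is gone first, leaving only the bare pow call for this transformation to pick up.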

--

2012-06-14   Easwaran Raman  

* gcc/passes.c (init_optimization_passes): Remove pass_call_cdce
from its current position and insert after pass_dce.

Index: gcc/passes.c
===
--- gcc/passes.c(revision 188535)
+++ gcc/passes.c(working copy)
@@ -1374,7 +1374,6 @@ init_optimization_passes (void)
   NEXT_PASS (pass_complete_unrolli);
   NEXT_PASS (pass_ccp);
   NEXT_PASS (pass_forwprop);
-  NEXT_PASS (pass_call_cdce);
   /* pass_build_alias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  Re-building
 alias information also rewrites no longer addressed
@@ -1387,6 +1386,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_vrp);
   NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_call_cdce);
   NEXT_PASS (pass_cselim);
   NEXT_PASS (pass_tree_ifcombine);
   NEXT_PASS (pass_phiopt);


Re: Change the ordering of cdce pass

2012-06-14 Thread Easwaran Raman
ChangeLog entry has a gcc/ prefix that shouldn't be there. Here is the
revised entry:

2012-06-14   Easwaran Raman  

* passes.c (init_optimization_passes): Remove pass_call_cdce
from its current position and insert after pass_dce.



On Thu, Jun 14, 2012 at 6:38 PM, Easwaran Raman  wrote:
> The conditional dead call elimination pass shrink-wraps certain dead
> calls to math functions. It doesn't handle cases like this:
>
> D.142420_139 = powD.549 (D.142421_138, D.142419_132);
>  fooD.120935.barD.113815 = D.142420_139;
> # foo.bar is dead here.
>
> This code gets cleaned up by DCE and leaves only pow, which can then
> be shrink-wrapped by cdce. So it seems reasonable to do this
> reordering. Bootstraps on x86_64 on linux with no test regression. OK
> for trunk?
>
> - Easwaran
>
> --
>
> 2012-06-14   Easwaran Raman  
>
>        * gcc/passes.c (init_optimization_passes): Remove pass_call_cdce
>        from its current position and insert after pass_dce.
>
> Index: gcc/passes.c
> ===
> --- gcc/passes.c        (revision 188535)
> +++ gcc/passes.c        (working copy)
> @@ -1374,7 +1374,6 @@ init_optimization_passes (void)
>       NEXT_PASS (pass_complete_unrolli);
>       NEXT_PASS (pass_ccp);
>       NEXT_PASS (pass_forwprop);
> -      NEXT_PASS (pass_call_cdce);
>       /* pass_build_alias is a dummy pass that ensures that we
>         execute TODO_rebuild_alias at this point.  Re-building
>         alias information also rewrites no longer addressed
> @@ -1387,6 +1386,7 @@ init_optimization_passes (void)
>       NEXT_PASS (pass_merge_phi);
>       NEXT_PASS (pass_vrp);
>       NEXT_PASS (pass_dce);
> +      NEXT_PASS (pass_call_cdce);
>       NEXT_PASS (pass_cselim);
>       NEXT_PASS (pass_tree_ifcombine);
>       NEXT_PASS (pass_phiopt);


Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)

2011-09-27 Thread Easwaran Raman
+static inline hashval_t
+edge_hash_function (unsigned int id1, unsigned int id2)
+{
+  /* If the number of functions is less than 1000, this gives a unique value
+     for every function id combination.  */
+  const int MULTIPLIER = 1000;
+  return id1* MULTIPLIER + id2;

Change to id1 << 16 | id2
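A quick standalone sketch of why: the multiplier-based hash collides as soon as function ids reach 1000, while the shift form stays injective for any id2 below 2^16. This is a hypothetical model for the review comment, not plugin code.

```cpp
#include <cassert>
#include <cstdint>

// Multiplier-based hash from the patch under review: unique only
// while both ids stay below 1000.
uint32_t hash_mult(uint32_t id1, uint32_t id2)
{
  const uint32_t MULTIPLIER = 1000;
  return id1 * MULTIPLIER + id2;
}

// Suggested replacement: injective as long as id2 < 2^16.
uint32_t hash_shift(uint32_t id1, uint32_t id2)
{
  return id1 << 16 | id2;
}
```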

+  if (edge_map == NULL)
+    {
+      edge_map = htab_create (25, edge_map_htab_hash_descriptor,
+  edge_map_htab_eq_descriptor , NULL);

 Use a larger size and don't hardcode the value.

+  if (function_map == NULL)
+    {
+      function_map = htab_create (25, function_map_htab_hash_descriptor,
+  function_map_htab_eq_descriptor , NULL);

 Use a larger size and don't hardcode the value.

+  caller_node = get_function_node (caller);
+  /* Real nodes are nodes that correspond to .text sections found.  These will
+     definitely be sorted.  */
+  set_as_real_node (caller_node);

This is redundant as set_node_type sets the type correctly.

+  if (*slot == NULL)
+    {
+      reset_functions (edge, new_node, kept_node);
+      *slot = edge;
+      add_edge_to_node (new_node, edge);

edge will still be in old_node's edge_list

+static void set_node_type (Node *n)
+{
+  void **slot;
+  char *name = n->name;
+  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name),
+   INSERT);

Use htab_find_with_hash or pass NOINSERT.

+  /* Allocate section_map.  */
+  if (section_map == NULL)
+    {
+      section_map = htab_create (10, section_map_htab_hash_descriptor,
+ section_map_htab_eq_descriptor , NULL);

With -ffunction-sections, this should be roughly equal to size of function_map.

+static void
+write_out_node (FILE *fp, char *name,
+ void **handles, unsigned int *shndx, int position)
+{
+  void **slot;
+  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name),
+   INSERT);

Use htab_find_with_hash or pass NOINSERT.

Index: function_reordering_plugin/function_reordering_plugin.c
===
--- function_reordering_plugin/function_reordering_plugin.c (revision 0)
+++ function_reordering_plugin/function_reordering_plugin.c (revision 0)

+  parse_callgraph_section_contents (section_contents,
(unsigned int)length);
+}
+  else if (name != NULL)
+free (name);

name should be freed in the body of then and else-if  as well.

-Easwaran
On Mon, Sep 26, 2011 at 5:22 PM, Sriraman Tallam  wrote:
>
> *Resending as plain text*
>
> I am attaching a patch of the updated files. This patch was meant for
> the google gcc-4_6 branch. Sorry for not mentioning this upfront in my
> original mail.
> Thanks,
>  -Sri.
>
>
>
> > On Mon, Sep 26, 2011 at 9:53 AM, Sriraman Tallam  
> > wrote:
> >>
> >> On Mon, Sep 26, 2011 at 9:22 AM, Nathan Froyd  wrote:
> >> > On 9/23/2011 6:03 PM, Sriraman Tallam wrote:
> >> >>
> >> >> This patch adds a new linker plugin to re-order functions.
> >> >
> >> > This is great stuff.  We were experimenting with using the coverage 
> >> > files to
> >> > generate an ordering for --section-ordering-file, but this might be even
> >> > better, will have to experiment with it.
> >> >
> >> > A couple of comments on the code itself:
> >> >
> >> >> Index: function_reordering_plugin/callgraph.h
> >> >> +inline static Edge_list *
> >> >> +make_edge_list (Edge *e)
> >> >> +{
> >> >> +  Edge_list *list = (Edge_list *)malloc (sizeof (Edge_list));
> >> >
> >> > If you are going to use libiberty via hashtab.h, you might as well make 
> >> > use
> >> > of the *ALLOC family of macros to make this and other allocations a 
> >> > little
> >> > neater.
> >> >
> >> >> +/*Represents an edge in the call graph.  */
> >> >> +struct __edge__
> >> >
> >> > I think the usual convention is to call this edge_d or something similar,
> >> > avoiding the profusion of underscores.
> >> >
> >> >> +void
> >> >> +map_section_name_to_index (char *section_name, void *handle, int shndx)
> >> >> +{
> >> >> +  void **slot;
> >> >> +  char *function_name;
> >> >> +
> >> >> +  if (is_prefix_of (".text.hot.", section_name))
> >> >> +    function_name = section_name + 10 /* strlen (".text.hot.") */;
> >> >> +  else if (is_prefix_of (".text.unlikely.", section_name))
> >> >> +    function_name = section_name + 15 /* strlen (".text.unlikely.") */;
> >> >> +  else if (is_prefix_of (".text.cold.", section_name))
> >> >> +    function_name = section_name + 11 /* strlen (".text.cold.") */;
> >> >> +  else if (is_prefix_of (".text.startup.", section_name))
> >> >> +    function_name = section_name + 14 /* strlen (".text.startup.") */;
> >> >> +  else
> >> >> +    function_name = section_name + 6 /*strlen (".text.") */;
> >> >
> >> > You don't handle plain .text; this is assuming -ffunction-sections, 
> >> > right?
> >> >  Can this limitation be easily removed?
> >>
> >> You need to compile with -ffunction-sections or the individual
> >> sections cannot be re-ordered. That is why I assumed a .text.* prefix
> >> before every function sec

Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)

2011-09-27 Thread Easwaran Raman
OK for google/gcc-4_6 and google/main branches.
-Easwaran
>
> On Tue, Sep 27, 2011 at 4:07 PM, Sriraman Tallam  wrote:
>>
>> Made all the changes. Attaching new patch of updated files.
>>
>> On Tue, Sep 27, 2011 at 11:26 AM, Easwaran Raman  wrote:
>> >
>> > +static inline hashval_t
>> > +edge_hash_function (unsigned int id1, unsigned int id2)
>> > +{
>> > +  /* If the number of functions is less than 1000, this gives a unique 
>> > value
>> > +     for every function id combination.  */
>> > +  const int MULTIPLIER = 1000;
>> > +  return id1* MULTIPLIER + id2;
>> >
>> > Change to id1 << 16 | id2
>> >
>> > +  if (edge_map == NULL)
>> > +    {
>> > +      edge_map = htab_create (25, edge_map_htab_hash_descriptor,
>> > +  edge_map_htab_eq_descriptor , NULL);
>> >
>> >  Use a  larger size and don't hardcode the value.
>> >
>> > +  if (function_map == NULL)
>> > +    {
>> > +      function_map = htab_create (25, function_map_htab_hash_descriptor,
>> > +  function_map_htab_eq_descriptor , NULL);
>> >
>> >  Use a  larger size and don't hardcode the value.
>> >
>> > +  caller_node = get_function_node (caller);
>> > +  /* Real nodes are nodes that correspond to .text sections found.  These 
>> > will
>> > +     definitely be sorted.  */
>> > +  set_as_real_node (caller_node);
>> >
>> > This is redundant as set_node_type sets the type correctly.
>> >
>> > +  if (*slot == NULL)
>> > +    {
>> > +      reset_functions (edge, new_node, kept_node);
>> > +      *slot = edge;
>> > +      add_edge_to_node (new_node, edge);
>> >
>> > edge will still be in old_node's edge_list
>>
>> I do not have to explicitly delete this as the merged node's edge_list
>> will never be traversed except when cleaning up.
>>
>> >
>> > +static void set_node_type (Node *n)
>> > +{
>> > +  void **slot;
>> > +  char *name = n->name;
>> > +  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string 
>> > (name),
>> > +   INSERT);
>> >
>> > Use htab_find_with_hash or pass NOINSERT.
>> >
>> > +  /* Allocate section_map.  */
>> > +  if (section_map == NULL)
>> > +    {
>> > +      section_map = htab_create (10, section_map_htab_hash_descriptor,
>> > + section_map_htab_eq_descriptor , NULL);
>> >
>> > With -ffunction-sections, this should be roughly equal to size of 
>> > function_map.
>> >
>> > +static void
>> > +write_out_node (FILE *fp, char *name,
>> > + void **handles, unsigned int *shndx, int position)
>> > +{
>> > +  void **slot;
>> > +  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string 
>> > (name),
>> > +   INSERT);
>> >
>> > Use htab_find_with_hash or pass NOINSERT.
>> >
>> > Index: function_reordering_plugin/function_reordering_plugin.c
>> > ===
>> > --- function_reordering_plugin/function_reordering_plugin.c     (revision 
>> > 0)
>> > +++ function_reordering_plugin/function_reordering_plugin.c     (revision 
>> > 0)
>> >
>> > +          parse_callgraph_section_contents (section_contents,
>> > (unsigned int)length);
>> > +        }
>> > +      else if (name != NULL)
>> > +        free (name);
>> >
>> > name should be freed in the body of then and else-if  as well.
>> >
>> > -Easwaran
>> > On Mon, Sep 26, 2011 at 5:22 PM, Sriraman Tallam  
>> > wrote:
>> > >
>> > > *Resending as plain text*
>> > >
>> > > I am attaching a patch of the updated files. This patch was meant for
>> > > the google gcc-4_6 branch. Sorry for not mentioning this upfront in my
>> > > original mail.
>> > > Thanks,
>> > >  -Sri.
>> > >
>> > >
>> > >
>> > > > On Mon, Sep 26, 2011 at 9:53 AM, Sriraman Tallam  
>> > > > wrote:
>> > > >>
>> > > >> On Mon, Sep 26, 2011 at 9:22 AM, Nathan Froyd  
>> > > >> wrote:
>> > > >> > On 9/23/2011 6:03 PM, Sriraman Tallam wrote:
>> > > >> >>
>> > > >> >> This patch adds a new li

[google] Fix bugs in sampled profile collection

2011-09-30 Thread Easwaran Raman
This fixes two issues with sampled profile collection. It delays
cleanup of instrumentation_to_be_sampled after all callgraph nodes
have been instrumented and prevents gcov_sample_counter_decl and
gcov_sampling_rate_decl from being garbage collected.

 Ok for google/gcc-4_6 and google/main branches?

-Easwaran

2011-09-30  Easwaran Raman  

* tree-profile.c (gcov_sample_counter_decl): Add GTY marker.
(gcov_sampling_rate_decl): Likewise.
(add_sampling_to_edge_counters): Do not free
instrumentation_to_be_sampled.
(cleanup_instrumentation_sampling): New function.
(tree_profiling): Call cleanup_instrumentation_sampling at the end.

testsuite/ChangeLog.google-4_6:

2011-09-30  Easwaran Raman  

* gcc.dg/sample-profile-generate-1.c: New test.

Index: gcc/testsuite/gcc.dg/sample-profile-generate-1.c
===
--- gcc/testsuite/gcc.dg/sample-profile-generate-1.c(revision 0)
+++ gcc/testsuite/gcc.dg/sample-profile-generate-1.c(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -fprofile-generate -fprofile-generate-sampling" } */
+
+void foobar(int);
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+{
+  foobar(i);
+}
+}
+
+void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+{
+  foobar(i);
+}
+}
+
+/* { dg-final { cleanup-coverage-files } } */

Index: tree-profile.c
===
--- tree-profile.c  (revision 178897)
+++ tree-profile.c  (working copy)
@@ -163,10 +163,10 @@ init_ic_make_global_vars (void)
 static struct pointer_set_t *instrumentation_to_be_sampled = NULL;

 /* extern __thread gcov_unsigned_t __gcov_sample_counter  */
-static tree gcov_sample_counter_decl = NULL_TREE;
+static GTY(()) tree gcov_sample_counter_decl = NULL_TREE;

 /* extern gcov_unsigned_t __gcov_sampling_rate  */
-static tree gcov_sampling_rate_decl = NULL_TREE;
+static GTY(()) tree gcov_sampling_rate_decl = NULL_TREE;

 /* forward declaration.  */
 void gimple_init_instrumentation_sampling (void);
@@ -281,9 +281,13 @@ add_sampling_to_edge_counters (void)
 break;
   }
   }
+}

+static void
+cleanup_instrumentation_sampling (void)
+{
   /* Free the bitmap.  */
-  if (instrumentation_to_be_sampled)
+  if (flag_profile_generate_sampling && instrumentation_to_be_sampled)
 {
   pointer_set_destroy (instrumentation_to_be_sampled);
   instrumentation_to_be_sampled = NULL;
@@ -1452,6 +1456,7 @@ tree_profiling (void)
 }

   del_node_map();
+  cleanup_instrumentation_sampling();
   return 0;
 }


[google] Use delete with size parameter in STL deallocate (issue5794070)

2012-03-12 Thread Easwaran Raman
This patch makes -fsized-delete define a macro __GXX_DELETE_WITH_SIZE__. If
this macro is defined, the deallocate function in new_allocator.h invokes
the two parameter delete operator with the size of the object. Tested by
bootstrapping the compiler and ensuring no tests are broken. Also verified
that compiling a C++ program that uses std::vector with -fsized-delete invokes
the two-parameter version of operator delete.

OK for google/main and google/4_6 branches?

c-family/ChangeLog.google-main:
2012-03-12   Easwaran Raman  

* c-cppbuiltin.c (c_cpp_builtins): Define __GXX_DELETE_WITH_SIZE__ for
C++ files compiled with -fsized-delete.

libstdc++-v3/ChangeLog.google-main:
2012-03-12   Easwaran Raman  
* include/ext/new_allocator.h (deallocate): Call delete with size if
__GXX_DELETE_WITH_SIZE__ is defined.


diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index ccf57fd..3c9a96b 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -970,6 +970,8 @@ c_cpp_builtins (cpp_reader *pfile)
  format.  */
   if (ENABLE_DECIMAL_FLOAT && ENABLE_DECIMAL_BID_FORMAT)
 cpp_define (pfile, "__DECIMAL_BID_FORMAT__");
+  if (c_dialect_cxx () && flag_sized_delete)
+cpp_define (pfile, "__GXX_DELETE_WITH_SIZE__");
 }
 
 /* Pass an object-like macro.  If it doesn't lie in the user's
diff --git a/libstdc++-v3/include/ext/new_allocator.h 
b/libstdc++-v3/include/ext/new_allocator.h
index 0c82bd0..c972c1e 100644
--- a/libstdc++-v3/include/ext/new_allocator.h
+++ b/libstdc++-v3/include/ext/new_allocator.h
@@ -94,10 +94,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
   }
 
+#ifdef __GXX_DELETE_WITH_SIZE__
+  // __p is not permitted to be a null pointer.
+  void
+  deallocate(pointer __p, size_type __t)
+  { ::operator delete(__p, __t * sizeof(_Tp)); }
+#else
   // __p is not permitted to be a null pointer.
   void
   deallocate(pointer __p, size_type)
   { ::operator delete(__p); }
+#endif
 
   size_type
   max_size() const _GLIBCXX_USE_NOEXCEPT

--
This patch is available for review at http://codereview.appspot.com/5794070
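A minimal standalone illustration of what the allocator change enables: a replaced global sized operator delete can observe the byte count that the two-parameter deallocate passes along. The counting global and the deallocate_sized helper are illustrative stand-ins for the patched new_allocator code, not part of the patch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Records the size the sized form received; illustrative only.
static std::size_t last_deallocated_size = 0;

void* operator new(std::size_t n)
{
  if (void* p = std::malloc(n))
    return p;
  throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }

// The two-parameter form the patched deallocate invokes when
// __GXX_DELETE_WITH_SIZE__ is defined.
void operator delete(void* p, std::size_t n) noexcept
{
  last_deallocated_size = n;
  std::free(p);
}

// Sketch of the patched new_allocator::deallocate.
template <typename Tp>
void deallocate_sized(Tp* p, std::size_t count)
{
  ::operator delete(p, count * sizeof(Tp));
}
```

Passing the size lets an allocator skip looking up the allocation's size from its own metadata, which is the motivation for the sized-delete extension.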


Propagate profile counts after switch case expansion (issue5896043)

2012-03-23 Thread Easwaran Raman
This patch propagates the execution counts of the case labels of a
switch-case statement after its expansion. Bootstraps and all
tests pass. OK for trunk?

2012-03-23   Easwaran Raman  

* cfgbuild.c (non_zero_profile_counts): New function.
(compute_outgoing_frequencies): If at least one successor of a
BB has non-zero profile count, retain the counts.
* expr.c (do_tablejump): Add a REG_BR_PROB note on the
jump to default label.
(try_tablejump): Add a parameter to specify the probability
of jumping to the default label.
* expr.h (try_tablejump): Add a new parameter.
* stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
(add_case_node): Pass execution count of the case node and use
it to initialize COUNT field.
(case_probability): New macro.
(expand_case): Propagate execution counts to generated
branches using REG_BR_PROB notes.
(emit_case_nodes): Likewise.
(do_jump_if_equal): Pass probability for REG_BR_PROB note.
(compute_subtree_counts): New function to compute
SUBTREE_COUNT fields of case nodes.
(add_prob_note_to_last_insn): Add a REG_BR_PROB note with the
given probability to the last generated instruction.

gcc/testsuite/ChangeLog:
2012-03-23   Easwaran Raman  
* gcc.dg/tree-prof/switch-case-1.c: New test case.
* gcc.dg/tree-prof/switch-case-2.c: New test case.

diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
index 692fea8..d75fbda 100644
--- a/gcc/cfgbuild.c
+++ b/gcc/cfgbuild.c
@@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb)
 purge_dead_tablejump_edges (bb, table);
 }
 
+/* Check if there is at least one edge in EDGES with a non-zero count
+   field.  */
+
+static bool
+non_zero_profile_counts ( VEC(edge,gc) *edges) {
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE(e, ei, edges)
+{
+  if (e->count > 0)
+return true;
+}
+  return false;
+}
+
 /*  Assume that frequency of basic block B is known.  Compute frequencies
 and probabilities of outgoing edges.  */
 
@@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b)
   e->count = b->count;
   return;
 }
+  else if (non_zero_profile_counts (b->succs)){
+/*Profile counts already set, but REG_NOTE missing. Retain the counts.  */
+return;
+  }
   guess_outgoing_edge_probabilities (b);
   if (b->count)
 FOR_EACH_EDGE (e, ei, b->succs)
diff --git a/gcc/expr.c b/gcc/expr.c
index f9de908..fb8eef9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode);
 #ifdef PUSH_ROUNDING
 static void emit_single_push_insn (enum machine_mode, rtx, tree);
 #endif
-static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
+static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
 static rtx const_vector_from_tree (tree);
 static void write_complex_part (rtx, rtx, bool);
 
@@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree 
minval, tree range,
TABLE_LABEL is a CODE_LABEL rtx for the table itself.
 
DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the
-   index value is out of range.  */
+   index value is out of range.
+   DEFAULT_PROBABILITY is the probability of jumping to the
+   DEFAULT_LABEL.  */
 
 static void
 do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
- rtx default_label)
+ rtx default_label, int default_probability)
 {
   rtx temp, vector;
 
@@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx 
range, rtx table_label,
  the maximum value of the range.  */
 
   if (default_label)
-emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
-default_label);
+{
+  emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
+  default_label);
+  if (default_probability != -1)
+{
+  rtx jump_insn = get_last_insn();
+  add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability));
+}
+}
+
 
   /* If index is in range, it must fit in Pmode.
  Convert to Pmode so we can index with it.  */
@@ -10758,7 +10768,7 @@ do_tablejump (rtx index, enum machine_mode mode, rtx 
range, rtx table_label,
 
 int
 try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
-  rtx table_label, rtx default_label)
+  rtx table_label, rtx default_label, int default_probability)
 {
   rtx index;
 
@@ -10776,7 +10786,7 @@ try_tablejump (tree index_type, tree index_expr, tree 
minval, tree range,
   TYPE_MODE (TREE_TYPE (range)),
   expand_normal (range),
   TYPE_UNSIGNED (TREE_TYPE (range))),
-   table_label, default_label);
+   table_label, default_label, default_prob

Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-23 Thread Easwaran Raman
Some more background on this patch: Right now, while the execution
counts of different case labels of a switch statement are obtained
during profile collection, they are not propagated to RTL. Instead,
counts are regenerated at the RTL level using static heuristics that
tend to weigh branches equally which can cause poor optimization of
hot code. This patch ensures that the counts collected during profile
collection are correctly propagated allowing hot code to be better
optimized by RTL optimizations.  Patch tested on x86_64.

- Easwaran
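The scaling this involves can be sketched as follows. REG_BR_PROB_BASE is GCC's fixed-point probability base (10000 in sources of this era); the exact case_probability macro in the patch may differ from this model.

```cpp
#include <cassert>

// GCC expresses branch probabilities as fixed-point values out of
// REG_BR_PROB_BASE (10000 in this era's sources).
const int REG_BR_PROB_BASE = 10000;

// Sketch of the case_probability idea from the patch: scale a case
// label's execution count by the total count flowing into the switch;
// -1 means no profile information is available.
int case_probability(long long count, long long total)
{
  return total > 0 ? static_cast<int>(count * REG_BR_PROB_BASE / total) : -1;
}
```

A REG_BR_PROB note carrying this value on the generated jump is what lets later RTL passes distinguish hot case labels from cold ones.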

On Fri, Mar 23, 2012 at 10:43 AM, Easwaran Raman  wrote:
> This patch propagates the execution counts of the case labels of a
> switch-case statement after its expansion. Bootstraps and all
> tests pass. OK for trunk?
>
> 2012-03-23   Easwaran Raman  
>
>        * cfgbuild.c (non_zero_profile_counts): New function.
>        (compute_outgoing_frequencies): If at least one successor of a
>        BB has non-zero profile count, retain the counts.
>        * expr.c (do_tablejump): Add a REG_BR_PROB note on the
>        jump to default label.
>        (try_tablejump): Add a parameter to specify the probability
>        of jumping to the default label.
>        * expr.h (try_tablejump): Add a new parameter.
>        * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
>        (add_case_node): Pass execution count of the case node and use
>        it to initialize COUNT field.
>        (case_probability): New macro.
>        (expand_case): Propagate execution counts to generated
>        branches using REG_BR_PROB notes.
>        (emit_case_nodes): Likewise.
>        (do_jump_if_equal): Pass probability for REG_BR_PROB note.
>        (compute_subtree_counts): New function to compute
>        SUBTREE_COUNT fields of case nodes.
>        (add_prob_note_to_last_insn): Add a REG_BR_PROB note with the
>        given probability to the last generated instruction.
>
> gcc/testsuite/ChangeLog:
> 2012-03-23   Easwaran Raman  
>        * gcc.dg/tree-prof/switch-case-1.c: New test case.
>        * gcc.dg/tree-prof/switch-case-2.c: New test case.
>
> diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
> index 692fea8..d75fbda 100644
> --- a/gcc/cfgbuild.c
> +++ b/gcc/cfgbuild.c
> @@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb)
>     purge_dead_tablejump_edges (bb, table);
>  }
>
> +/* Check if there is at least one edge in EDGES with a non-zero count
> +   field.  */
> +
> +static bool
> +non_zero_profile_counts ( VEC(edge,gc) *edges) {
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE(e, ei, edges)
> +    {
> +      if (e->count > 0)
> +        return true;
> +    }
> +  return false;
> +}
> +
>  /*  Assume that frequency of basic block B is known.  Compute frequencies
>     and probabilities of outgoing edges.  */
>
> @@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b)
>       e->count = b->count;
>       return;
>     }
> +  else if (non_zero_profile_counts (b->succs)){
> +    /*Profile counts already set, but REG_NOTE missing. Retain the counts.  
> */
> +    return;
> +  }
>   guess_outgoing_edge_probabilities (b);
>   if (b->count)
>     FOR_EACH_EDGE (e, ei, b->succs)
> diff --git a/gcc/expr.c b/gcc/expr.c
> index f9de908..fb8eef9 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode);
>  #ifdef PUSH_ROUNDING
>  static void emit_single_push_insn (enum machine_mode, rtx, tree);
>  #endif
> -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
> +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
>  static rtx const_vector_from_tree (tree);
>  static void write_complex_part (rtx, rtx, bool);
>
> @@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree 
> minval, tree range,
>    TABLE_LABEL is a CODE_LABEL rtx for the table itself.
>
>    DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the
> -   index value is out of range.  */
> +   index value is out of range.
> +   DEFAULT_PROBABILITY is the probability of jumping to the
> +   DEFAULT_LABEL.  */
>
>  static void
>  do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
> -             rtx default_label)
> +             rtx default_label, int default_probability)
>  {
>   rtx temp, vector;
>
> @@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx 
> range, rtx table_label,
>      the maximum value of the range.  */
>
>   if (default_label)
> -    emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
> -                            default_label);
> +    {
> +      emit_cmp_and_jump_insns (i

Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-23 Thread Easwaran Raman
On Fri, Mar 23, 2012 at 3:29 PM, Andi Kleen  wrote:
> Easwaran Raman  writes:
>
>> Some more background on this patch: Right now, while the execution
>> counts of different case labels of a switch statement are obtained
>> during profile collection, they are not propagated to RTL. Instead,
>> counts are regenerated at the RTL level using static heuristics that
>> tend to weigh branches equally which can cause poor optimization of
>> hot code. This patch ensures that the counts collected during profile
>> collection are correctly propagated allowing hot code to be better
>> optimized by RTL optimizations.  Patch tested on x86_64.
>
> I think your patch doesn't use the probability to weight the decision
> tree for non tablejump, right? I looked at this some time ago,
> but the patch always had problems.

Do you mean use the weights to decide the shape of the binary tree
(similar to COST_TABLE heuristic)? I am planning to send a separate
patch for that. This one just makes sure that the profile counts are
propagated correctly. So you will still have a situation where a
branch corresponding to an infrequently executed case dominates a
frequently executed case, but the BB of the cases gets the right
profile weight.

- Easwaran

> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-03-25 Thread Easwaran Raman
On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka  wrote:
>> This patch propagates execution count of thee case labels of a
>> switch-case statement after its expansion. Bootstraps and all
>> tests pass. OK for trunk?
>
> Hi,
> while this is a reasonable thing to do, I believe it would make most sense
> to make switch lowering at gimple level: i.e. reorganize the existing
> code to produce gimple statement and change expansion code to produce
> tablejump from every gimple switch statement that was left in the
> code.
>
> This would be both cleaner and would allow gimple optimizers to improve the
> generated code. Incrementally it would also allow us to improve switch
> expansion, which is quite baroque and not really producing very good code
> in some common cases.
>
> If you would be interested working on this (i.e. reorganizing the expansion
> code to produce gimple), I would be very happy. If not, I can review the
> profile updating part for mainline, since in any case this is a desirable thing
> to do.

I am planning to explore improvements to switch expansion (peeling
some cases and using jump tables for the rest, for example) and I
think the reorganization you suggest is the right way to do such
improvements. But until I can spend time on it and get it done, I
would like this patch to get reviewed and checked in.

Thanks,
Easwaran

> Honza


Re: [google]Add support for sampled profile collection (issue4438083)

2012-03-30 Thread Easwaran Raman
I want to revive this patch for mainline and have some questions on
Honza's comments.

On Fri, Apr 29, 2011 at 1:48 PM, Jan Hubicka  wrote:
>>
>> A known limitation is that value profiling is not yet sampled, but it
>> does not seem to cause problems.
>
> For the performance alone, we probably don't need to care that much
> given the fact that we value profile only relatively expensive operations.
> But if we want to have the turn off/on feature, then i gueess we need to
> guard everything.  It is not much of pain to add the code generating
> conditionals everywhere, after all.

If we sample value profiling instrumentation as well, does it make
sense to use a single counter and rate for all instrumentations? If
not, does the additional complexity (and flags) justify the benefit of
uniformity?

>> > +/* Insert STMT_IF around given sequence of consecutive statements in the
>> > +   same basic block starting with STMT_START, ending with STMT_END.  */
>> > +
>> > +static void
>> > +insert_if_then (gimple stmt_start, gimple stmt_end, gimple stmt_if)
>> > +{
>> > +  gimple_stmt_iterator gsi;
>> > +  basic_block bb_original, bb_before_if, bb_after_if;
>> > +  edge e_if_taken, e_then_join;
>> > +
>> > +  gsi = gsi_for_stmt (stmt_start);
>> > +  gsi_insert_before (&gsi, stmt_if, GSI_SAME_STMT);
>> > +  bb_original = gsi_bb (gsi);
>> > +  e_if_taken = split_block (bb_original, stmt_if);
>> > +  e_if_taken->flags &= ~EDGE_FALLTHRU;
>> > +  e_if_taken->flags |= EDGE_TRUE_VALUE;
>> > +  e_then_join = split_block (e_if_taken->dest, stmt_end);
>> > +  bb_before_if = e_if_taken->src;
>> > +  bb_after_if = e_then_join->dest;
>
> On mainline, now that we do profile estimation before profiling
> instrumentation, you really want to update the profile for performance here.
I am not sure I understand this.

>> > +  make_edge (bb_before_if, bb_after_if, EDGE_FALSE_VALUE);
>> > +}
>> > +
>> > +/* Transform:
>> > +
>> > +   ORIGINAL CODE
>> > +
>> > +   Into:
>> > +
>> > +   __gcov_sample_counter++;
>> > +   if (__gcov_sample_counter >= __gcov_sampling_rate)
>> > +     {
>> > +       __gcov_sample_counter = 0;
>> > +       ORIGINAL CODE
>> > +     }
>
> Hmm, I think the probability that internal loop of program will interfere with
> sampling rate is relatively high, but I see it is bit hard to do something 
> about
> it.  Can we think of some very basic randomization of sampling_rate?

The predominant use case we have for this technique for server
workloads is as follows: Have a high value for __gcov_sampling_rate
(that should probably be named __gcov_sampling_interval)  during
server start up so that it reduces the overhead as well as ensures
that the startup period doesn't pollute the rest of the counters.
After startup, change the sampling interval to a small value (in many
cases, to 1) using the external interface in libgcov.c. In this use
case, randomization doesn't make sense during startup (since we want
to skip profile collection) as well as the steady phase (since the
interval is 1 or a small number).  If you want to have randomization
support, do you have any suggestions for how to make it work with low
sampling intervals?


Thanks,
Easwaran

> Honza


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-04-09 Thread Easwaran Raman
On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman  wrote:
> On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka  wrote:
>>> This patch propagates execution count of the case labels of a
>>> switch-case statement after its expansion. Bootstraps and all
>>> tests pass. OK for trunk?
>>
>> Hi,
>> while this is a reasonable thing to do, I believe it would make most sense
>> to do switch lowering at the gimple level: i.e. reorganize the existing
>> code to produce gimple statements and change the expansion code to produce
>> a tablejump from every gimple switch statement that was left in the
>> code.
>>
>> This would be both cleaner and would allow gimple optimizers to improve the
>> generated code. Incrementally it would also allow us to improve switch
>> expansion, which is quite baroque and not really producing very good code
>> in some common cases.
>>
>> If you would be interested working on this (i.e. reorganizing the expansion
>> code to produce gimple), I would be very happy. If not, I can review the
>> profile updating part for mainline, since in any case this is a desirable thing
>> to do.
>
> I am planning to explore improvements to switch expansion (peeling
> some cases and using jump tables for the rest, for example) and I
> think the reorganization you suggest is the right way to do such
> improvements. But until I can spend time on it and get it done, I
> would like this patch to get reviewed and checked in.
>
> Thanks,
> Easwaran

Ping.


Re: Propagate profile counts after switch case expansion (issue5896043)

2012-04-16 Thread Easwaran Raman
Ping.

On Mon, Apr 9, 2012 at 2:33 PM, Easwaran Raman  wrote:
> On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman  wrote:
>> On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka  wrote:
>>>> This patch propagates execution count of the case labels of a
>>>> switch-case statement after its expansion. Bootstraps and all
>>>> tests pass. OK for trunk?
>>>
>>> Hi,
>>> while this is a reasonable thing to do, I believe it would make most sense
>>> to do switch lowering at the gimple level: i.e. reorganize the existing
>>> code to produce gimple statements and change the expansion code to produce
>>> a tablejump from every gimple switch statement that was left in the
>>> code.
>>>
>>> This would be both cleaner and would allow gimple optimizers to improve the
>>> generated code. Incrementally it would also allow us to improve switch
>>> expansion, which is quite baroque and not really producing very good code
>>> in some common cases.
>>>
>>> If you would be interested working on this (i.e. reorganizing the expansion
>>> code to produce gimple), I would be very happy. If not, I can review the
>>> profile updating part for mainline, since in any case this is a desirable
>>> thing to do.
>>
>> I am planning to explore improvements to switch expansion (peeling
>> some cases and using jump tables for the rest, for example) and I
>> think the reorganization you suggest is the right way to do such
>> improvements. But until I can spend time on it and get it done, I
>> would like this patch to get reviewed and checked in.
>>
>> Thanks,
>> Easwaran
>
> Ping.


[google] Add support for delete operator that takes the size of the object as a parameter

2011-12-11 Thread Easwaran Raman
In the tcmalloc memory
allocator (http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html),
deallocation involves a costly lookup to get the size of the deleted
object. The size is required to find the right free list to release
the object back to. By passing the size of the object to the delete
call, this lookup can be avoided. This patch adds support for operator
delete(void*, size_t), guarded by a -fsized-delete flag. It also adds a
default implementation of delete(void*, size_t) that ignores the size
parameter.

Bootstraps and no test regressions. OK for google/gcc-4_6 branch?
---

2011-12-11  Easwaran Raman  

* common.opt (fsized-delete): New option.

cp/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  

* decl.c (cxx_init_decl_processing): Specify a function that
  takes a void* and size_t for DELETE_EXPR.
* call.c (build_op_delete_call): If fsized-delete is used, use
  the variant that takes size_t as the second parameter except
  when deleting a pointer of type void *.

testsuite/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  

* g++.dg/other/sized-delete-1.C: New test.

libstdc++-v3/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  

* libsupc++/del_op.cc (delete): Define a new variant
  void operator delete(void*, std::size_t).
* libsupc++/new (delete): Declare
  void operator delete(void*, std::size_t) throw();
* testsuite/util/testsuite_abi.cc (check_version): Add new
  known version GLIBCXX_3.4.17.
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Add
  new symbol _ZdlPvm@@GLIBCXX_3.4.17.
* config/abi/pre/gnu.ver: Add new symbol _ZdlPvm at version
  GLIBCXX_3.4.17.
Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc	(revision 182213)
+++ libstdc++-v3/libsupc++/del_op.cc	(working copy)
@@ -46,3 +46,11 @@ operator delete(void* ptr) throw ()
   if (ptr)
     std::free(ptr);
 }
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+                std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+    std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/new
===
--- libstdc++-v3/libsupc++/new	(revision 182213)
+++ libstdc++-v3/libsupc++/new	(working copy)
@@ -93,6 +93,7 @@ namespace std
 void* operator new(std::size_t) throw (std::bad_alloc);
 void* operator new[](std::size_t) throw (std::bad_alloc);
 void operator delete(void*) throw();
+void operator delete(void*, std::size_t) throw();
 void operator delete[](void*) throw();
 void* operator new(std::size_t, const std::nothrow_t&) throw();
 void* operator new[](std::size_t, const std::nothrow_t&) throw();
Index: libstdc++-v3/testsuite/util/testsuite_abi.cc
===
--- libstdc++-v3/testsuite/util/testsuite_abi.cc	(revision 182213)
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc	(working copy)
@@ -194,6 +194,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.14");
   known_versions.push_back("GLIBCXX_3.4.15");
   known_versions.push_back("GLIBCXX_3.4.16");
+  known_versions.push_back("GLIBCXX_3.4.17");
   known_versions.push_back("GLIBCXX_LDBL_3.4");
   known_versions.push_back("GLIBCXX_LDBL_3.4.7");
   known_versions.push_back("GLIBCXX_LDBL_3.4.10");
@@ -560,4 +561,3 @@ demangle(const std::string& mangled)
 }
   return name;
 }
-
Index: libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt	(revision 182213)
+++ libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt	(working copy)
@@ -2422,6 +2422,7 @@ FUNC:_ZTv0_n24_NSt9strstreamD1Ev@@GLIBCXX_3.4
 FUNC:_ZdaPv@@GLIBCXX_3.4
 FUNC:_ZdaPvRKSt9nothrow_t@@GLIBCXX_3.4
 FUNC:_ZdlPv@@GLIBCXX_3.4
+FUNC:_ZdlPvm@@GLIBCXX_3.4.17
 FUNC:_ZdlPvRKSt9nothrow_t@@GLIBCXX_3.4
 FUNC:_Znam@@GLIBCXX_3.4
 FUNC:_ZnamRKSt9nothrow_t@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/pre/gnu.ver
===
--- libstdc++-v3/config/abi/pre/gnu.ver	(revision 182213)
+++ libstdc++-v3/config/abi/pre/gnu.ver	(working copy)
@@ -1279,6 +1279,13 @@ GLIBCXX_3.4.16 {
 
 } GLIBCXX_3.4.15;
 
+GLIBCXX_3.4.17 {
+
+# operator delete(void*, unsigned long)
+_ZdlPvm;
+
+} GLIBCXX_3.4.16;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
Index: gcc/testsuite/g++.dg/other/sized-delete-1.C
===
--- gcc/testsuite/g++.dg/other/sized-delete-1.C	(revision 0)
+++ gcc/testsuite/

Re: [google] Add support for delete operator that takes the size of the object as a parameter

2011-12-12 Thread Easwaran Raman
Thanks for the comments Paolo. I have attached the new patch. I have
bumped the version to 3.4.18 and used _ZdlPv[jmy] in gnu.ver.  I have
also added the symbol to baseline_symbols.txt of other targets.

-Easwaran


2011-12-11  Easwaran Raman  

* common.opt (fsized-delete): New option.

cp/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  

* decl.c (cxx_init_decl_processing): Specify a function that
  takes a void* and size_t for DELETE_EXPR.
* call.c (build_op_delete_call): If fsized-delete is used, use
  the variant that takes size_t as the second parameter except
  when deleting a pointer of type void *.

testsuite/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  

* g++.dg/other/sized-delete-1.C: New test.

libstdc++-v3/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  

* libsupc++/del_op.cc (delete): Define a new variant
  void operator delete(void*, std::size_t).
* libsupc++/new (delete): Declare
  void operator delete(void*, std::size_t) throw();
* testsuite/util/testsuite_abi.cc (check_version): Add new
  known version GLIBCXX_3.4.18.
* config/abi/post/i386-linux-gnu/baseline_symbols.txt: Add
  new symbol _ZdlPvm@@GLIBCXX_3.4.18.
* config/abi/post/sparc-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/mips64-linux-gnu/64/baseline_symbols.txt: Likewise.
* config/abi/post/mips64-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/alpha-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/i486-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/s390-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt: Likewise.
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/mips-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Likewise.
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt: Likewise.
* config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/solaris2.10/amd64/baseline_symbols.txt: Likewise.
* config/abi/post/solaris2.10/sparcv9/baseline_symbols.txt: Likewise.
* config/abi/post/solaris2.10/baseline_symbols.txt: Likewise.
* config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/hppa-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Likewise.
* config/abi/post/solaris2.8/sparcv9/baseline_symbols.txt: Likewise.
* config/abi/post/solaris2.8/baseline_symbols.txt: Likewise.
* config/abi/pre/gnu.ver: Add new symbol _ZdlPv[jmy] at version
  GLIBCXX_3.4.18.

On Mon, Dec 12, 2011 at 5:25 AM, Paolo Carlini  wrote:
> On 12/12/2011 02:21 PM, Diego Novillo wrote:
>>
>> Ah, right. I missed the ABI implications.
>
> For possible inclusion in mainline too, things don't seem completely ok:
> nothing should be added to the baseline and very likely the export should be
> adjusted to accommodate for different size_t on the various targets, by
> using [] in the pattern. See, eg, the existing operator new[](size_t).
>
> Paolo.
Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc	(revision 182251)
+++ libstdc++-v3/libsupc++/del_op.cc	(working copy)
@@ -46,3 +46,11 @@ operator delete(void* ptr) throw ()
   if (ptr)
     std::free(ptr);
 }
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+                std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+    std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/new
===
--- libstdc++-v3/libsupc++/new	(revision 182251)
+++ libstdc++-v3/libsupc++/new	(working copy)
@@ -93,6 +93,7 @@ namespace std
 void* operator new(std::size_t) throw (std::bad_alloc);
 void* operator new[](std::size_t) throw (std::bad_alloc);
 void operator delete(void*) throw();
+void operator delete(void*, std::size_t) throw();
 void operator delete[](void*) throw();
 void* operator new(std::size_t, const std::nothrow_t&) throw();
 void* operator new[](std::size_t, const std::nothrow_t&) throw();
Index: libstdc++-v3/testsuite/util/testsuite_abi.cc
===
--- libstdc++-v3/testsuite/util/testsuite_abi.cc	(revision 182251)
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc	(working copy)
@@ -195,6 +195,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.15");
   known_versions.push_back("

Re: [google] Add support for delete operator that takes the size of the object as a parameter

2011-12-12 Thread Easwaran Raman
On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini
 wrote:
> On 12/12/2011 09:37 PM, Easwaran Raman wrote:
>>
>> Thanks for the comments Paolo. I have attached the new patch. I have
>> bumped the version to 3.4.18
>
> You shouldn't: 4.7 is not out yet, thus no reason to increase the minor
> version beyond the current 17.
Ok, I then don't understand your comment
 "Note that backporting the patch as-is to 4_6-branch would be very
wrong in terms of ABI (in mainline we already have a 3.4.17)".
My original patch added the new symbol in version 3.4.17. Since we
don't want to add the symbol to 3.4.16 (if we have a situation where
the new runtime is not available when running a program compiled with
-fsized-delete) and you said I shouldn't be using 3.4.17, I assumed
I had to bump up the version.

>
>>  and used _ZdlPv[jmy] in gnu.ver.  I have
>> also added the symbol to baseline_symbols.txt of other targets.
>
> You should not, just read again what I wrote. And you don't have to believe
> me: just browse the libstdc++ ChangeLogs and see if somebody ever does that
> when the linker map is touched.

Sorry, I again misunderstood this as well (and still don't have a good
picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? I added
that by mimicking the symbol _Znw[jmy] found in the same file. From
the log, it looks like baseline_symbols.txt is generated, but I am not
sure how that is done. For example, r145437 says a bunch of these
baseline_symbols.txt files are regenerated, but I don't see any other
change from which they might be generated.

Thanks,
Easwaran


> Paolo.


Re: [google] Add support for delete operator that takes the size of the object as a parameter

2011-12-13 Thread Easwaran Raman
On Mon, Dec 12, 2011 at 1:52 PM, Easwaran Raman  wrote:
> On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini
>  wrote:
>> On 12/12/2011 09:37 PM, Easwaran Raman wrote:
>>>
>>> Thanks for the comments Paolo. I have attached the new patch. I have
>>> bumped the version to 3.4.18
>>
>> You shouldn't: 4.7 is not out yet, thus no reason to increase the minor
>> version beyond the current 17.
> Ok, I then don't understand your comment
>  "Note that backporting the patch as-is to 4_6-branch would be very
> wrong in terms of ABI (in mainline we already have a 3.4.17)".
> My original patch added the new symbol in version 3.4.17. Since we
> don't want to add the symbol to 3.4.16 (if we have a situation where
> the new runtime is not available when running a program compiled with
> -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed
> I had to bump up the version.
>
>>
>>>  and used _ZdlPv[jmy] in gnu.ver.  I have
>>> also added the symbol to baseline_symbols.txt of other targets.
>>
>> You should not, just read again what I wrote. And you don't have to believe
>> me: just browse the libstdc++ ChangeLogs and see if somebody ever does that
>> when the linker map is touched.
>
> Sorry, I again misunderstood this as well (and still don't have a good
> picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? I added
> that by mimicking the symbol  _Znw[jmy] found in the same file. From
> the log, it looks like the baseline_symbols.txt seems to be generated,
> but I am not sure how that is to be done. For example, r145437 says a
> bunch of these baseline_symbols.txt are regenerated, but I don't see
> any other change from which this might be generated.
>
> Thanks,
> Easwaran

It looks like running 'make new-abi-baseline' under
TARGET/libstdc++-v3/testsuite generates the baseline file. Should I
check that in? What about other targets?

Thanks,
Easwaran

>
>> Paolo.


Re: [google] Add support for delete operator that takes the size of the object as a parameter

2011-12-17 Thread Easwaran Raman
Based on Paolo's comments I am dropping the changes to
baseline_symbols.txt. As for the minor version, I am bumping it up to
18. Assuming that this patch will be able to go in gcc 4.8 (with minor
version 18), I want to keep it at the same version in google/main and
google/gcc-4_6.

Bootstraps and no test regression. OK for google/main and
google/gcc-4_6 branches?

-
2011-12-17  Easwaran Raman  

* common.opt (fsized-delete): New option.

cp/ChangeLog.google-main:

2011-12-17  Easwaran Raman  

* decl.c (cxx_init_decl_processing): Specify a function that
  takes a void* and size_t for DELETE_EXPR.
* call.c (build_op_delete_call): If fsized-delete is used, use
  the variant that takes size_t as the second parameter except
  when deleting a pointer of type void *.

testsuite/ChangeLog.google-main:

2011-12-17  Easwaran Raman  

* g++.dg/other/sized-delete-1.C: New test.

libstdc++-v3/ChangeLog.google-main:

2011-12-17  Easwaran Raman  

* libsupc++/del_op.cc (delete): Define a new variant
  void operator delete(void*, std::size_t).
* libsupc++/new (delete): Declare
  void operator delete(void*, std::size_t) throw();
* testsuite/util/testsuite_abi.cc (check_version): Add new
  known version GLIBCXX_3.4.18.
* config/abi/pre/gnu.ver: Add new symbol _ZdlPv[jmy] at version
  GLIBCXX_3.4.18.

On Wed, Dec 14, 2011 at 3:31 AM, Paolo Carlini  wrote:
> Hi,
>
>> On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini
>>   wrote:
>>>
>>> On 12/12/2011 09:37 PM, Easwaran Raman wrote:
>>>>
>>>> Thanks for the comments Paolo. I have attached the new patch. I have
>>>> bumped the version to 3.4.18
>>>
>>> You shouldn't: 4.7 is not out yet, thus no reason to increase the minor
>>> version beyond the current 17.
>>
>> Ok, I then don't understand your comment
>>  "Note that backporting the patch as-is to 4_6-branch would be very
>> wrong in terms of ABI (in mainline we already have a 3.4.17)".
>> My original patch added the new symbol in version 3.4.17. Since we
>> don't want to add the symbol to 3.4.16 (if we have a situation where
>> the new runtime is not available when running a program compiled with
>> -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed
>> I had to bump up the version.
>
> Note this is going to be an academic discussion, because it's too late for
> 4.7 anyway, and I'm not even sure it's conforming to add overloads like this
> for operators.
>
> Anyway.
>
> For 4.6 branch at this point the situation would be very difficult. We could
> have a small time window before 4.7 is out to move **all** its new symbols
> vs 4.6, currently in 3.4.17, to a new 3.4.18 and then we could bump 4.6 to
> 3.4.17. You see, after a minor is delivered you cannot add anything to it,
> and also you have the general requirement that the minor version is the
> minor version, thus irrespective of whether we are talking about gcc 4.6 or
> gcc4.7, a specific minor version number defines which symbols are exported.
> I hope now the issues are more clear. You must always be very careful.
>
>>>>  and used _ZdlPv[jmy] in gnu.ver.  I have
>>>> also added the symbol to baseline_symbols.txt of other targets.
>>>
>>> You should not, just read again what I wrote. And you don't have to
>>> believe
>>> me: just browse the libstdc++ ChangeLogs and see if somebody ever does
>>> that
>>> when the linker map is touched.
>>
>> Sorry, I again misunderstood this as well (and still don't have a good
>> picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok?
>
> Looks fine yes. But I didn't really check the letters. General rule of thumb
> when fiddling with linker scripts: double check what you are doing with a
> multilib system, like x86_64, at least you can catch errors when you are
> mistakenly not accounting for the 32-bit version of the symbols. In the case
> at issue, size_t can boil down to unsigned int, unsigned long, or unsigned
> long long; make sure the pattern includes all three.
>
>> I added
>> that by mimicking the symbol  _Znw[jmy] found in the same file. From
>> the log, it looks like the baseline_symbols.txt seems to be generated,
>> but I am not sure how that is to be done. For example, r145437 says a
>> bunch of these baseline_symbols.txt are regenerated, but I don't see
>> any other change from which this might be generated.
>
> General rule of thumb: normally, if you aren't a release manager, or a
> maintainer, **never** regenerate the baseline_symbols files. Just
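Paolo's point about the `[jmy]` glob can be checked directly, assuming the binutils c++filt tool is available: the trailing letter is the Itanium C++ ABI encoding of each type size_t may be on a given target.

```shell
# _ZdlPv is operator delete(void*); the trailing letter encodes size_t:
#   j = unsigned int, m = unsigned long, y = unsigned long long
for sym in _ZdlPvj _ZdlPvm _ZdlPvy; do
  printf '%s -> ' "$sym"
  echo "$sym" | c++filt
done
```

This is why exporting only `_ZdlPvm` would break 32-bit multilibs, where size_t mangles as `j`.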

[google] Propagate profile information to RTL level during switch expansion

2012-01-31 Thread Easwaran Raman
This patch propagates profile information to RTL level when expanding
switch statements using jump table or a binary tree of branches.

Ok for google/gcc-4_6 branch? I  would like the patch to be considered
for trunk when stage 1 opens again.

-Easwaran

2012-01-31  Easwaran Raman  

* expr.c (do_tablejump): Add default_probability as the
sixth parameter and use it to generate REG_BR_PROB note.
(try_tablejump): Add default_probability as a parameter.
* cfgbuild.c (non_zero_profile_counts): New function.
(compute_outgoing_frequencies): If edges have profile counts
set, don't replace them with guessed values.
* expr.h (try_tablejump): Add additional parameter to the
declaration.
* stmt.c (tree-flow.h): Include.
(case_node): Add new fields count and subtree_count.
(add_case_node): Pass count as a paramater and initialize
count field of case_node.
(case_probability): New macro.
(expand_case): Propagate profile information from the tree
level to RTL during switch case expansion.
(balance_case_nodes): Compute subtree_count for case nodes.
(emit_case_nodes): Set branch probabilities for generated
branches.

testsuite/ChangeLog.google-4_6:

2012-01-31  Easwaran Raman  

* tree-prof/switch-case-1.c: New test.
* tree-prof/switch-case-2.c: New test.
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c
===
--- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c	(revision 0)
@@ -0,0 +1,40 @@
+/* { dg-options "-O2 -fdump-rtl-expand-blocks -fdump-tree-optimized-blocks" } */
+int g;
+
+__attribute__((noinline)) void foo (int  n)
+{
+  switch (n)
+{
+case 1:
+  g++; break;
+case 2:
+  g += 2; break;
+case 3:
+  g += 1; break;
+case 4:
+  g += 3; break;
+case 5:
+  g += 4; break;
+case 6:
+  g += 5; break;
+case 7:
+  g += 6; break;
+case 8:
+  g += 7; break;
+case 9:
+  g += 8; break;
+default:
+  g += 8; break;
+   }
+}
+
+int main ()
+{
+ int i;
+ for (i = 0; i < 1; i++)
+   foo ((i * i) % 5);
+ return 0;
+}
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:4000" 4 "expand"} } */
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:2000" 1 "expand"} } */
+/* { dg-final-use { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
===
--- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c	(revision 0)
@@ -0,0 +1,40 @@
+/* { dg-options "-O2 -fdump-rtl-expand-blocks" } */
+int g;
+
+__attribute__((noinline)) void foo (int  n)
+{
+  switch (n)
+{
+case 99:
+  g += 2; break;
+case 1:
+  g++; break;
+case 100:
+  g += 1; break;
+case 4:
+  g += 3; break;
+case 5:
+  g += 4; break;
+case 6:
+  g += 5; break;
+case 7:
+  g += 6; break;
+case 8:
+  g += 7; break;
+case 9:
+  g += 8; break;
+default:
+  g += 8; break;
+   }
+}
+
+int main ()
+{
+ int i;
+ for (i = 0; i < 1; i++)
+   foo ((i * i) % 5);
+ return 0;
+}
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:4000" 4 "expand"} } */
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:2000" 1 "expand"} } */
+/* { dg-final-use { cleanup-rtl-dump "expand" } } */
Index: gcc/expr.c
===
--- gcc/expr.c	(revision 183262)
+++ gcc/expr.c	(working copy)
@@ -155,7 +155,7 @@ static rtx do_store_flag (sepops, rtx, enum machin
 #ifdef PUSH_ROUNDING
 static void emit_single_push_insn (enum machine_mode, rtx, tree);
 #endif
-static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
+static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
 static rtx const_vector_from_tree (tree);
 static void write_complex_part (rtx, rtx, bool);
 
@@ -10245,7 +10245,7 @@ try_casesi (tree index_type, tree index_expr, tree
 
 static void
 do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
-	  rtx default_label)
+	  rtx default_label, int default_probability)
 {
   rtx temp, vector;
 
@@ -10261,9 +10261,17 @@ do_tablejump (rtx index, enum machine_mode mode, r
  the maximum value of the range.  */
 
   if (default_label)
-emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
-			 default_label);
+{
+  emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
+			   default_label);
+  if (default_probability != -1)
+{
+  rtx jump_insn = get_last_

Re: [google] Propagate profile information to RTL level during switch expansion

2012-02-01 Thread Easwaran Raman
On Tue, Jan 31, 2012 at 10:16 PM, Xinliang David Li  wrote:
> Ok for google branch with minor changes below.
>
> thanks,
>
> David
>
>>> +#define case_probability(x, y) ((y) ? ((x) * REG_BR_PROB_BASE / (y)) : -1)
>>> +
>
> Using upper case for macro?
From http://gcc.gnu.org/codingconventions.html,

"Macros names should be in ALL_CAPS when it's important to be aware
that it's a macro (e.g. accessors and simple predicates), but in
lowercase (e.g., size_int) where the macro is a wrapper for efficiency
that should be considered as a function"

In this case case_probability is a wrapper for efficiency and hence I
decided to go with the lower case name.


>>> +                  count -= default_count;
>>> +                  default_count = 0;
>
> Why resetting it to 0?

If there are no gaps, then the indirect jump can't jump to the default
label. Later I set
  default_edge->count = default_count;
and this value should be 0 if there are no gaps.

I have addressed the rest of your comments.

-Easwaran


>
>
>>> +
>>> +static void
>>> +add_prob_note_to_last_insn(int probability)
>>> +{
>
> Missing document.


Re: Ping: Re: Improve DSE in the presence of calls

2011-06-02 Thread Easwaran Raman
Ping.

On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman  wrote:
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
>


Re: Ping: Re: Improve DSE in the presence of calls

2011-06-07 Thread Easwaran Raman
Ping.

Diego, David,
 Is this patch OK for google/main?

-Easwaran

On Thu, Jun 2, 2011 at 4:48 PM, Easwaran Raman  wrote:
> Ping.
>
> On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman  wrote:
>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
>>
>


Re: Improve DSE in the presence of calls

2011-06-15 Thread Easwaran Raman
On Tue, Jun 14, 2011 at 5:47 PM, H.J. Lu  wrote:
> On Tue, May 10, 2011 at 12:18 PM, Easwaran Raman  wrote:
>> On Tue, May 3, 2011 at 9:40 AM, Easwaran Raman  wrote:
>>> On Mon, May 2, 2011 at 8:37 PM, Jeff Law  wrote:
>>>> -BEGIN PGP SIGNED MESSAGE-
>>>> Hash: SHA1
>>>>
>>>> On 04/26/11 16:06, Easwaran Raman wrote:
>>>>
>>>>>
>>>>>> You're right. The patch has correctness issues. It is not possible to
>>>>>> simply not call add_wild_read because it also resets
>>>>>> active_local_stores and frees read_recs. During the local phase it
>>>>>> seems better to just treat calls as wild reads and reset
>>>>>> active_local_stores and free read_recs. I've now refactored
>>>>>> add_wild_read so that resetting active_local_stores and free read_recs
>>>>>> are in separate methods that can be called on non-const/non-memset
>>>>>> calls. In addition, I have added a non_frame_wild_read field in
>>>>>> insn_info to mark non-const and non-memset calls.
>>>>>
>>>>>> I've attached the revised patch.
>>>> Looking better.  Just a few more things.
>>>>
>>>> Don't all locals escape if the callee has a static chain?  Is that
>>>> handled anywhere?
>>>>
>>>> Don't you still have the potential for wild reads in dse_step5_nospill
>>>> (say from an asm)?  if so, then the change can't be correct.
>>>>
>>>> Couldn't you just add a clause like this before the else-if?
>>>>
>>>> else if (insn_info->non_frame_wild_read)
>>>>  {
>>>>    if (dump_file)
>>>>      fprintf (dump_file, "non-frame wild read\n");
>>>>    scan_reads_nospill (insn_info, v, NULL);
>>>>  }
>>>> else if (insn_info->read_rec)
>>>>  {
>>>>    /* Leave this clause unchanged */
>>>>  }
>>>>
>>>> Am I missing something?
>>>
>>> I am not sure I understand the problem here.  If there is a wild read
>>> from asm, the instruction has the wild_read flag set. The if statement
>>> checks if that flag is set and if so it clears the bitmap - which was
>>> the original behavior. Originally, only if read_rec is non NULL you
>>> need to recompute the kill set. Now, even if read_rec is NULL,
>>> non_frame_wild_read could be set requiring the kill set to be
>>> modified, which is what this patch does.  In fact, isn't what you have
>>> written above the equivalent to what is in the patch as '/* Leave this
>>> clause unchanged */' is the same as
>>>
>>>                  if (dump_file)
>>>                    fprintf (dump_file, "regular read\n");
>>>                  scan_reads_nospill (insn_info, v, NULL);
>>>
>>>
>>> -Easwaran
>>>
>>
>> Ping.  I have changed the test case to use int and added another test
>> case that shows DSE doesn't happen when  the struct instance is
>> volatile (wild_read gets set in that case)
>>
>>
>
> One test failed on Linux/ia32:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49414
>
> --
> H.J.
>

Sorry about the ia32 test failure.  Looking into it now.

-Easwaran


Re: Improve DSE in the presence of calls

2011-06-17 Thread Easwaran Raman
This patch seems to break ia64 and some other targets. I have updated
Bug 49429 with a test-case triage. It looks like for some EXPRs,
may_be_aliased returns an incorrect value, and hence can_escape
incorrectly concludes the variable can't escape, resulting in the
removal of useful stores.


So can_escape() returns false. But later on, in the same BB, I see:

In the following list of instructions,

(insn 4 3 6 2 (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64])
(reg:DI 112 in0)) y.c:23 5 {movdi_internal}
 (expr_list:REG_DEAD (reg:DI 112 in0)
(nil)))

...

(insn 36 30 37 2 (set (reg:DI 120 out0)
(reg/f:DI 357)) 5 {movdi_internal}
 (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp)
(const_int 62 [0x3e]))
(nil)))
(insn 37 36 38 2 (set (reg:DI 121 out1)
(reg/f:DI 341)) 5 {movdi_internal}
 (expr_list:REG_DEAD (reg/f:DI 341)
(expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp)
(const_int 96 [0x60]))
(nil
(insn 38 37 39 2 (set (reg:DI 122 out2)
(const_int 31 [0x1f])) 5 {movdi_internal}
 (nil))
(call_insn 39 38 42 2 (parallel [
(set (reg:DI 8 r8)
(call (mem:DI (symbol_ref:DI ("memcpy") [flags 0x41]
) [0 memcpy S8 A64])
(const_int 1 [0x1])))
(clobber (reg:DI 320 b0))
(clobber (scratch:DI))
(clobber (scratch:DI))
]) 332 {call_value_gp}
 (expr_list:REG_DEAD (reg:DI 122 out2)
(expr_list:REG_DEAD (reg:DI 121 out1)
(expr_list:REG_DEAD (reg:DI 120 out0)
(expr_list:REG_UNUSED (reg:DI 8 r8)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil))
(expr_list:REG_DEP_TRUE (use (reg:DI 1 r1))
(expr_list:REG_DEP_TRUE (use (reg:DI 122 out2))
(expr_list:REG_DEP_TRUE (use (reg:DI 121 out1))
(expr_list:REG_DEP_TRUE (use (reg:DI 120 out0))
(nil))


for the memory expression

(set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64])
(reg:DI 112 in0))

may_be_aliased() returns false, but register 341 is passed as the
second parameter to memcpy. I suspect this is a bug elsewhere which is
exposed by this patch. If someone knows why this might be happening, I
can tighten the  can_escape() function appropriately.
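A minimal sketch of the kind of store that must not be removed (illustrative only; this is not the triage test case attached to the bug):

```c
#include <assert.h>

/* Sketch of the hazard, not GCC code: the store to s.a is NOT dead,
   because s's address escapes through bar(), so DSE must keep it.  If
   can_escape() wrongly reports "no escape", the store is deleted and
   bar() reads an uninitialized field.  */
struct S { int a; };

static int observed;

static void bar (struct S *p)
{
  observed = p->a;   /* reads the supposedly "dead" store */
}

static int use_struct (void)
{
  struct S s;
  s.a = 42;          /* must survive DSE: &s escapes below */
  bar (&s);
  return observed;
}
```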

Thanks,
Easwaran



On Tue, Jun 14, 2011 at 9:34 AM, Jeff Law  wrote:
>
> On 05/10/11 13:18, Easwaran Raman wrote:
>
>>>> I am not sure I understand the problem here.  If there is a wild read
>>>> from asm, the instruction has the wild_read flag set. The if statement
>>>> checks if that flag is set and if so it clears the bitmap - which was
>>>> the original behavior. Originally, only if read_rec is non NULL you
>>>> need to recompute the kill set. Now, even if read_rec is NULL,
>>>> non_frame_wild_read could be set requiring the kill set to be
>>>> modified, which is what this patch does.  In fact, isn't what you have
>>>> written above the equivalent to what is in the patch as '/* Leave this
>>>> clause unchanged */' is the same as
>>>>
>>>>                  if (dump_file)
>>>>                    fprintf (dump_file, "regular read\n");
>>>>                  scan_reads_nospill (insn_info, v, NULL);
>>>>
>>>>
>>>> -Easwaran
>>>>
>>
>>> Ping.  I have changed the test case to use int and added another test
>>> case that shows DSE doesn't happen when  the struct instance is
>>> volatile (wild_read gets set in that case)
>>
>>
>> What's the purpose behind using uint64_t in the testcase?  Somehow I
>> suspect using int64_t means the test is unlikely to work
>> across targets with different word sizes.
> Sorry for the exceedingly long wait.  Things have been a bit crazy the
> last several weeks.
>
> On a positive note, re-reading things now I think my objection/comment
> was mis-guided.
>
> Patch approved, and again, sorry for the absurdly long period of
> non-responsiveness.
>
> jeff
>


Re: Improve DSE in the presence of calls

2011-06-18 Thread Easwaran Raman
The gimple for test2_31 before RTL expansion contains:

  # .MEMD.2034_2 = VDEF <.MEMD.2034_1(D)>
  s1D.2035 = s1D.1255;

The RHS is a PARM_DECL. It doesn't have TREE_ADDRESSABLE set, while the
LHS does, which makes sense. But then the move is translated to a
memcpy call while the source memory location is still considered not
addressable. Setting TREE_ADDRESSABLE on the RHS should fix this bug. I
don't know the best place to do that, though. At the point where it
decides to use memcpy to do the copying in emit_block_move_hints, we
are dealing with temporaries and we don't have the source tree lying
around. Is passing the tree EXP to emit_block_move from store_expr a
reasonable approach?
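A hypothetical reduction of that failure mode might look like this (the struct name and sizes are invented, and whether the aggregate copy actually becomes a memcpy libcall is target-dependent):

```c
#include <assert.h>

/* Illustrative reduction, not the actual triage test: a by-value
   parameter is copied into a local aggregate, and the copy is large
   enough that RTL expansion may emit it as a memcpy libcall.  Before
   the fix, the source of such a copy could be left non-addressable,
   letting DSE delete stores that memcpy still reads.  */
struct big { long a[16]; };

static long first_elem (struct big t)
{
  struct big copy = t;   /* aggregate copy; may expand to memcpy (&copy, &t, ...) */
  return copy.a[0];
}
```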

-Easwaran


On Fri, Jun 17, 2011 at 2:16 PM, Easwaran Raman  wrote:
> This patch seems to break ia64 and some other targets. I have updated
> Bug 49429 with a test-case triage. It looks like for some EXPRs,
> may_be_aliased returns an incorrect value, and hence can_escape
> incorrectly concludes the variable can't escape, resulting in the
> removal of useful stores.
>
>
> So can_escape() returns false. But later on, in the same BB, I see:
>
> In the following list of instructions,
>
> (insn 4 3 6 2 (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64])
>        (reg:DI 112 in0)) y.c:23 5 {movdi_internal}
>     (expr_list:REG_DEAD (reg:DI 112 in0)
>        (nil)))
>
> ...
>
> (insn 36 30 37 2 (set (reg:DI 120 out0)
>        (reg/f:DI 357)) 5 {movdi_internal}
>     (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp)
>            (const_int 62 [0x3e]))
>        (nil)))
> (insn 37 36 38 2 (set (reg:DI 121 out1)
>        (reg/f:DI 341)) 5 {movdi_internal}
>     (expr_list:REG_DEAD (reg/f:DI 341)
>        (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp)
>                (const_int 96 [0x60]))
>            (nil
> (insn 38 37 39 2 (set (reg:DI 122 out2)
>        (const_int 31 [0x1f])) 5 {movdi_internal}
>     (nil))
> (call_insn 39 38 42 2 (parallel [
>            (set (reg:DI 8 r8)
>                (call (mem:DI (symbol_ref:DI ("memcpy") [flags 0x41]
> ) [0 memcpy S8 A64])
>                    (const_int 1 [0x1])))
>            (clobber (reg:DI 320 b0))
>            (clobber (scratch:DI))
>            (clobber (scratch:DI))
>        ]) 332 {call_value_gp}
>     (expr_list:REG_DEAD (reg:DI 122 out2)
>        (expr_list:REG_DEAD (reg:DI 121 out1)
>            (expr_list:REG_DEAD (reg:DI 120 out0)
>                (expr_list:REG_UNUSED (reg:DI 8 r8)
>                    (expr_list:REG_EH_REGION (const_int 0 [0])
>                        (nil))
>    (expr_list:REG_DEP_TRUE (use (reg:DI 1 r1))
>        (expr_list:REG_DEP_TRUE (use (reg:DI 122 out2))
>            (expr_list:REG_DEP_TRUE (use (reg:DI 121 out1))
>                (expr_list:REG_DEP_TRUE (use (reg:DI 120 out0))
>                    (nil))
>
>
> for the memory expression
>
> (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64])
>    (reg:DI 112 in0))
>
> may_be_aliased() returns false, but register 341 is passed as the
> second parameter to memcpy. I suspect this is a bug elsewhere which is
> exposed by this patch. If someone knows why this might be happening, I
> can tighten the  can_escape() function appropriately.
>
> Thanks,
> Easwaran
>
>
>
> On Tue, Jun 14, 2011 at 9:34 AM, Jeff Law  wrote:
>>
>> On 05/10/11 13:18, Easwaran Raman wrote:
>>
>>>>> I am not sure I understand the problem here.  If there is a wild read
>>>>> from asm, the instruction has the wild_read flag set. The if statement
>>>>> checks if that flag is set and if so it clears the bitmap - which was
>>>>> the original behavior. Originally, only if read_rec is non NULL you
>>>>> need to recompute the kill set. Now, even if read_rec is NULL,
>>>>> non_frame_wild_read could be set requiring the kill set to be
>>>>> modified, which is what this patch does.  In fact, isn't what you have
>>>>> written above the equivalent to what is in the patch as '/* Leave this
>>>>> clause unchanged */' is the same as
>>>>>
>>>>>                  if (dump_file)
>>>>>                    fprintf (dump_file, "regular read\n");
>>>>>                  scan_reads_nospill (insn_info, v, NULL);
>>>>>
>>>>>
>>>>> -Easwaran
>>>>>
>>>
>>>> Ping.  I have changed the test case to use int and added another test
>>>> case that shows DSE doesn't happen when  the struct instance is
>>>> volatile (wild_read gets set in that ca

Mark variables addressable if they are copied using libcall in RTL expander

2011-06-20 Thread Easwaran Raman
This fixes bugs introduced by r175063. OK for trunk if there are no
test regressions?

-Easwaran


2011-06-20  Easwaran Raman  

PR rtl-optimization/49429
* expr.c (emit_block_move_hints):  Mark MEM_EXPR(x) and
MEM_EXPR(y) addressable if emit_block_move_via_libcall is
used to copy y into x.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 175081)
+++ gcc/expr.c  (working copy)
@@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu
   else if (may_use_call
   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
-retval = emit_block_move_via_libcall (x, y, size,
- method == BLOCK_OP_TAILCALL);
+{
+  /* Since x and y are passed to a libcall, mark the corresponding
+ tree EXPR as addressable.  */
+  tree y_expr = MEM_EXPR (y);
+  tree x_expr = MEM_EXPR (x);
+  if (y_expr)
+mark_addressable (y_expr);
+  if (x_expr)
+mark_addressable (x_expr);
+  retval = emit_block_move_via_libcall (x, y, size,
+   method == BLOCK_OP_TAILCALL);
+}
+
   else
 emit_block_move_via_loop (x, y, size, align);


Re: Mark variables addressable if they are copied using libcall in RTL expander

2011-06-22 Thread Easwaran Raman
On Wed, Jun 22, 2011 at 7:13 AM, Eric Botcazou  wrote:
>> I fear this isn't enough considering pass-by-value aggregates that
>> are callee copied.
>
> It's indeed not sufficient for arguments passed by reference but 
> callee-copied.
>
> This is PR target/49454.  For gcc.c-torture/execute/2717-1.c:
>
> typedef struct trio { int a, b, c; } trio;
>
> int
> foo (trio t, int i)
> {
>  return bar (i, t);
> }
>
> yields in the .optimized dump:
>
> foo (struct trio t, int i)
> {
>  int D.1968;
>  struct trio t.0;
>
> :
>  t.0 = t;
>  D.1968_2 = bar (i_1(D), t.0);
>  return D.1968_2;
> }
>
> and the aggregate copy is elided by DSE because t.0 isn't may_be_aliased.  
> This
> seems to be a pre-existing bug though: its address is passed to bar in RTL.
>
> --
> Eric Botcazou
>

Is the following patch a reasonable fix for this case?  I assume I
should add similar code inside emit_library_call_value_1.

-Easwaran


--- gcc/calls.c (revision 175081)
+++ gcc/calls.c (working copy)
@@ -1073,6 +1073,8 @@ initialize_argument_information (int num_actuals A
  callee_copies
= reference_callee_copied (args_so_far, TYPE_MODE (type),
   type, argpos < n_named_args);
+  if (callee_copies)
+mark_addressable (args[i].tree_value);

  /* If we're compiling a thunk, pass through invisible references
 instead of making a copy.  */


Re: Mark variables addressable if they are copied using libcall in RTL expander

2011-06-23 Thread Easwaran Raman
On Thu, Jun 23, 2011 at 3:22 AM, Eric Botcazou  wrote:
>> So, what's the patch(es) that need approval now?
>
> Original expr.c patch for PR rtl-optimization/49429 + adjusted and augmented
> calls.c patch for PR target/49454.  Everything is in this thread.
>
> Easwaran, would you mind posting a consolidated patch?
>
> --
> Eric Botcazou
>
Here is the revised patch. Bootstraps and all tests pass on
x86_64-unknown-linux. OK for trunk?


2011-06-23  Easwaran Raman  

   PR rtl-optimization/49429
   PR target/49454
   * expr.c (emit_block_move_hints):  Mark MEM_EXPR(x) and
   MEM_EXPR(y) addressable if emit_block_move_via_libcall is
   used to copy y into x.
   * calls.c (initialize_argument_information): Mark
   an argument addressable if it is passed by invisible reference.
   (emit_library_call_value_1): Mark  MEM_EXPR (val) addressable
   if it is passed by reference.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 175346)
+++ gcc/expr.c  (working copy)
@@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu
   else if (may_use_call
   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
-retval = emit_block_move_via_libcall (x, y, size,
- method == BLOCK_OP_TAILCALL);
+{
+  /* Since x and y are passed to a libcall, mark the corresponding
+ tree EXPR as addressable.  */
+  tree y_expr = MEM_EXPR (y);
+  tree x_expr = MEM_EXPR (x);
+  if (y_expr)
+mark_addressable (y_expr);
+  if (x_expr)
+mark_addressable (x_expr);
+  retval = emit_block_move_via_libcall (x, y, size,
+   method == BLOCK_OP_TAILCALL);
+}
+
   else
 emit_block_move_via_loop (x, y, size, align);

Index: gcc/calls.c
===================================================================
--- gcc/calls.c (revision 175346)
+++ gcc/calls.c (working copy)
@@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals A
  && TREE_CODE (base) != SSA_NAME
  && (!DECL_P (base) || MEM_P (DECL_RTL (base)
{
+  mark_addressable (args[i].tree_value);
+
  /* We can't use sibcalls if a callee-copied argument is
 stored in the current function's frame.  */
  if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base))
@@ -3524,7 +3526,12 @@ emit_library_call_value_1 (int retval, rtx orgfun,
}

  if (MEM_P (val) && !must_copy)
-   slot = val;
+{
+  tree val_expr = MEM_EXPR (val);
+  if (val_expr)
+mark_addressable (val_expr);
+ slot = val;
+}
  else
{
  slot = assign_temp (lang_hooks.types.type_for_mode (mode, 0),


Re: Mark variables addressable if they are copied using libcall in RTL expander

2011-06-23 Thread Easwaran Raman
On Thu, Jun 23, 2011 at 12:16 PM, Jakub Jelinek  wrote:
> On Thu, Jun 23, 2011 at 12:02:35PM -0700, Easwaran Raman wrote:
>> +      if (y_expr)
>> +        mark_addressable (y_expr);
>
> Please watch formatting, a tab should be used instead of 8 spaces.
>
>> +      if (x_expr)
>> +        mark_addressable (x_expr);
>
> Ditto.
>
>> @@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals A
>>                 && TREE_CODE (base) != SSA_NAME
>>                 && (!DECL_P (base) || MEM_P (DECL_RTL (base)
>>           {
>> +              mark_addressable (args[i].tree_value);
>> +
>
> Likewise, plus the line is indented too much, each level should be indented
> by 2 chars.
>
>        Jakub
>

I have attached a new patch that fixes the formatting  issues.

Thanks,
Easwaran
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 175346)
+++ gcc/expr.c	(working copy)
@@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu
   else if (may_use_call
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
-retval = emit_block_move_via_libcall (x, y, size,
-	  method == BLOCK_OP_TAILCALL);
+{
+  /* Since x and y are passed to a libcall, mark the corresponding
+	 tree EXPR as addressable.  */
+  tree y_expr = MEM_EXPR (y);
+  tree x_expr = MEM_EXPR (x);
+  if (y_expr)
+	mark_addressable (y_expr);
+  if (x_expr)
+	mark_addressable (x_expr);
+  retval = emit_block_move_via_libcall (x, y, size,
+	method == BLOCK_OP_TAILCALL);
+}
+
   else
 emit_block_move_via_loop (x, y, size, align);
 
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 175346)
+++ gcc/calls.c	(working copy)
@@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals A
 		  && TREE_CODE (base) != SSA_NAME
 		  && (!DECL_P (base) || MEM_P (DECL_RTL (base)
 	{
+	  mark_addressable (args[i].tree_value);
+
 	  /* We can't use sibcalls if a callee-copied argument is
 		 stored in the current function's frame.  */
 	  if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base))
@@ -3524,7 +3526,12 @@ emit_library_call_value_1 (int retval, rtx orgfun,
 	}
 
 	  if (MEM_P (val) && !must_copy)
-	slot = val;
+	{
+	  tree val_expr = MEM_EXPR (val);
+	  if (val_expr)
+		mark_addressable (val_expr);
+	  slot = val;
+	}
 	  else
 	{
 	  slot = assign_temp (lang_hooks.types.type_for_mode (mode, 0),


[google] Backport r175063, r175082 and r175384 from trunk to google/gcc-4_6 branch.

2011-07-24 Thread Easwaran Raman
Committed.


Improve stack layout heuristic.

2011-04-17 Thread Easwaran Raman
Hi,
 This patch improves the heuristic used in assigning stack locations to
stack variables.  Currently, if there are 3 variables A, B and C with
their sizes in increasing order, and A and C have a conflict, it will
put A and B in one partition and C in a separate partition, with a total
frame size of sizeof(B) + sizeof(C).  This patch puts B and C in the
same partition and A in a separate partition, with a total size of
sizeof(A) + sizeof(C).  The other improvement concerns
-fno-strict-aliasing: in that case, it is OK to share a stack slot for
a variable whose type transitively contains a union.  The last change
in this patch removes the field offset in struct stack_var; right now,
the field is always 0 due to the way it is set in
partition_stack_vars.
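As a sanity check on the arithmetic above, here is a toy model of the two heuristics (the sizes are made up; A < B < C, and only A and C conflict):

```c
#include <assert.h>

/* Back-of-the-envelope model of the example in the description.
   A partition's size is the size of its largest member; the frame is
   the sum of the partition sizes.  Illustrative only, not GCC code.  */

static int old_frame (int A, int B, int C)
{
  /* Old heuristic: A (smallest first) absorbs B; C stays alone.
     Frame = max(A, B) + C = B + C.  */
  (void) A;
  return B + C;
}

static int new_frame (int A, int B, int C)
{
  /* New heuristic (sort by decreasing size): C absorbs B; A stays
     alone.  Frame = max(C, B) + A = C + A, smaller since A < B.  */
  (void) B;
  return C + A;
}
```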

Bootstraps and no test regressions on x86_64-unknown-linux. OK for trunk?

-Easwaran


2011-04-17  Easwaran Raman  

* gcc/testsuite/gcc.dg/stack-layout-1.c: New test.
* gcc/testsuite/gcc.dg/stack-layout-2.c: New test.
* gcc/cfgexpand.c (struct stack_var): Remove OFFSET...
(add_stack_var): ...and its reference here...
(expand_stack_vars): ...and here.
(add_alias_set_conflicts): Add conflict to union variables
only when strict aliasing is used.
(stack_var_cmp): Sort by descending order of size.
(partition_stack_vars): Change heuristic.
(union_stack_vars): Fix to reflect changes in
partition_stack_vars.
(dump_stack_var_partition): Add newline after each partition.
Index: gcc/testsuite/gcc.dg/stack-layout-1.c
===================================================================
--- gcc/testsuite/gcc.dg/stack-layout-1.c   (revision 0)
+++ gcc/testsuite/gcc.dg/stack-layout-1.c   (revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-strict-aliasing -fdump-rtl-expand" } */
+union U {
+  int a;
+  float b;
+};
+struct A {
+  union U u1;
+  char a[100];
+};
+void bar (struct A *);
+void foo ()
+  {
+{
+  struct A a;
+  bar (&a);
+}
+{
+  struct A a;
+  bar (&a);
+}
+  }
+
+/* { dg-final { scan-rtl-dump-times "Partition" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/stack-layout-2.c
===================================================================
--- gcc/testsuite/gcc.dg/stack-layout-2.c   (revision 0)
+++ gcc/testsuite/gcc.dg/stack-layout-2.c   (revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+void bar( char *);
+int foo()
+{
+  int i=0;
+  {
+char a[8000];
+bar(a);
+i += a[0];
+  }
+  {
+char a[8192];
+char b[32];
+bar(a);
+i += a[0];
+bar(b);
+i += a[0];
+  }
+  return i;
+}
+/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */
+/* { dg-final { scan-rtl-dump "size 32" "expand" } } */
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c (revision 171954)
+++ gcc/cfgexpand.c (working copy)
@@ -158,11 +158,6 @@
   /* The Variable.  */
   tree decl;

-  /* The offset of the variable.  During partitioning, this is the
- offset relative to the partition.  After partitioning, this
- is relative to the stack frame.  */
-  HOST_WIDE_INT offset;
-
   /* Initially, the size of the variable.  Later, the size of the partition,
  if this variable becomes it's partition's representative.  */
   HOST_WIDE_INT size;
@@ -267,7 +262,6 @@
   v = &stack_vars[stack_vars_num];

   v->decl = decl;
-  v->offset = 0;
   v->size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1);
   /* Ensure that all variables have size, so that &a != &b for any two
  variables that are simultaneously live.  */
@@ -372,8 +366,9 @@
 to elements will conflict.  In case of unions we have
 to be careful as type based aliasing rules may say
 access to the same memory does not conflict.  So play
-safe and add a conflict in this case.  */
- || contains_union)
+safe and add a conflict in this case when -fstrict-aliasing
+ is used.  */
+ || (contains_union && flag_strict_aliasing))
add_stack_var_conflict (i, j);
}
 }
@@ -403,9 +398,9 @@
 return (int)largeb - (int)largea;

   /* Secondary compare on size, decreasing  */
-  if (sizea < sizeb)
-return -1;
   if (sizea > sizeb)
+return -1;
+  if (sizea < sizeb)
 return 1;

   /* Tertiary compare on true alignment, decreasing.  */
@@ -564,28 +559,18 @@

 /* A subroutine of partition_stack_vars.  The UNION portion of a UNION/FIND
partitioning algorithm.  Partitions A and B are known to be non-conflicting.
-   Merge them into a single partition A.
+   Merge them into a single

Re: Improve stack layout heuristic.

2011-04-17 Thread Easwaran Raman
On Sun, Apr 17, 2011 at 1:39 PM, Steven Bosscher  wrote:
> On Sun, Apr 17, 2011 at 9:39 PM, Easwaran Raman  wrote:
>> @@ -372,8 +366,9 @@
>>                 to elements will conflict.  In case of unions we have
>>                 to be careful as type based aliasing rules may say
>>                 access to the same memory does not conflict.  So play
>> -                safe and add a conflict in this case.  */
>> -             || contains_union)
>> +                safe and add a conflict in this case when -fstrict-aliasing
>> +                 is used.  */
>> +             || (contains_union && flag_strict_aliasing))
>>            add_stack_var_conflict (i, j);
>>        }
>>     }
>
> Are you sure this change is safe? See http://gcc.gnu.org/PR25654
>
> Ciao!
> Steven
>

I tried the test case in PR25654 with my patch; at -O2
-fno-strict-aliasing it puts the two variables in the same partition
and the test passes. My understanding of that issue is that with
-fstrict-aliasing, the short * and int * are assumed to point to
different locations and hence they can't share stack slots, but with
-fno-strict-aliasing that assumption is not valid.
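The PR25654 shape being discussed looks roughly like this (an illustrative sketch, not the actual PR test case):

```c
#include <assert.h>

/* Two block-local variables of incompatible types whose lifetimes do
   not overlap are candidates for sharing one stack slot.  Under
   -fstrict-aliasing the compiler may assume a short * and an int *
   never alias, so sharing needs an explicit conflict edge; under
   -fno-strict-aliasing that assumption is off and sharing is safe.  */
static int sum_two_scopes (void)
{
  int r = 0;
  {
    short s = 1;   /* candidate for slot sharing with i below */
    r += s;
  }
  {
    int i = 2;     /* may be placed in s's slot once s is dead */
    r += i;
  }
  return r;
}
```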

Thanks,
Easwaran


Re: Improve stack layout heuristic.

2011-04-18 Thread Easwaran Raman
On Mon, Apr 18, 2011 at 5:58 AM, Michael Matz  wrote:
>
> Hi,
>
> [FWIW I can't approve patches, but some feedback nevertheless]
>
> On Sun, 17 Apr 2011, Easwaran Raman wrote:
>
> >  This patch impoves the heuristic used in assigning stack location to
> > stack variables.  Currently, if there are 3 variables A, B and C with
> > their sizes in increasing order and A and C have a conflict, it will
> > put A and B in a partition and C in a separate partition with a total
> > frame size of sizeof(B) + sizeof(C).  This patch puts B and C in the
> > same partition and A in a separate partition, with a total size of
> > sizeof(A) + sizeof(C).
>
> That's the change in stack_var_cmp, to match the comment, right?  Makes
> sense.
>
> > The other change in this patch removes the field offset in struct
> > stack_var. Right now, the field is always 0 due to the way it is set in
> > partition_stack_vars.
>
> Huh, indeed, starting with the initial version of that code already.  The
> idea clearly was to pack multiple conflicting small objects into the space
> of only one larger non-conflicting one by placing them at different
> offsets, but that never seems to have been implemented and would require
> different tracking of conflicts.  I think removing the offset tracking
> right now is okay, can be reintroduced once somebody gets motivated
> enough.

Yes, the intent seems to be what you describe above. I haven't thought
about it much, but I am not sure how much a non-backtracking version
will buy over the current heuristic.

>
> > -     and merge them into partition A.  */
> > -  for (last = i = b; i != EOC; last = i, i = stack_vars[i].next)
> > -    {
> > -      stack_vars[i].offset += offset;
> > -      stack_vars[i].representative = a;
> > -    }
> > -  stack_vars[last].next = stack_vars[a].next;
> > +   /* Add B to A's partition.  */
> > +  stack_vars[b].next = stack_vars[a].next;
> > +  stack_vars[b].representative = a;
>
> Hmm.  This seems fishy.  You don't update the representatives of the
> members of the partition that B is the leader of.  Additionally you break
> the chain of B's members.  That is only a problem if B actually has more
> than one member.  That might be always the case with your changed sorting
> order, but it's an important invariant, so please assert this in this
> routine.
>
That B always has exactly one member is an invariant of my scheme. The
object corresponding to the outer loop index is always the
representative. If B had more than one member, it must have been
visited in the outer loop in some earlier iteration, but then its
index in stack_vars_sorted would be greater than the current i. I have
added an assertion that stack_vars[b].next == EOC in union_stack_vars,
which holds only when B has one member.
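The merge step with that invariant can be modeled as follows (a toy sketch with made-up field layout, not the real GCC structures or the exact patch body):

```c
#include <assert.h>

#define EOC -1

/* Toy model of the patched union_stack_vars: B is always a singleton
   partition, so merging it into A is one splice plus one
   representative update, with no loop over B's members.  */
struct toy_var { int next; int representative; };

static void toy_union (struct toy_var *sv, int a, int b)
{
  assert (sv[b].next == EOC);   /* the invariant the patch asserts */
  sv[b].next = sv[a].next;      /* splice B into A's member list */
  sv[a].next = b;
  sv[b].representative = a;
}
```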

>
> > @@ -596,7 +581,7 @@
> >    if (vb->conflicts)
> >      {
> >        EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> > -     add_stack_var_conflict (a, stack_vars[u].representative);
> > +     add_stack_var_conflict (a, u);
>
> Please don't.  This uselessly bloats the conflict bitmaps.

It is sufficient to add the conflicts of a variable only when it is
not merged into some group.  I am adding a check to that effect.
I am attaching a new patch which excludes the strict-aliasing check
(which I will send separately) and includes the changes I have
mentioned above.  Bootstraps and no regressions in tests.

Thanks,
Easwaran


>
> Ciao,
> Michael.
Index: gcc/testsuite/gcc.dg/stack-layout-2.c
===================================================================
--- gcc/testsuite/gcc.dg/stack-layout-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/stack-layout-2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+void bar( char *);
+int foo()
+{
+  int i=0;
+  {
+char a[8000];
+bar(a);
+i += a[0];
+  }
+  {
+char a[8192];
+char b[32];
+bar(a);
+i += a[0];
+bar(b);
+i += a[0];
+  }
+  return i;
+}
+/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */
+/* { dg-final { scan-rtl-dump "size 32" "expand" } } */
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	(revision 171954)
+++ gcc/cfgexpand.c	(working copy)
@@ -158,11 +158,6 @@
   /* The Variable.  */
   tree decl;
 
-  /* The offset of the variable.  During partitioning, this is the
- offset relative to the partition.  After partitioning, this
- is relative to the stack frame.  */
-  HOST_WIDE_INT offset;
-
   /* Initially, the size of the variable.  Later, the size of the partition,
  if this variable becomes it's partition's re

Re: Improve stack layout heuristic.

2011-04-19 Thread Easwaran Raman
On Tue, Apr 19, 2011 at 5:08 AM, Michael Matz  wrote:
> Hi,
>
> On Mon, 18 Apr 2011, Easwaran Raman wrote:
>
>> > > @@ -596,7 +581,7 @@
>> > >    if (vb->conflicts)
>> > >      {
>> > >        EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
>> > > -     add_stack_var_conflict (a, stack_vars[u].representative);
>> > > +     add_stack_var_conflict (a, u);
>> >
>> > Please don't.  This uselessly bloats the conflict bitmaps.
>>
>> It is sufficient to add the conflicts of  a variable only when it is
>> not merged into some group.
>
> That is correct but is also what the use of stack_vars[u].representative
> achieves alone, ...
>
>> I am adding a check to that effect.
>
> ... without any check.
>
> @@ -596,7 +581,8 @@
>   if (vb->conflicts)
>     {
>       EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> -       add_stack_var_conflict (a, stack_vars[u].representative);
> +        if (stack_vars[u].next == EOC && stack_vars[u].representative == u)
> +          add_stack_var_conflict (a, u);
>       BITMAP_FREE (vb->conflicts);
>     }
>  }
>
> What's your objective with this change?  I find the original code clearer.

Let us say we try to merge 'a' into 'b', and 'a' conflicts with many
members of an existing partition C. It is not necessary to add all
those conflicts to 'b', since they will never be considered in the call
to union_stack_vars. I was motivated by your comment on bit-vector
bloat to try this, but if it hurts readability I'll happily revert
to what it was before.

-Easwaran

>
>
> Ciao,
> Michael.


[google]Pass --save-temps to the assembler (issue4436049)

2011-04-19 Thread Easwaran Raman
This makes the gcc driver pass the --save-temps option to the assembler or 
assembler wrapper so that post-assembly tools like MAO can be integrated. 
Bootstraps on x86_64. Ok for google/main?

2011-04-19  Easwaran Raman  

* gcc/gcc.c (static const char *asm_options): Pass
--save-temps to the assembler.

Index: gcc/gcc.c
===================================================================
--- gcc/gcc.c   (revision 172727)
+++ gcc/gcc.c   (working copy)
@@ -780,7 +780,7 @@ static const char *asm_options =
 #if HAVE_GNU_AS
 /* If GNU AS is used, then convert -w (no warnings), -I, and -v
to the assembler equivalents.  */
-"%{v} %{w:-W} %{I*} "
+"%{v} %{w:-W} %{I*} %{save-temps:--save-temps} "
 #endif
 "%a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}";
 

--
This patch is available for review at http://codereview.appspot.com/4436049


Re: [google]Pass --save-temps to the assembler (issue4436049)

2011-04-19 Thread Easwaran Raman
On Tue, Apr 19, 2011 at 12:19 PM, Diego Novillo  wrote:
> On Tue, Apr 19, 2011 at 15:02, Easwaran Raman  wrote:
>> This makes the gcc driver pass the --save-temps option to the assembler or 
>> assembler wrapper so that post-assembly tools like MAO can be integrated. 
>> Bootstraps on x86_64. Ok for google/main?
>>
>> 2011-04-19  Easwaran Raman  
>>
>>        * gcc/gcc.c (static const char *asm_options): Pass
>
> No need to specify 'static const char *'.  Just 'asm_options'.
>
>>        --save-temps to the assembler.
>
> Could you add a comment stating what version of binutils we need to use this with?
This is to work with the assembler wrapper that has to know whether to
save the assembly file after running any post-assembly tools like MAO.

> I will commit this patch for you.  Given that you will be sending
> other patches, please fill in the write access form at
> http://sourceware.org/cgi-bin/pdw/ps_form.cgi
>
>
> Diego.
>


Re: [google]Pass --save-temps to the assembler (issue4436049)

2011-04-19 Thread Easwaran Raman
On Tue, Apr 19, 2011 at 12:19 PM, Diego Novillo  wrote:
> On Tue, Apr 19, 2011 at 15:02, Easwaran Raman  wrote:
>> This makes the gcc driver pass the --save-temps option to the assembler or 
>> assembler wrapper so that post-assembly tools like MAO can be integrated. 
>> Bootstraps on x86_64. Ok for google/main?
>>
>> 2011-04-19  Easwaran Raman  
>>
>>        * gcc/gcc.c (static const char *asm_options): Pass
>
> No need to specify 'static const char *'.  Just 'asm_options'.
>
>>        --save-temps to the assembler.
>
> Could you add a comment stating what version of binutils we need to use this with?
>
> I will commit this patch for you.  Given that you will be sending
> other patches, please fill in the write access form at
> http://sourceware.org/cgi-bin/pdw/ps_form.cgi
>
>
> Diego.
>

The revised patch has a comment that this should be used with an
assembler wrapper that can recognize --save-temps.

-Easwaran
2011-04-19  Easwaran Raman  

	* gcc/gcc.c (asm_options): Pass --save-temps to assembler.

Index: gcc/gcc.c
===
--- gcc/gcc.c	(revision 172727)
+++ gcc/gcc.c	(working copy)
@@ -775,12 +775,16 @@ static const char *cc1_options =
  %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants}\
  %{coverage:-fprofile-arcs -ftest-coverage}";
 
+/* If an assembler wrapper is used to invoke post-assembly tools
+   like MAO, --save-temps needs to be passed to save the output of
+   such post-processing tools.  This is not compatible with vanilla
+   binutils gas, which does not recognize --save-temps.  */
 static const char *asm_options =
 "%{-target-help:%:print-asm-header()} "
 #if HAVE_GNU_AS
 /* If GNU AS is used, then convert -w (no warnings), -I, and -v
to the assembler equivalents.  */
-"%{v} %{w:-W} %{I*} "
+"%{v} %{w:-W} %{I*} %{save-temps:--save-temps} "
 #endif
 "%a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}";
 


Allow union variables to share stack slots with -fno-strict-aliasing (issue4444051)

2011-04-19 Thread Easwaran Raman
Hi,
 This patch allows variables whose type transitively contains a union to share
stack slots if -fno-strict-aliasing is used.

 Bootstraps on x86_64 with no test regressions. Also tested by changing 
flag_strict_aliasing to 0 by default. Bootstraps and no test regressions when 
compared to flag_strict_aliasing=0 without this patch.

OK for trunk?

-Easwaran

2011-04-19  Easwaran Raman  

* gcc/testsuite/gcc.dg/stack-layout-1.c: New test.
* gcc/cfgexpand.c (add_alias_set_conflicts): Add conflicts
with union-containing variables only when -fstrict-aliasing
is in effect.

Index: gcc/testsuite/gcc.dg/stack-layout-1.c
===
--- gcc/testsuite/gcc.dg/stack-layout-1.c   (revision 0)
+++ gcc/testsuite/gcc.dg/stack-layout-1.c   (revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-strict-aliasing -fdump-rtl-expand" } */
+union U {
+  int a;
+  float b;
+};
+struct A {
+  union U u1;
+  char a[100];
+};
+void bar (struct A *);
+void foo ()
+  {
+{
+  struct A a;
+  bar (&a);
+}
+{
+  struct A a;
+  bar (&a);
+}
+  }
+
+/* { dg-final { scan-rtl-dump-times "Partition" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c (revision 172657)
+++ gcc/cfgexpand.c (working copy)
@@ -373,8 +373,9 @@ add_alias_set_conflicts (void)
 to elements will conflict.  In case of unions we have
 to be careful as type based aliasing rules may say
 access to the same memory does not conflict.  So play
-safe and add a conflict in this case.  */
- || contains_union)
+safe and add a conflict in this case when
+ -fstrict-aliasing is used.  */
+  || (contains_union && flag_strict_aliasing))
add_stack_var_conflict (i, j);
}
 }

--
This patch is available for review at http://codereview.appspot.com/051


Re: Improve stack layout heuristic.

2011-04-20 Thread Easwaran Raman
On Wed, Apr 20, 2011 at 6:53 AM, Michael Matz  wrote:
> Hi,
>
> On Tue, 19 Apr 2011, Easwaran Raman wrote:
>
>> > That is correct but is also what the use of stack_vars[u].representative
>> > achieves alone, ...
>> >
>> >> I am adding a check to that effect.
>> >
>> > ... without any check.
>> >
>> > @@ -596,7 +581,8 @@
>> >   if (vb->conflicts)
>> >     {
>> >       EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
>> > -       add_stack_var_conflict (a, stack_vars[u].representative);
>> > +        if (stack_vars[u].next == EOC && stack_vars[u].representative == 
>> > u)
>> > +          add_stack_var_conflict (a, u);
>> >       BITMAP_FREE (vb->conflicts);
>> >     }
>> >  }
>> >
>> > What's your objective with this change?  I find the original code clearer.
>>
>> Let us say we try to merge 'a' to 'b' and 'a' has conflicts with many
>> members of an existing partition C. It is not necessary to add all
>> those conflicts to 'b' since they will be never considered in the call
>> to union_stack_vars.
>
> Right, that's why I was objecting to your initial change.
I agree that my initial change - adding a conflict with u -  was wrong.
> The original
> code (adding stack_vars[u].representative to the conflicts of A) made sure
> the target conflict bitmap only got representatives added.
In my above example, it is not even necessary to add a conflict
between 'b' and representative(C) since it is already in a partition.
But you're right - not adding that conflict doesn't actually reduce
the size of the bitmaps. Reverting to what was there originally.

Thanks,
Easwaran

> That's why I
> was asking why you changed this area at all.
>
>> I was motivated by your comment on bit-vector bloat to try this, but if
>> this affects readability I'll happily revert back to what it was before.
>
>
> Ciao,
> Michael.
Index: gcc/testsuite/gcc.dg/stack-layout-2.c
===
--- gcc/testsuite/gcc.dg/stack-layout-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/stack-layout-2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+void bar( char *);
+int foo()
+{
+  int i=0;
+  {
+char a[8000];
+bar(a);
+i += a[0];
+  }
+  {
+char a[8192];
+char b[32];
+bar(a);
+i += a[0];
+bar(b);
+i += a[0];
+  }
+  return i;
+}
+/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */
+/* { dg-final { scan-rtl-dump "size 32" "expand" } } */
Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c	(revision 171954)
+++ gcc/cfgexpand.c	(working copy)
@@ -158,11 +158,6 @@
   /* The Variable.  */
   tree decl;
 
-  /* The offset of the variable.  During partitioning, this is the
- offset relative to the partition.  After partitioning, this
- is relative to the stack frame.  */
-  HOST_WIDE_INT offset;
-
   /* Initially, the size of the variable.  Later, the size of the partition,
  if this variable becomes it's partition's representative.  */
   HOST_WIDE_INT size;
@@ -267,7 +262,6 @@
   v = &stack_vars[stack_vars_num];
 
   v->decl = decl;
-  v->offset = 0;
   v->size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1);
   /* Ensure that all variables have size, so that &a != &b for any two
  variables that are simultaneously live.  */
@@ -403,9 +397,9 @@
 return (int)largeb - (int)largea;
 
   /* Secondary compare on size, decreasing  */
-  if (sizea < sizeb)
-return -1;
   if (sizea > sizeb)
+return -1;
+  if (sizea < sizeb)
 return 1;
 
   /* Tertiary compare on true alignment, decreasing.  */
@@ -564,28 +558,19 @@
 
 /* A subroutine of partition_stack_vars.  The UNION portion of a UNION/FIND
partitioning algorithm.  Partitions A and B are known to be non-conflicting.
-   Merge them into a single partition A.
+   Merge them into a single partition A.  */
 
-   At the same time, add OFFSET to all variables in partition B.  At the end
-   of the partitioning process we've have a nice block easy to lay out within
-   the stack frame.  */
-
 static void
-union_stack_vars (size_t a, size_t b, HOST_WIDE_INT offset)
+union_stack_vars (size_t a, size_t b)
 {
-  size_t i, last;
   struct stack_var *vb = &stack_vars[b];
   bitmap_iterator bi;
   unsigned u;
 
-  /* Update each element of partition B with the given offset,
- and merge them into partition A.  */
-  for (last = i = b; i != EOC; last = i, i = stack_vars[i].next)
-{
-  stack_vars[i].offset += offset;
