[Google] Refine hot caller heuristic
The current hot caller heuristic simply promotes edges whose caller is hot. This patch does the following:

* Turn it off for applications with large footprint since the size increase hurts them
* Be more selective by considering arguments to callee when the heuristic is enabled.

This performs well on internal benchmarks. Ok for google/4_8 branch if all tests pass?

- Easwaran

Attachment: hot_caller.patch (binary data)
Re: [Google] Refine hot caller heuristic
On Tue, Aug 20, 2013 at 9:35 PM, Xinliang David Li wrote:
> Do you need to guard the jump function access with check if
> (ipa_node_params_vector.exists ())?

I believe it is not necessary since, for example, ipa_analyze_node calls ipa_check_create_node_params that calls create. But I see it is used in ipa-inline-analysis.c everywhere. So I have added a check and conservatively return false.

> Ideally, useful_cold_callee should be folded into the inline hints
> estimation. Question about the heuristic: why filtering out
> PASS_THROUGH parameter cases completely? Passing 'this' parameter in
> many cases can result in good PRE opportunities. Why not skip the
> unknown type?

The rationale is that it is useful to inline bar into foo in the snippet below:

void foo ()
{
  A a;
  bar (&a);
  ...
}

Capturing this requires UNKNOWN and KNOWN_TYPE jump functions. I have changed the check accordingly. I have attached the new patch.

- Easwaran

> David
>
> On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman wrote:
>> The current hot caller heuristic simply promotes edges whose caller is
>> hot. This patch does the following:
>> * Turn it off for applications with large footprint since the size
>> increase hurts them
>> * Be more selective by considering arguments to callee when the
>> heuristic is enabled.
>>
>> This performs well on internal benchmarks. Ok for google/4_8 branch if
>> all tests pass?
>>
>> - Easwaran

Attachment: hot_caller.patch (binary data)
Re: [Google] Refine hot caller heuristic
enable_hot_caller_heuristic ();

>>    if (cgraph_maybe_hot_edge_p (edge))
>>      return true;
>>
>> @@ -543,9 +591,17 @@ edge_hot_enough_p (struct cgraph_edge *edge)
>>    if (flag_auto_profile && edge->callee->count == 0
>>        && edge->callee->max_bb_count > 0)
>>      return false;
>> -  if (PARAM_VALUE (PARAM_INLINE_HOT_CALLER)
>> -      && maybe_hot_count_p (NULL, edge->caller->max_bb_count))
>> -    return true;
>> +  if (use_hot_caller_heuristic)
>> +    {
>> +      struct cgraph_node *where = edge->caller;
>> +      if (maybe_hot_count_p (NULL, where->max_bb_count))
>> +        {
>> +          if (PARAM_VALUE (PARAM_INLINE_USEFUL_COLD_CALLEE))
>> +            return useful_cold_callee (edge);
>> +          else
>> +            return true;
>> +        }
>> +    }
>>    return false;
>> }

> On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman wrote:
>> The current hot caller heuristic simply promotes edges whose caller is
>> hot. This patch does the following:
>> * Turn it off for applications with large footprint since the size
>> increase hurts them
>> * Be more selective by considering arguments to callee when the
>> heuristic is enabled.
>>
>> This performs well on internal benchmarks. Ok for google/4_8 branch if
>> all tests pass?
>>
>> - Easwaran
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: Fix PR middle-end/57370
Richard,
Do you want me to commit everything but the insert_stmt_after hunk now? There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violation because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, will it be better if I check if the stmt has a proper uid (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

- Easwaran

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener wrote:
> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman wrote:
>> I have a new patch that supersedes this. The new patch also fixes PR
>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>> no test regression on x86_64/linux. Ok for trunk?
>>
>> 2013-07-31  Easwaran Raman
>>
>>         PR middle-end/57370
>>         * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>         of debug statements that cause inconsistent IR.
>
> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
> at all. The other hunks are ok, but we need to work harder to preserve
> debug stmts - simply removing all is not going to fly.
>
> Richard.
>
>> testsuite/ChangeLog:
>> 2013-07-31  Easwaran Raman
>>
>>         PR middle-end/57370
>>         PR tree-optimization/57393
>>         PR tree-optimization/58011
>>         * gfortran.dg/reassoc_12.f90: New testcase.
>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>
>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote:
>>> Ping.
>>>
>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote:
>>>> A newly generated statement in build_and_add_sum function of
>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>> earlier phi statement) which messed up the IR and caused the test
>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Easwaran
>>>>
>>>> 2013-06-27  Easwaran Raman
>>>>
>>>>         PR middle-end/57370
>>>>         * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi
>>>>         node for a non-phi gimple statement.
>>>>
>>>> testsuite/ChangeLog:
>>>> 2013-06-27  Easwaran Raman
>>>>
>>>>         PR middle-end/57370
>>>>         * gfortran.dg/reassoc_12.f90: New testcase.
>>>>
>>>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>>>> ===
>>>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>> @@ -0,0 +1,74 @@
>>>> +! { dg-do compile }
>>>> +! { dg-options "-O2 -ffast-math" }
>>>> +! PR middle-end/57370
>>>> +
>>>> +      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>>>> +                                            grad_deriv,npoints, sx)
>>>> +    IMPLICIT REAL*8 (t)
>>>> +    INTEGER, PARAMETER :: dp=8
>>>> +    REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
>>>> +         e_ndrho_ndrho_rho
>>>> +      DO ii=1,npoints
>>>> +        IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
>>>> +          t1425 = t233 * t557
>>>> +          t1429 = beta * t225
>>>> +          t1622 = t327 * t1621
>>>> +          t1626 = t327 * t1625
>>>> +          t1632 = t327 * t1631
>>>> +          t1685 = t105 * t1684
>>>> +          t2057 = t1636 + t8 * (t2635 + t3288)
>>>> +        END IF
>>>> +        IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
>>>> +          t5469 = t5440 - t5443 - t5446 - t5449 - &
>>>> +                  t5451 - t5454 - t5456 + t5459 - &
>>>> +                  t5462 + t5466 - t5468
>>>> +          t5478 = 0.240e2_dp * t1616 *
Re: Fix PR middle-end/57370
On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener wrote:
> On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman wrote:
>> Richard,
>> Do you want me to commit everything but the insert_stmt_after hunk
>> now?
>
> Yes.
>
>> There are more similar failures reported in
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
>> the updated patch there. Shall I send that for review? Apart from the
>> debug statement issue, almost all the bugs are due to dependence
>> violation because certain newly inserted statements do not have the
>> right UID. Instead of trying to catch all of them, will it be better
>> if I check if the stmt has a proper uid (non-zero if it is not the
>> first stmt) and assign a sensible value at the point where it is used
>> (find_insert_point and appears_later_in_bb) instead of where the stmt
>> is created? I think that would be less brittle.
>
> In the end all this placement stuff should be reverted and done as
> post-processing after reassoc is finished. You can remember the
> inserted stmts that are candidates for moving (just set a pass-local
> flag on them) and assign UIDs for the whole function after all stmts
> are inserted.

The problem is we need sane UID values as reassoc is happening - not after it is finished. As it processes each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb.

- Easwaran

> Richard.
>
>> - Easwaran
>>
>> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener wrote:
>>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman wrote:
>>>> I have a new patch that supersedes this. The new patch also fixes PR
>>>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>>>> no test regression on x86_64/linux. Ok for trunk?
>>>>
>>>> 2013-07-31  Easwaran Raman
>>>>
>>>>         PR middle-end/57370
>>>>         * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>>>         of debug statements that cause inconsistent IR.
>>>
>>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>>> at all. The other hunks are ok, but we need to work harder to preserve
>>> debug stmts - simply removing all is not going to fly.
>>>
>>> Richard.
>>>
>>>> testsuite/ChangeLog:
>>>> 2013-07-31  Easwaran Raman
>>>>
>>>>         PR middle-end/57370
>>>>         PR tree-optimization/57393
>>>>         PR tree-optimization/58011
>>>>         * gfortran.dg/reassoc_12.f90: New testcase.
>>>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>
>>>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote:
>>>>> Ping.
>>>>>
>>>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote:
>>>>>> A newly generated statement in build_and_add_sum function of
>>>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>>>> earlier phi statement) which messed up the IR and caused the test
>>>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>>>> Ok for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> Easwaran
>>>>>>
>>>>>> 2013-06-27  Easwaran Raman
>>>>>>
>>>>>>         PR middle-end/57370
>>>>>>         * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi
>>>>>>         node for a non-phi gimple statement.
>>>>>>
>>>>>> testsuite/ChangeLog:
>>>>>> 2013-06-27  Easwaran Raman
>>>>>>
>>>>>>         PR middle-end/57370
>>>>>>         * gfortran.dg/reassoc_12.f90: New testcase.
>>>>>>
>>>>>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>>>>>> ===
>>>>>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>>>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>>>> @@ -0,0 +1,74 @@
>>>>>> +! { dg-do comp
Re: Fix PR middle-end/57370
On Fri, Aug 30, 2013 at 9:26 AM, Jakub Jelinek wrote:
> On Fri, Aug 30, 2013 at 09:13:34AM -0700, Easwaran Raman wrote:
>> >> There are more similar failures reported in
>> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
>> >> the updated patch there. Shall I send that for review? Apart from the
>> >> debug statement issue, almost all the bugs are due to dependence
>> >> violation because certain newly inserted statements do not have the
>> >> right UID. Instead of trying to catch all of them, will it be better
>> >> if I check if the stmt has a proper uid (non-zero if it is not the
>> >> first stmt) and assign a sensible value at the point where it is used
>> >> (find_insert_point and appears_later_in_bb) instead of where the stmt
>> >> is created? I think that would be less brittle.
>> >
>> > In the end all this placement stuff should be reverted and done as
>> > post-processing after reassoc is finished. You can remember the
>> > inserted stmts that are candidates for moving (just set a pass-local
>> > flag on them) and assign UIDs for the whole function after all stmts
>> > are inserted.
>>
>> The problem is we need sane UID values as reassoc is happening - not
>> after it is finished. As it process each stmt in reassoc_bb, it might
>> generate new stmts which might have to be compared in
>> find_insert_point or appears_later_in_bb.
>
> A new stmt will be created with UID 0 and in case there is stmt movement,
> you could zero the UID during movement. Then, you can just special case
> UID zero when you are looking at UIDs. As on trunk/4.8 gsi_for_stmt is
> O(1), you can easily walk a couple of previous and/or later stmts and look
> for the first non-zero UID around it, and treat it as if the UID was
> previous non-zero UID + 0.5 or next non-zero UID - 0.5. And, if you see
> too many consecutive statements with UID 0 (more than some constant, 32?),
> just reassign UIDs to the whole bb. Or you could initially assign UIDs
> with some gaps, then keep filling those gaps and once you fill them,
> renumber again with new gaps.
>
> Jakub

Yes, this is pretty much what I was proposing. The current implementation doesn't rely on UIDs being unique - they only have to be monotonically non-decreasing. So, when a UID of 0 is encountered, go up till a non-zero UID is found and then go down and assign this non-zero UID. This effectively implies that each UID-0 stmt is visited at most twice. I don't think we need to check if I see more than a certain number of UID-0 stmts and redo the entire BB.

- Easwaran
Re: Fix PR middle-end/57370
Submitted the patch (r202262) without the insert_stmt_after hunk. Also fixed nits pointed out by Jakub.

- Easwaran

On Mon, Sep 2, 2013 at 2:31 AM, Richard Biener wrote:
> On Fri, Aug 30, 2013 at 6:13 PM, Easwaran Raman wrote:
>> On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener wrote:
>>> On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman wrote:
>>>> Richard,
>>>> Do you want me to commit everything but the insert_stmt_after hunk
>>>> now?
>>>
>>> Yes.
>>>
>>>> There are more similar failures reported in
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached
>>>> the updated patch there. Shall I send that for review? Apart from the
>>>> debug statement issue, almost all the bugs are due to dependence
>>>> violation because certain newly inserted statements do not have the
>>>> right UID. Instead of trying to catch all of them, will it be better
>>>> if I check if the stmt has a proper uid (non-zero if it is not the
>>>> first stmt) and assign a sensible value at the point where it is used
>>>> (find_insert_point and appears_later_in_bb) instead of where the stmt
>>>> is created? I think that would be less brittle.
>>>
>>> In the end all this placement stuff should be reverted and done as
>>> post-processing after reassoc is finished. You can remember the
>>> inserted stmts that are candidates for moving (just set a pass-local
>>> flag on them) and assign UIDs for the whole function after all stmts
>>> are inserted.
>>
>> The problem is we need sane UID values as reassoc is happening - not
>> after it is finished. As it process each stmt in reassoc_bb, it might
>> generate new stmts which might have to be compared in
>> find_insert_point or appears_later_in_bb.
>
> But if you no longer find_insert_point during reassoc but instead do
> a "scheduling" pass after it finished you won't need sane UIDs during
> reassoc.
>
> Richard.
>
>> - Easwaran
>>
>>> Richard.
>>>
>>>> - Easwaran
>>>>
>>>> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener wrote:
>>>>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman wrote:
>>>>>> I have a new patch that supersedes this. The new patch also fixes PR
>>>>>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>>>>>> no test regression on x86_64/linux. Ok for trunk?
>>>>>>
>>>>>> 2013-07-31  Easwaran Raman
>>>>>>
>>>>>>         PR middle-end/57370
>>>>>>         * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>>>>>         of debug statements that cause inconsistent IR.
>>>>>
>>>>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>>>>> at all. The other hunks are ok, but we need to work harder to preserve
>>>>> debug stmts - simply removing all is not going to fly.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> testsuite/ChangeLog:
>>>>>> 2013-07-31  Easwaran Raman
>>>>>>
>>>>>>         PR middle-end/57370
>>>>>>         PR tree-optimization/57393
>>>>>>         PR tree-optimization/58011
>>>>>>         * gfortran.dg/reassoc_12.f90: New testcase.
>>>>>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>>>
>>>>>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote:
>>>>>>> Ping.
>>>>>>>
>>>>>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote:
>>>>>>>> A newly generated statement in build_and_add_sum function of
>>>>>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>>>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>>>>>> earlier phi statement) which messed up the IR and caused the test
>>>>>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>>>>>> Ok for trunk?
Re: Fix PR middle-end/57370
On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener wrote:
> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman wrote:
>> I have a new patch that supersedes this. The new patch also fixes PR
>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>> no test regression on x86_64/linux. Ok for trunk?
>>
>> 2013-07-31  Easwaran Raman
>>
>>         PR middle-end/57370
>>         * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>         of debug statements that cause inconsistent IR.
>
> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
> at all. The other hunks are ok, but we need to work harder to preserve
> debug stmts - simply removing all is not going to fly.
>
> Richard.

I looked into the problem related to the debug stmts in this failing test case in detail. Initially, the code sequence looks like this:

s$n_13 = MEM[(struct S *)&s];
# DEBUG s$n => s$n_13
_2 = (double) s$n_13;
_4 = _2 * a_3(D);
_6 = (double) i_5(D);
_7 = _4 * _6;
_9 = (double) j_8(D);
_10 = _7 * _9;
bar (_10);
# DEBUG D#2 => (double) k_12(D)
# DEBUG D#1 => _7 * D#2
# DEBUG t$n => (int) D#1

After reassociation:

s$n_13 = MEM[(struct S *)&s];
# DEBUG s$n => s$n_13
_2 = (double) s$n_13;
_6 = (double) i_5(D);
# DEBUG D#3 => _4 * _6
_9 = (double) j_8(D);
_4 = _9 * _2;
_7 = _4 * a_3(D);
_10 = _7 * _6;
bar (_10);
# DEBUG D#2 => (double) k_12(D)
# DEBUG D#1 => D#3 * D#2
# DEBUG t$n => (int) D#1

In the above, "# DEBUG D#3 => _4 * _6" appears above the statement that defines _4. But even if I move the def of D#3 below the statement defining _4, it is not sufficient. Before reassociation, t$n refers to the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after reassociation it would refer to (j_8(D) * s$n_13 * i_5(D) * k_12(D)). It seems the correct fix is to discard the debug temps whose RHS contains a variable that is involved in reassociation and then reconstruct it. Is that the expected fix here?

- Easwaran

>> testsuite/ChangeLog:
>> 2013-07-31  Easwaran Raman
>>
>>         PR middle-end/57370
>>         PR tree-optimization/57393
>>         PR tree-optimization/58011
>>         * gfortran.dg/reassoc_12.f90: New testcase.
>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>
>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote:
>>> Ping.
>>>
>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote:
>>>> A newly generated statement in build_and_add_sum function of
>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>> earlier phi statement) which messed up the IR and caused the test
>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Easwaran
>>>>
>>>> 2013-06-27  Easwaran Raman
>>>>
>>>>         PR middle-end/57370
>>>>         * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi
>>>>         node for a non-phi gimple statement.
>>>>
>>>> testsuite/ChangeLog:
>>>> 2013-06-27  Easwaran Raman
>>>>
>>>>         PR middle-end/57370
>>>>         * gfortran.dg/reassoc_12.f90: New testcase.
>>>>
>>>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>>>> ===
>>>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>>>> @@ -0,0 +1,74 @@
>>>> +! { dg-do compile }
>>>> +! { dg-options "-O2 -ffast-math" }
>>>> +! PR middle-end/57370
>>>> +
>>>> +      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>>>> +                                            grad_deriv,npoints, sx)
>>>> +    IMPLICIT REAL*8 (t)
>>>> +    INTEGER, PARAMETER :: dp=8
>>>> +    REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
>>>> +         e_ndrho_ndrho_rho
>>>> +      DO ii=1,npoints
>>>> +        IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
>>>> +          t1425 = t233 * t557
>>>> +          t1429 = beta * t225
>>>> +          t1622 = t327 * t1621
>>>> +          t1626 = t327 * t1625
>>>> +          t16
Re: Fix PR middle-end/57370
On Fri, Sep 6, 2013 at 12:05 AM, Richard Biener wrote:
> On Thu, Sep 5, 2013 at 9:23 PM, Easwaran Raman wrote:
>> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener wrote:
>>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman wrote:
>>>> I have a new patch that supersedes this. The new patch also fixes PR
>>>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>>>> no test regression on x86_64/linux. Ok for trunk?
>>>>
>>>> 2013-07-31  Easwaran Raman
>>>>
>>>>         PR middle-end/57370
>>>>         * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>>>         of debug statements that cause inconsistent IR.
>>>
>>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>>> at all. The other hunks are ok, but we need to work harder to preserve
>>> debug stmts - simply removing all is not going to fly.
>>>
>>> Richard.
>>
>> I looked into the problem related to the debug stmts in this failing
>> test case in detail. Initially, the code sequence looks
>>
>> s$n_13 = MEM[(struct S *)&s];
>> # DEBUG s$n => s$n_13
>> _2 = (double) s$n_13;
>> _4 = _2 * a_3(D);
>> _6 = (double) i_5(D);
>> _7 = _4 * _6;
>> _9 = (double) j_8(D);
>> _10 = _7 * _9;
>> bar (_10);
>> # DEBUG D#2 => (double) k_12(D)
>> # DEBUG D#1 => _7 * D#2
>> # DEBUG t$n => (int) D#1
>>
>> After reassociation
>>
>> s$n_13 = MEM[(struct S *)&s];
>> # DEBUG s$n => s$n_13
>> _2 = (double) s$n_13;
>> _6 = (double) i_5(D);
>> # DEBUG D#3 => _4 * _6
>> _9 = (double) j_8(D);
>> _4 = _9 * _2;
>> _7 = _4 * a_3(D);
>> _10 = _7 * _6;
>> bar (_10);
>> # DEBUG D#2 => (double) k_12(D)
>> # DEBUG D#1 => D#3 * D#2
>> # DEBUG t$n => (int) D#1
>>
>> In the above, # DEBUG D#3 => _4 * _6 appears above the statement that
>> defines _4. But even if I move the def of D#3 below the statement
>> defining _4, it is not sufficient. Before reassociation, t$n refers to
>> the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after
>> reassociation it would refer to ( j_8(D) * s$n_13 * i_5(D) *
>> k_12(D)). It seems the correct fix is to discard the debug temps whose
>> RHS contains a variable that is involved in reassociation and then
>> reconstruct it. Is that the expected fix here?
>
> The value of the DEBUG expression changes because the value
> that _4 computes changes - that is a no-no that may not happen.
> We cannot re-use _4 (without previously releasing it and allocating
> it newly) with a new value. releasing _4 should have fixed up the
> debug stmts.
>
> So - can you verify whether we are indeed just replacing the
> RHS of _4 = _2 * a_3(D) to _9 * _2 without changing the SSA name of
> that expression?

I have confirmed that the SSA name of the LHS remains the same. The reassociation code simply calls gimple_assign_set_rhs2 to modify RHS2 and then calls update_stmt.

- Easwaran

> Another reason why re-using the same LHS for another value is
> wrong is that annotated information like points-to sets, alignment
> and soon value-range information will be wrong.
>
> Thanks,
> Richard.
>
>> - Easwaran
>>
>>>> testsuite/ChangeLog:
>>>> 2013-07-31  Easwaran Raman
>>>>
>>>>         PR middle-end/57370
>>>>         PR tree-optimization/57393
>>>>         PR tree-optimization/58011
>>>>         * gfortran.dg/reassoc_12.f90: New testcase.
>>>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>         * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>>
>>>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote:
>>>>> Ping.
>>>>>
>>>>> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote:
>>>>>> A newly generated statement in build_and_add_sum function of
>>>>>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>>>>>> statement. In one instance, it was assigned the wrong uid (of an
>>>>>> earlier phi statement) which messed up the IR and caused the test
>>>>>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>>>>>> Ok for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> Easwaran
>>>>>>
>>>>>> 2013-06-27  Easwaran Raman
Fix PR middle-end/57393
There are two separate root causes for the problem reported in PR 57393. This patch attempts to fix both.

First is due to newly created stmts that have the default UID of 0 which are compared with statements with valid UIDs, leading to broken dependences. As discussed in an earlier thread, I check the UIDs before using them and ensure stmts have a valid UID. In the worst case, this reassigns UIDs for the entire BB containing the stmts in question.

The second is due to debug stmts being out of sync with the IR after reassociation. I think the right fix is to create debug temps before actual reassociation to decouple them from the SSA variables involved in reassociation.

This bootstraps on x86_64 and I am running the tests. Ok for trunk?

Thanks,
Easwaran

2013-09-16  Easwaran Raman

        PR middle-end/57393
        * tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
        (build_and_add_sum): Do not set UID of newly created statements.
        (ensure_valid_uid): New function,
        (find_insert_point): called here before UIDs are compared.
        (insert_stmt_after): Do not reset debug statements.
        (regenerate_debug_stmts): New function,
        (reassociate_bb): use here.

testsuite/ChangeLog:
2013-09-16  Easwaran Raman

        PR middle-end/57393
        * gcc.dg/tree-ssa/reassoc-32.c: New testcase.
        * gcc.dg/tree-ssa/reassoc-33.c: New testcase.
        * gcc.dg/tree-ssa/reassoc-34.c: New testcase.

Attachment: pr57393.patch (binary data)
Re: Fix PR middle-end/57393
Ping.

On Mon, Sep 16, 2013 at 4:42 PM, Easwaran Raman wrote:
> There are two separate root causes for the problem reported in PR
> 57393. This patch attempts to fix both.
>
> First is due to newly created stmts that have the default UID of 0
> which are compared with statements with valid UIDs leading to broken
> dependences. As discussed in an earlier thread, I check the UIDs
> before using them and ensure stmts have a valid UID. In the worst
> case, this reassigns UIDs for the entire BB containing the stmts in
> question.
>
> The second is due to debug stmts being out of sync with the IR after
> reassociation. I think the right fix is to create debug temps before
> actual reassociation to decouple them from the SSA variables involved
> in reassociation.
>
> This bootstraps in x86_64 and I am running the tests. Ok for trunk?
>
> Thanks,
> Easwaran
>
> 2013-09-16  Easwaran Raman
>
>         PR middle-end/57393
>         * tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
>         (build_and_add_sum): Do not set UID of newly created statements.
>         (ensure_valid_uid): New function,
>         (find_insert_point): called here before UIDs are compared.
>         (insert_stmt_after): Do not reset debug statements.
>         (regenerate_debug_stmts): New function,
>         (reassociate_bb): use here.
>
> testsuite/ChangeLog:
> 2013-09-16  Easwaran Raman
>
>         PR middle-end/57393
>         * gcc.dg/tree-ssa/reassoc-32.c: New testcase.
>         * gcc.dg/tree-ssa/reassoc-33.c: New testcase.
>         * gcc.dg/tree-ssa/reassoc-34.c: New testcase.
[Google] Adjust comdat-sharing-probability param for -Os
This patch increases comdat-sharing-probability to 80 under -Os. This reduces the amount of inlining and helps internal benchmarks. Unfortunately, this causes slight regression on spec 2006. Ok for google branches if all tests pass?

- Easwaran

Attachment: comdat_sharing.patch (binary data)
Re: [Google] Adjust comdat-sharing-probability param for -Os
On Fri, Sep 27, 2013 at 4:08 PM, Xinliang David Li wrote:
> d.growth -= (info->size
>              * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY))
>              + 50) / 100;
>
> What is the purpose of '50' here?

Rounding after division by 100.

> The patch is fine for Google branch.
>
> Other tunings to think about -- I think the sharing probability should
> not be a fixed value -- but depending on the function's characteristics
> -- such as size, number of callsites etc. For instance, for small leaf
> comdat functions, the sharing probability will be small.
>
> David
>
> On Fri, Sep 27, 2013 at 2:57 PM, Easwaran Raman wrote:
>> This patch increases comdat-sharing-probability to 80 under -Os. This
>> reduces the amount of inlining and helps internal benchmarks.
>> Unfortunately, this causes slight regression on spec 2006. Ok for
>> google branches if all tests pass?
>>
>> - Easwaran
Propagate profile counts during switch expansion
Hi,

This patch propagates the profile counts during RTL expansion. In many cases, there is no way to determine the exact count of an edge generated during the expansion. So this patch uses some simple heuristics to estimate the edge counts, but ensures that the counts of the basic blocks corresponding to the cases are (nearly the) same as at the gimple level. Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for trunk?

- Easwaran

--

2012-10-02  Easwaran Raman

        * cfgbuild.c (gen_probabilities_from_existing_counts): New function.
        (compute_outgoing_frequencies): If at least one successor of a
        BB has non-zero profile count, use the counts to compute
        probabilities.
        * expr.c (do_tablejump): Add a REG_BR_PROB note on the jump to
        default label.
        (try_tablejump): Add a parameter to specify the probability of
        jumping to the default label.
        * expr.h (try_tablejump): Add a new parameter.
        * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
        (do_jump_if_equal): Pass probability for REG_BR_PROB note.
        (add_case_node): Pass execution count of the case node and use
        it to initialize COUNT field.
        (emit_case_decision_tree): Pass default_count to emit_case_nodes.
        (get_outgoing_edge_counts): New function.
        (add_prob_note_to_last_insn): Likewise.
        (case_probability): New macro.
        (emit_case_dispatch_table): Compute probability of jumping to
        default label and apply note to the jump.
        (expand_case): Compute and propagate default edge count to
        emit_case_dispatch_table.
        (expand_sjlj_dispatch_table): Update calls to add_case_node and
        emit_case_dispatch_table.
        (balance_case_nodes): Update subtree_counts.
        (emit_case_nodes): Compute edge probabilities and add note.

gcc/testsuite/ChangeLog:
2012-10-02  Easwaran Raman

        * gcc.dg/tree-prof/switch-case-1.c: New test case.
        * gcc.dg/tree-prof/switch-case-2.c: New test case.
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c === --- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-options "-O2 -fdump-rtl-expand-all" } */ +int g; + +__attribute__((noinline)) void foo (int n) +{ + switch (n) +{ +case 1: + g++; break; +case 2: + g += 2; break; +case 3: + g += 1; break; +case 4: + g += 3; break; +case 5: + g += 4; break; +case 6: + g += 5; break; +case 7: + g += 6; break; +case 8: + g += 7; break; +case 9: + g += 8; break; +default: + g += 8; break; + } +} + +int main () +{ + int i; + for (i = 0; i < 1; i++) + foo ((i * i) % 5); + return 0; +} +/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count 4000" 2 "expand"} } */ +/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count 2000" 1 "expand"} } */ +/* { dg-final-use { cleanup-rtl-dump "expand" } } */ Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c === --- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-options "-O2 -fdump-rtl-expand-all" } */ +int g; + +__attribute__((noinline)) void foo (int n) +{ + switch (n) +{ +case 99: + g += 2; break; +case 1: + g++; break; +case 100: + g += 1; break; +case 4: + g += 3; break; +case 5: + g += 4; break; +case 6: + g += 5; break; +case 7: + g += 6; break; +case 8: + g += 7; break; +case 9: + g += 8; break; +default: + g += 8; break; + } +} + +int main () +{ + int i; + for (i = 0; i < 1; i++) + foo ((i * i) % 5); + return 0; +} +/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count 4000" 2 "expand"} } */ +/* { dg-final-use { scan-rtl-dump-times ";; basic block\[^\\n\]*count 2000" 1 "expand"} } */ +/* { dg-final-use { cleanup-rtl-dump "expand" } } */ Index: gcc/expr.c === --- gcc/expr.c (revision 191879) +++ gcc/expr.c (working copy) @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum 
machine_mode); #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) { rtx temp, vector; @@ -10910,9 +
Re: Propagate profile counts during switch expansion
Hi Honza, I am addressing some of the questions you raise here. Will send an updated patch later. On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka wrote: > > @@ -560,7 +577,6 @@ compute_outgoing_frequencies (basic_block b) > >return; > > } > > } > > - > >if (single_succ_p (b)) > > { > >e = single_succ_edge (b); > > @@ -568,7 +584,10 @@ compute_outgoing_frequencies (basic_block b) > >e->count = b->count; > >return; > > } > > - guess_outgoing_edge_probabilities (b); > > + else if (!gen_probabilities_from_existing_counts (b->succs)){ > > +/* All outgoing edges of B have zero count. Guess probabilities. */ > > +guess_outgoing_edge_probabilities (b); > > + } > > Hmm, I do not quite follow the logic here. > basic block B is one of many basic blocks that the original BB was split from. > It is possible that B may have some of original edges, but there may be new > ones. > How can you guess the outgoing probabilities here. Do you have an example? The code reaches here only when the BB has more than two successors. I assume that this happens only when the BB is terminated by a switch statement. During conversion to RTL, the outgoing edges of this BB and their counts are untouched. So I am just using those counts to compute the edge probabilities. > Also gen_probabilities_from_existing_counts could probably also work based > on original edge frequencies. Just to be clear I understand it right, you want me to use the frequency instead of count since the frequency is derived from the counts in profile use builds? > > > + > > + default_edge->count = default_count; > > + if (count) > > +{ > > + edge e; > > + edge_iterator ei; > > + FOR_EACH_EDGE (e, ei, stmt_bb->succs) > > +e->probability = e->count * REG_BR_PROB_BASE / count; > > +} > > Hmm, this updates the original BB containing the switch statement? The out_edges of the original BB. > Of course, > modulo roundoff errors, this should hold. I wonder where did you get the > differences and why do you need this? 
Originally, I obtained e->probability as e->count * REG_BR_PROB_BASE / bb->count. During profile bootstrap, I noticed BBs whose sum of outgoing edge counts exceeded the BB's count and hence I normalized with the sum of edge counts. > You are going to output new control flow > and find_many_sub_basic_blocks will recompute all the counts/frequencies > inside > anyway? Sorry, I am lost here. I am just updating the probability of stmt_bb->succs right? > > > @@ -2430,7 +2523,11 @@ emit_case_nodes (rtx index, case_node_ptr node, rt > > unsignedp), > > GT, NULL_RTX, mode, unsignedp, > > label_rtx (node->right->code_label)); > > - emit_case_nodes (index, node->left, default_label, index_type); > > + probability = case_probability (node->right->count, > > + subtree_count + > > default_count); > > + add_prob_note_to_last_insn (probability); > > + emit_case_nodes (index, node->left, default_label, default_count, > > + index_type); > > Hmm, here you seem to be distributing the probabilities of the counts reached. > What happens in the case when the edge probability needs to be distributed > across > nodes of the decision tree. I.e. in > t(int a) > { > switch (a) >{ > case 100: > case 200: > case 300: > case 400: > case 500: > case 600: > case 700: > case 800: > case 900: >t(); > case 101: > case 202: > case 303: > case 404: > case 505: > case 606: > case 707: > case 808: > case 909: >q(); >} > } > Ok, that's a bug in my patch. In the above example, there are two outgoing edges from the BB containing switch but many case nodes. I would end up multiplying the counts by the number of cases that lead to a label. If I divide the counts of each case_node by the number of case nodes that jump to the same label, will that be reasonable? In the above case, if t() is called 900 times, I will assume that each of the 9 cases contribute a count of 100. Thanks, Easwaran
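The fix proposed at the end of the reply — giving each case node an equal share of its edge's count when several cases jump to the same label — amounts to something like the following (a toy model of the proposed accounting, not the patch's actual code):

```c
#include <assert.h>

/* Toy model of the proposed fix: when several case nodes jump to the
   same label, they all share one outgoing edge, so attributing the
   full edge count to every node would multiply the count.  Instead,
   each node is assumed to contribute an equal share.  */
static long
case_node_share (long edge_count, int cases_on_edge)
{
  if (cases_on_edge <= 0)
    return 0;
  return edge_count / cases_on_edge;
}
```

In the t()/q() example from the review, nine cases share the call-t() edge, so a count of 900 yields 100 per case node.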
Re: Propagate profile counts during switch expansion
> >> >> > + >> > + default_edge->count = default_count; >> > + if (count) >> > +{ >> > + edge e; >> > + edge_iterator ei; >> > + FOR_EACH_EDGE (e, ei, stmt_bb->succs) >> > +e->probability = e->count * REG_BR_PROB_BASE / count; >> > +} >> >> Hmm, this updates the original BB containing the switch statement? > The out_edges of the original BB. >> Of course, >> modulo roundoff errors, this should hold. I wonder where did you get the >> differences and why do you need this? The count of the outgoing edge of the BB that jumps to the default label needs to be updated (to 0 or original_default_count/2, depending on whether the default label is a jump table target). So I need to redistribute the probabilities of the rest of the edges. That's why I do this here. > Originally, I obtained e->probability as e->count * REG_BR_PROB_BASE / > bb->count. During profile bootstrap, I noticed BBs whose sum of > outgoing edge counts exceeded the BB's count and hence I normalized > with the sum of edge counts. > >> You are going to output new control flow >> and find_many_sub_basic_blocks will recompute all the counts/frequencies >> inside >> anyway? > Sorry, I am lost here. I am just updating the probability of > stmt_bb->succs right? > >
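The renormalization described above — dividing by the sum of the outgoing edge counts rather than the block count, so the probabilities still sum to REG_BR_PROB_BASE after the default edge's count is rewritten — can be sketched like this (edge_info is a stand-in struct, not GCC's edge type):

```c
#include <assert.h>

#define REG_BR_PROB_BASE 10000

/* Stand-in for an outgoing CFG edge: only the two fields this sketch
   needs.  */
struct edge_info
{
  long count;
  int probability;
};

/* After the default edge's count is rewritten, the outgoing counts
   may no longer sum to the block's count, so probabilities are derived
   from the edges' own sum.  This keeps them adding up to
   REG_BR_PROB_BASE, modulo integer rounding.  */
static void
redistribute_probs (struct edge_info *edges, int n)
{
  long sum = 0;
  int i;
  for (i = 0; i < n; i++)
    sum += edges[i].count;
  if (sum == 0)
    return;  /* No counts to derive from; leave probabilities alone.  */
  for (i = 0; i < n; i++)
    edges[i].probability = (int) (edges[i].count * REG_BR_PROB_BASE / sum);
}
```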
Move statements upwards after reassociation
Hi, In the expression reassociation pass, statements might get moved downwards to ensure that dependences are not violated after reassociation. This can increase the live range and, in a tight loop, result in spills. This patch simply does a code motion of those statements involved in reassociation and pushes them upwards in the BB as far as possible without violating dependences. Bootstraps with no test regressions on an x86_64 machine running Linux. Ok for trunk? - Easwaran --- 2012-10-10 Easwaran Raman * tree-ssa-reassoc.c (move_stmt_upwards): New function. (rewrite_expr_tree): Record statements to be moved. (reassociate_bb): Move statements affected by reassociation as early as possible. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 191879) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,13 +2250,51 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Move STMT up within its BB until it can not be moved any further. */ + +static void move_stmt_upwards (gimple stmt) +{ + gimple_stmt_iterator gsi, gsistmt; + tree rhs1, rhs2; + gimple rhs1def = NULL, rhs2def = NULL; + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + gcc_assert (rhs1); + if (TREE_CODE (rhs1) == SSA_NAME) +rhs1def = SSA_NAME_DEF_STMT (rhs1); + else if (TREE_CODE (rhs1) != REAL_CST + && TREE_CODE (rhs1) != INTEGER_CST) +return; + if (rhs2) +{ + if (TREE_CODE (rhs2) == SSA_NAME) +rhs2def = SSA_NAME_DEF_STMT (rhs2); + else if (TREE_CODE (rhs1) != REAL_CST + && TREE_CODE (rhs1) != INTEGER_CST) +return; +} + gsi = gsi_for_stmt (stmt); + gsistmt = gsi; + gsi_prev (&gsi); + for (; !gsi_end_p (gsi); gsi_prev (&gsi)) +{ + gimple curr_stmt = gsi_stmt (gsi); + if (curr_stmt == rhs1def || curr_stmt == rhs2def) { +gsi_move_after (&gsistmt, &gsi); +return; + } +} + +} + /* Recursively rewrite our linearized statements so that the operators match those in OPS[OPINDEX], putting the computation in rank order. 
*/ static void rewrite_expr_tree (gimple stmt, unsigned int opindex, - VEC(operand_entry_t, heap) * ops, bool moved) + VEC(operand_entry_t, heap) * ops, bool moved, + VEC(gimple, heap) **stmts_to_move) { tree rhs1 = gimple_assign_rhs1 (stmt); tree rhs2 = gimple_assign_rhs2 (stmt); @@ -2299,6 +2337,7 @@ rewrite_expr_tree (gimple stmt, unsigned int opind print_gimple_stmt (dump_file, stmt, 0, 0); } } + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); return; } @@ -2346,7 +2385,9 @@ rewrite_expr_tree (gimple stmt, unsigned int opind } /* Recurse on the LHS of the binary operator, which is guaranteed to be the non-leaf side. */ - rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved); + rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved, + stmts_to_move); + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); } /* Find out how many cycles we need to compute statements chain. @@ -3427,6 +3468,9 @@ reassociate_bb (basic_block bb) { gimple_stmt_iterator gsi; basic_block son; + VEC(gimple, heap) *stmts_to_move = NULL; + gimple stmt; + int i; for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi)) { @@ -3542,7 +3586,7 @@ reassociate_bb (basic_block bb) && VEC_length (operand_entry_t, ops) > 3) rewrite_expr_tree_parallel (stmt, width, ops); else -rewrite_expr_tree (stmt, 0, ops, false); +rewrite_expr_tree (stmt, 0, ops, false, &stmts_to_move); /* If we combined some repeated factors into a __builtin_powi call, multiply that result by the @@ -3560,6 +3604,7 @@ reassociate_bb (basic_block bb) target_ssa); gimple_set_location (mul_stmt, gimple_location (stmt)); gsi_insert_after (&gsi, mul_stmt, GSI_NEW_STMT); + VEC_safe_push (gimple, heap, stmts_to_move, mul_stmt); } } @@ -3567,6 +3612,11 @@ reassociate_bb (basic_block bb) } } } + + FOR_EACH_VEC_ELT (gimple, stmts_to_move, i, stmt) +move_stmt_upwards (stmt); + VEC_free (gimple, heap, stmts_to_move); + for (son = first_dom_son (CDI_POST_DOMINATORS, bb); son; son = next_dom_son 
(CDI_POST_DOMINATORS, son))
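The motion rule the patch implements — a statement may be hoisted no earlier than the slot just after the later of its operands' definitions, and constant operands impose no constraint — can be modelled with plain statement positions (a toy model; array indices stand in for the gimple statement-iterator walk):

```c
#include <assert.h>

/* Earliest legal slot for a statement whose operands are defined at
   positions def1 and def2 within the same basic block: immediately
   after the later of the two definitions.  A constant operand places
   no constraint and is modelled as position -1.  */
static int
earliest_slot (int def1, int def2)
{
  int later = def1 > def2 ? def1 : def2;
  return later + 1;
}
```

The patch's backward walk stops at whichever defining statement it reaches first, which is exactly the later of the two; this model makes that target position explicit.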
[google] Move delete with size to its own file (issue6655052)
This patch moves the two-argument delete operator into its own file. When a program provides its own definition of operator delete (void *), but not operator delete (void *, size_t), we could end up linking two object files that define _ZdlPv, resulting in a linker error. Bootstraps with no test regressions on an x86_64 machine running Linux. Ok for google/4_7 and google/main branches? Google ref b/6982747 2012-10-11 Easwaran Raman * libsupc++/Makefile.am: Add del_opsz.cc to sources. * libsupc++/Makefile.in: Regenerated. * libsupc++/del_opsz.cc: New file. * libsupc++/del_op.cc (operator delete(void*,std::size_t)): Remove. Index: libstdc++-v3/libsupc++/Makefile.in === --- libstdc++-v3/libsupc++/Makefile.in (revision 192373) +++ libstdc++-v3/libsupc++/Makefile.in (working copy) @@ -92,8 +92,8 @@ LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_ libsupc___la_LIBADD = am__objects_1 = array_type_info.lo atexit_arm.lo bad_alloc.lo \ bad_cast.lo bad_typeid.lo class_type_info.lo del_op.lo \ - del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo eh_alloc.lo \ - eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \ + del_opsz.lo del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo \ + eh_alloc.lo eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \ eh_exception.lo eh_globals.lo eh_personality.lo eh_ptr.lo \ eh_term_handler.lo eh_terminate.lo eh_tm.lo eh_throw.lo \ eh_type.lo eh_unex_handler.lo enum_type_info.lo \ @@ -362,6 +362,7 @@ sources = \ bad_typeid.cc \ class_type_info.cc \ del_op.cc \ + del_opsz.cc \ del_opnt.cc \ del_opv.cc \ del_opvnt.cc \ Index: libstdc++-v3/libsupc++/del_op.cc === --- libstdc++-v3/libsupc++/del_op.cc(revision 192373) +++ libstdc++-v3/libsupc++/del_op.cc(working copy) @@ -47,11 +47,3 @@ operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT if (ptr) std::free(ptr); } - -_GLIBCXX_WEAK_DEFINITION void -operator delete(void* ptr, -std::size_t bytes __attribute__((__unused__))) throw () -{ - if (ptr) -std::free(ptr); -} Index: libstdc++-v3/libsupc++/del_opsz.cc === --- 
libstdc++-v3/libsupc++/del_opsz.cc (revision 0) +++ libstdc++-v3/libsupc++/del_opsz.cc (revision 0) @@ -0,0 +1,50 @@ +// Boilerplate support routines for -*- C++ -*- dynamic memory management. + +// Copyright (C) 2012 +// Free Software Foundation +// +// This file is part of GCC. +// +// GCC is free software; you can redistribute it and/or modify +// it under the terms of the GNU General Public License as published by +// the Free Software Foundation; either version 3, or (at your option) +// any later version. +// +// GCC is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +#include + +#if !_GLIBCXX_HOSTED +// A freestanding C runtime may not provide "free" -- but there is no +// other reasonable way to implement "operator delete". 
+namespace std +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + extern "C" void free(void*); +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace +#else +# include +#endif + +#include "new" + +_GLIBCXX_WEAK_DEFINITION void +operator delete(void* ptr, +std::size_t bytes __attribute__((__unused__))) throw () +{ + if (ptr) +std::free(ptr); +} Index: libstdc++-v3/libsupc++/Makefile.am === --- libstdc++-v3/libsupc++/Makefile.am (revision 192373) +++ libstdc++-v3/libsupc++/Makefile.am (working copy) @@ -53,6 +53,7 @@ sources = \ bad_typeid.cc \ class_type_info.cc \ del_op.cc \ + del_opsz.cc \ del_opnt.cc \ del_opv.cc \ del_opvnt.cc \ -- This patch is available for review at http://codereview.appspot.com/6655052
Re: Move statements upwards after reassociation
Thanks for the comments. As David wrote, the intent of the patch is not to do a general purpose scheduling, but to compensate for the possible live range lengthening introduced by reassociation. On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener wrote: > On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman wrote: >> >> +/* Move STMT up within its BB until it can not be moved any further. */ >> + >> +static void move_stmt_upwards (gimple stmt) >> +{ >> + gimple_stmt_iterator gsi, gsistmt; >> + tree rhs1, rhs2; >> + gimple rhs1def = NULL, rhs2def = NULL; >> + rhs1 = gimple_assign_rhs1 (stmt); >> + rhs2 = gimple_assign_rhs2 (stmt); >> + gcc_assert (rhs1); > > Please no such senseless asserts. The following line will segfault anyway > if rhs1 is NULL and with a debugger an assert doesn't add any useful > information. Ok. > >> + if (TREE_CODE (rhs1) == SSA_NAME) >> +rhs1def = SSA_NAME_DEF_STMT (rhs1); >> + else if (TREE_CODE (rhs1) != REAL_CST >> + && TREE_CODE (rhs1) != INTEGER_CST) >> +return; >> + if (rhs2) > > You may not access gimple_assign_rhs2 if it is not there. So you have > to check whether you have an unary, binary or ternary (yes) operation. gimple_assign_rhs2 returns NULL_TREE if it the RHS of an assignment has less than two operands. Regarding the check for ternary operation, I believe it is not necessary. A statement is considered for reassociation only if the RHS class is GIMPLE_BINARY_RHS. Subsequently, for rhs1 and rhs2, it checks if the def statements also have the same code and so it seems to me that a statement with a ternary operator in the RHS will never be considered in rewrite_expr_tree. > >> +{ >> + if (TREE_CODE (rhs2) == SSA_NAME) >> +rhs2def = SSA_NAME_DEF_STMT (rhs2); >> + else if (TREE_CODE (rhs1) != REAL_CST >> + && TREE_CODE (rhs1) != INTEGER_CST) >> +return; >> +} >> + gsi = gsi_for_stmt (stmt); >> + gsistmt = gsi; >> + gsi_prev (&gsi); >> + for (; !gsi_end_p (gsi); gsi_prev (&gsi)) > > > > This doesn't make much sense. 
You can move a stmt to the nearest > common post-dominator. Assuming you only want to handle placing > it after rhs1def or after rhs2def(?) you don't need any loop, just > two dominator queries and an insertion after one of the definition > stmts. Within a BB isn't that still O(size of BB)? > But this code should consider BBs. For reassociation to look across BBs, the code should look something like this: L1 : a_2 = a_1 + 10 jc L3 L2: a_3 = a_2 + 20 - L1 should dominate L2 (otherwise there will be a phi node at L2 and the reassociation of a_3 will not consider the definition of a_2) - There are no other uses of a_2 other than the one in L2. After reassociation, the stmt defining a_2 would be moved to L2. In that case, the downward code motion of a_2 = a_1 + 10 to L2 is beneficial (one less instruction if the branch is taken). It is not obvious to me that moving it to L1 (or wherever a_1 is defined) is beneficial. > And I don't see why more optimal > placement cannot be done during rewrite_expr_tree itself. I started with that idea, but my current approach looks simpler. Thanks, Easwaran > > NB, the whole reassoc code needs a re-write to avoid the excessive > stmt modifications when it does nothing. So I'd very much rather avoid > adding anything to reassoc until that rewrite happened. > > Richard. > >> +{ >> + gimple curr_stmt = gsi_stmt (gsi); >> + if (curr_stmt == rhs1def || curr_stmt == rhs2def) { >> +gsi_move_after (&gsistmt, &gsi); >> +return; >> + } >> +} >> + >> +} >> + >> /* Recursively rewrite our linearized statements so that the operators >> match those in OPS[OPINDEX], putting the computation in rank >> order. 
*/ >> >> static void >> rewrite_expr_tree (gimple stmt, unsigned int opindex, >> - VEC(operand_entry_t, heap) * ops, bool moved) >> + VEC(operand_entry_t, heap) * ops, bool moved, >> + VEC(gimple, heap) **stmts_to_move) >> { >>tree rhs1 = gimple_assign_rhs1 (stmt); >>tree rhs2 = gimple_assign_rhs2 (stmt); >> @@ -2299,6 +2337,7 @@ rewrite_expr_tree (gimple stmt, unsigned int opind >>print_gimple_stmt (dump_file, stmt, 0, 0); >> } >> } >> + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); >>return; >> } >> >> @@ -2346,7 +2385,9 @@ rewrite_expr_tree (gimple stmt, unsigned int opind >> } >>/* Recurse on t
Re: Move statements upwards after reassociation
On Fri, Oct 12, 2012 at 1:45 AM, Richard Biener wrote: > On Fri, Oct 12, 2012 at 3:09 AM, Easwaran Raman wrote: >> Thanks for the comments. As David wrote, the intent of the patch is >> not to do a general purpose scheduling, but to compensate for the >> possible live range lengthening introduced by reassociation. >> >> >> On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener >> wrote: >>> On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman wrote: >>>> >>>> +/* Move STMT up within its BB until it can not be moved any further. */ >>>> + >>>> +static void move_stmt_upwards (gimple stmt) >>>> +{ >>>> + gimple_stmt_iterator gsi, gsistmt; >>>> + tree rhs1, rhs2; >>>> + gimple rhs1def = NULL, rhs2def = NULL; >>>> + rhs1 = gimple_assign_rhs1 (stmt); >>>> + rhs2 = gimple_assign_rhs2 (stmt); >>>> + gcc_assert (rhs1); >>> >>> Please no such senseless asserts. The following line will segfault anyway >>> if rhs1 is NULL and with a debugger an assert doesn't add any useful >>> information. >> Ok. >>> >>>> + if (TREE_CODE (rhs1) == SSA_NAME) >>>> +rhs1def = SSA_NAME_DEF_STMT (rhs1); >>>> + else if (TREE_CODE (rhs1) != REAL_CST >>>> + && TREE_CODE (rhs1) != INTEGER_CST) >>>> +return; >>>> + if (rhs2) >>> >>> You may not access gimple_assign_rhs2 if it is not there. So you have >>> to check whether you have an unary, binary or ternary (yes) operation. >> >> gimple_assign_rhs2 returns NULL_TREE if it the RHS of an assignment >> has less than two operands. Regarding the check for ternary >> operation, I believe it is not necessary. A statement is considered >> for reassociation only if the RHS class is GIMPLE_BINARY_RHS. >> Subsequently, for rhs1 and rhs2, it checks if the def statements also >> have the same code and so it seems to me that a statement with a >> ternary operator in the RHS will never be considered in >> rewrite_expr_tree. 
>> >> >>> >>>> +{ >>>> + if (TREE_CODE (rhs2) == SSA_NAME) >>>> +rhs2def = SSA_NAME_DEF_STMT (rhs2); >>>> + else if (TREE_CODE (rhs1) != REAL_CST >>>> + && TREE_CODE (rhs1) != INTEGER_CST) >>>> +return; >>>> +} >>>> + gsi = gsi_for_stmt (stmt); >>>> + gsistmt = gsi; >>>> + gsi_prev (&gsi); >>>> + for (; !gsi_end_p (gsi); gsi_prev (&gsi)) >>> >>> >>> >>> This doesn't make much sense. You can move a stmt to the nearest >>> common post-dominator. Assuming you only want to handle placing >>> it after rhs1def or after rhs2def(?) you don't need any loop, just >>> two dominator queries and an insertion after one of the definition >>> stmts. >> >> Within a BB isn't that still O(size of BB)? > > Please document the fact that both stmts are in the same BB. Ok. > And no, it isn't, it is O (size of BB ^ 2). You don't need a loop. > operand rank should reflect the dominance relation inside the BB. The rank of phi definitions would mess this up. > If that doesn't work simply assign UIDs to the stmts first. Ok. > >>> But this code should consider BBs. >> For reassociation to look across BBs, the code should look something like >> this: >> >> L1 : >>a_2 = a_1 + 10 >>jc L3 >> L2: >> a_3 = a_2 + 20 >> >> - L1 should dominate L2 (otherwise there will be a phi node at L2 and >> the reassociation of a_3 will not consider the definition of a_2) >> - There are no other uses of a_2 other than the one in L3. >> >> After reassociation, the stmt defining a_2 would be moved to L2. In >> that case, the downward code motion of a_2 = a_1 + 10 to L2 is >> beneficial (one less instruction if the branch is taken). It is not >> obvious to me that moving it to L1 (or whereever a_1 is defined) is >> beneficial. > > In this case it doesn't matter whether a1 lives through a3 or if a2 does. > But moving the stmt is not necessary, so why not simply avoid it. I used that example to show that the current downward motion has a useful side effect and this patch preserves it. 
Yes, in this example the downward motion can be avoided but in general it may not be possible. I agree with you that there is unnecessary code motion in many cases. > You cannot undo it with your patch anyway. > >>> And I do
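The two-block example debated above, written out as source, shows why the existing downward motion can pay off: the addition is only needed on the fall-through path.

```c
#include <assert.h>

/* Source-level shape of the L1/L2 example from the thread:
   a_2 = a_1 + 10 is used only on the fall-through path, so sinking its
   computation below the branch (as reassociation's downward motion
   does) removes it from the taken path entirely.  */
static int
sink_example (int a1, int taken)
{
  if (taken)
    return a1;          /* jc L3: a_2 is never computed on this path */
  {
    int a2 = a1 + 10;   /* L2: sunk next to its only use */
    return a2 + 20;     /* a_3 = a_2 + 20 */
  }
}
```

Hoisting `a2 = a1 + 10` back above the branch would be correct but would execute it even when the branch is taken, which is the cost Easwaran is pointing at.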
Re: Propagate profile counts during switch expansion
Ping. On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman wrote: > I have attached a revised patch. The updated ChangeLog is given below > and I have responded to your comments inline: > > 2012-10-08 Easwaran Raman > * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to > specify the probability of taking the jump. > (emit_cmp_and_jump_insns): Likewise. > (expand_compare_and_swap_loop): Make the jump predicted not taken. > * dojump.c (do_compare_rtx_and_jump): Remove the code attaching > REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns. > * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing > probabilities for branches with more than two successors. > * expr.c (emit_block_move_via_loop): Predict the loop backedge > to be highly taken. > (try_casesi): Pass the probability of jumping to the default label. > (try_tablejump): Likewise. > (do_tablejump): Likewise. > * expr.h (try_tablejump): Add a new parameter. > (try_casesi): Likewise. > (emit_cmp_and_jump_insns): Add probability as default parameter with a > default value of -1. > * except.c (sjlj_emit_function_enter): Pass probability to > emit_cmp_and_jump_insns. > * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB. > (do_jump_if_equal): Pass probability for REG_BR_PROB note. > (add_case_node): Pass estimated probability of jumping to the case > label. > (emit_case_decision_tree): Pass default_prob to emit_case_nodes. > (get_outgoing_edge_probs): New function. > (conditional_probability): Likewise. > (reset_out_edges_aux): Likewise. > (compute_cases_per_edge): Likewise. > (emit_case_dispatch_table): Update probabilities of edges coming out > of the switch statement. > (expand_case): Compute and propagate default edge probability to > emit_case_dispatch_table. > (expand_sjlj_dispatch_table): Update calls to add_case_node and > emit_case_dispatch_table. > (balance_case_nodes): Update subtree_prob values. 
> (emit_case_nodes): Compute edge probabilities and add pass them to > emit_cmp_and_jump_insns. > > gcc/testsuite/ChangeLog: > 2012-10-02 Easwaran Raman > * gcc.dg/tree-prof/switch-case-1.c: New test case. > * gcc.dg/tree-prof/switch-case-2.c: New test case. > > > > > On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka wrote: >>> Hi, >>> This patch propagates the profile counts during RTL expansion. In >>> many cases, there is no way to determine the exact count of an edge >>> generated during the expansion. So this patch uses some simple >>> heuristics to estimate the edge counts but ensures that the counts of >>> the basic blocks corresponding to the cases are (nearly the) same as >>> at the gimple level. >>> >>> Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for >>> trunk? >>> Index: gcc/expr.c >>> === >>> --- gcc/expr.c (revision 191879) >>> +++ gcc/expr.c (working copy) >>> @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin >>> #ifdef PUSH_ROUNDING >>> static void emit_single_push_insn (enum machine_mode, rtx, tree); >>> #endif >>> -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); >>> +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); >>> static rtx const_vector_from_tree (tree); >>> static void write_complex_part (rtx, rtx, bool); >>> >>> @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree >>> >>> static void >>> do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx >>> table_label, >>> - rtx default_label) >>> + rtx default_label, int default_probability) >> >> Please document default_probability. > Done. > >>> { >>>rtx temp, vector; >>> >>> @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r >>> the maximum value of the range. 
*/ >>> >>>if (default_label) >>> -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, >>> - default_label); >>> +{ >>> + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, >>> + default_label); >>> + if (default_probability != -1) >>> +{ >>> + rtx jump_insn = get_last_insn(); >>> + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT >>> (default_probability)); >>> +} >>> +} >> >> dojump already does this kind of logic, but it is
Re: Propagate profile counts during switch expansion
On Sun, Oct 14, 2012 at 8:09 AM, Jan Hubicka wrote: > Hi, > > Index: optabs.c > === > --- optabs.c(revision 191879) > +++ optabs.c(working copy) > @@ -4249,7 +4249,7 @@ prepare_operand (enum insn_code icode, rtx x, int > we can do the branch. */ > > static void > -emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label) > +emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label, int > prob) > { >enum machine_mode optab_mode; >enum mode_class mclass; > @@ -4261,7 +4261,16 @@ static void > >gcc_assert (icode != CODE_FOR_nothing); >gcc_assert (insn_operand_matches (icode, 0, test)); > - emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), > label)); > + rtx insn = emit_insn ( > + GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label)); > > I think we did not change to style of mixing declaration and code yet. So > please put declaration ahead. Ok. > > I think you want to keep emit_jump_insn. Also do nothing when profile_status > == PROFILE_ABSENT. Why should this be dependent on profile_status? The PROB passed could also come from static prediction right. > Index: cfgbuild.c > === > --- cfgbuild.c (revision 191879) > +++ cfgbuild.c (working copy) > @@ -559,8 +559,11 @@ compute_outgoing_frequencies (basic_block b) > f->count = b->count - e->count; > return; > } > + else > +{ > + guess_outgoing_edge_probabilities (b); > +} > > Add comment here that we rely on multiway BBs having sane probabilities > already. > You still want to do guessing when the edges out are EH. Those also can be > many. I think this should work: - if (single_succ_p (b)) + else if (single_succ_p (b)) { e = single_succ_edge (b); e->probability = REG_BR_PROB_BASE; e->count = b->count; return; } - guess_outgoing_edge_probabilities (b); + else +{ + /* We rely on BBs with more than two successors to have sane probabilities + and do not guess them here. 
For BBs terminated by switch statements + expanded to jump-table jump, we have done the right thing during + expansion. For EH edges, we still guess the probabilities here. */ + bool complex_edge = false; + FOR_EACH_EDGE (e, ei, b->succs) +if (e->flags & EDGE_COMPLEX) + { +complex_edge = true; +break; + } + if (complex_edge) +guess_outgoing_edge_probabilities (b); +} + > Index: expr.h > === > --- expr.h (revision 191879) > +++ expr.h (working copy) > @@ -190,7 +190,7 @@ extern int have_sub2_insn (rtx, rtx); > /* Emit a pair of rtl insns to compare two rtx's and to jump > to a label if the comparison is true. */ > extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx, > -enum machine_mode, int, rtx); > +enum machine_mode, int, rtx, int > prob=-1); > > Hmm, probably first appreance of this C++ construct. I suppose it is OK. http://gcc.gnu.org/codingconventions.html#Default says it is ok for POD values. > > +static inline void > +reset_out_edges_aux (basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + FOR_EACH_EDGE(e, ei, bb->succs) > +e->aux = (void *)0; > +} > +static inline void > +compute_cases_per_edge (gimple stmt) > +{ > + basic_block bb = gimple_bb (stmt); > + reset_out_edges_aux (bb); > + int ncases = gimple_switch_num_labels (stmt); > + for (int i = ncases - 1; i >= 1; --i) > +{ > + tree elt = gimple_switch_label (stmt, i); > + tree lab = CASE_LABEL (elt); > + basic_block case_bb = label_to_block_fn (cfun, lab); > + edge case_edge = find_edge (bb, case_bb); > + case_edge->aux = (void *)((long)(case_edge->aux) + 1); > +} > +} > > Comments and newlines per coding standard. Ok. > With the these changes, the patch is OK Thanks, Easwaran > > Thanks, > Honza
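The compute_cases_per_edge helper quoted above counts, via the e->aux pointer, how many case labels funnel into each outgoing edge. A runnable stand-in, with an array index playing the role of the aux pointer (a sketch, not GCC code):

```c
#include <assert.h>

/* Count how many switch cases target each outgoing edge.  The patch
   stores the counter directly in e->aux by pointer arithmetic; here a
   plain array indexed by edge number plays that role.  case_edge[i]
   gives the edge number reached by case i.  */
static void
count_cases_per_edge (const int *case_edge, int ncases,
                      int *cases_per_edge, int nedges)
{
  int i;
  for (i = 0; i < nedges; i++)
    cases_per_edge[i] = 0;          /* mirrors reset_out_edges_aux */
  for (i = 0; i < ncases; i++)
    cases_per_edge[case_edge[i]]++;
}
```

These per-edge tallies are what let the later code split an edge's count or probability evenly among the case nodes sharing its label.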
Fix bugs introduced by switch-case profile propagation
Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m && profile_status != PROFILE_ABSENT && insn && JUMP_P (insn) - && any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + && any_condjump_p (insn) + && !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb->succs) prob_sum += e->probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx 
fallback_label = label_rtx (case_list->code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge->probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge->probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge->probability = default_prob; + if (default_edge) +default_edge->probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
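[Editorial note: the crash paths fixed above come from emit_case_dispatch_table being handed a NULL stmt_bb on the sjlj path. For readers following along, the construct being expanded is an ordinary switch: each case and the default become outgoing edges whose probabilities the expander must keep consistent. The sketch below is illustrative source only, not code from the patch.]

```cpp
#include <cassert>

// A plain switch with a default label.  When expanded as a dispatch
// (jump) table, each case becomes an outgoing edge of the switch's basic
// block; emit_case_dispatch_table redistributes the default edge's
// probability among them -- the hunk above guards the case where the
// statement's basic block is NULL (the expand_sjlj_dispatch_table path).
int classify(int x) {
  switch (x) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    default: return -1;  // the "default edge" whose probability is adjusted
  }
}
```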
Minimize downward code motion during reassociation
Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassociation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman * tree-ssa-reassoc.c (assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. 
*/ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe->op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} + +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i < len; i++) +{ + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitons. */ + if (TREE_CODE (oe->op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe->op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb + && gimple_uid (insert_stmt) < gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just after that statement. + Instead, find the first non-label gimple statement in BB and insert before + that. 
*/ + if (gimple_code (insert_stmt) == GIMPLE_PHI) +{ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (&gsistmt, &gsi_insert); +} + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (lookup_stmt_eh_lp (insert_stmt) > 0 + && stmt_can_throw_internal (insert_stmt)) +{ + edge e, succ_edge = NULL; + edge_iterator ei; + + /* There should be exactly one normal edge out of the basic block. */ + FOR_EACH_EDGE (e, ei, insert_bb->succs) +{ + if (!(e->flags & EDGE_COMPLEX)) +{ + gcc_assert (succ_edge == NULL); + succ_edge = e; +} +} + /* Insert STMT at the beginning of the successor basic block. */ + insert_bb = succ_edge->dest; + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (&gsistmt, &gsi_insert); +} + else +{ + gsi_insert = gsi_for_stmt (insert_stmt); + gsi_move_after (&gsistmt, &gsi_insert); +} + +} + /* Recursively rewrite our linearized statements so that the operators ma
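[Editorial note: to see the live-range problem this patch targets, consider an illustrative source fragment (not from the patch). The reassociation pass linearizes a chain of additions; the old behavior conservatively sank the rewritten statements, while ensure_ops_are_available above moves a statement only as far down as its latest-defined operand requires.]

```cpp
#include <cassert>

// Simplified view of an expression chain the reassociation pass
// linearizes.  Rewriting ((a + b) + c) + d under a different association
// must keep each operand's definition above its use; sinking the final
// add further than necessary stretches the live ranges of t1 and t2,
// which is exactly what the patch avoids.
int sum4(int a, int b, int c, int d) {
  int t1 = a + b;   // early definition: sinking the last add far below
  int t2 = t1 + c;  // this point would keep t1/t2 live needlessly long
  int t3 = t2 + d;
  return t3;
}
```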
Re: Minimize downward code motion during reassociation
On Fri, Oct 19, 2012 at 5:13 PM, Xinliang David Li wrote: > On Thu, Oct 18, 2012 at 3:36 PM, Easwaran Raman wrote: >> Hi, >> >> During expression reassociation, statements are conservatively moved >> downwards to ensure that dependences are correctly satisfied after >> reassocation. This could lead to lengthening of live ranges. This >> patch moves statements only to the extent necessary. Bootstraps and no >> test regression on x86_64/linux. OK for trunk? >> >> Thanks, >> Easwaran >> >> 2012-10-18 Easwaran Raman >> * tree-ssa-reassoc.c(assign_uids): New function. >> (assign_uids_in_relevant_bbs): Likewise. >> (ensure_ops_are_available): Likewise. >> (rewrite_expr_tree): Do not move statements beyond what is >> necessary. Remove call to swap_ops_for_binary_stmt... >> (reassociate_bb): ... and move it here. >> >> >> >> Index: gcc/tree-ssa-reassoc.c >> === >> --- gcc/tree-ssa-reassoc.c (revision 192487) >> +++ gcc/tree-ssa-reassoc.c (working copy) >> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea >> } >> } >> >> +/* Assign UIDs to statements in basic block BB. */ >> + >> +static void >> +assign_uids (basic_block bb) >> +{ >> + unsigned uid = 0; >> + gimple_stmt_iterator gsi; >> + /* First assign uids to phis. */ >> + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> +{ >> + gimple stmt = gsi_stmt (gsi); >> + gimple_set_uid (stmt, uid++); >> +} >> + >> + /* Then assign uids to stmts. */ >> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> +{ >> + gimple stmt = gsi_stmt (gsi); >> + gimple_set_uid (stmt, uid++); >> +} >> +} >> + >> +/* For each operand in OPS, find the basic block that contains the statement >> + which defines the operand. For all such basic blocks, assign UIDs. 
*/ >> + >> +static void >> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) >> +{ >> + operand_entry_t oe; >> + int i; >> + struct pointer_set_t *seen_bbs = pointer_set_create (); >> + >> + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) >> +{ >> + gimple def_stmt; >> + basic_block bb; >> + if (TREE_CODE (oe->op) != SSA_NAME) >> +continue; >> + def_stmt = SSA_NAME_DEF_STMT (oe->op); >> + bb = gimple_bb (def_stmt); >> + if (!pointer_set_contains (seen_bbs, bb)) >> +{ >> + assign_uids (bb); >> + pointer_set_insert (seen_bbs, bb); >> +} >> +} >> + pointer_set_destroy (seen_bbs); >> +} >> + >> +/* Ensure that operands in the OPS vector starting from OPINDEXth >> entry are live >> + at STMT. This is accomplished by moving STMT if needed. */ >> + >> +static void >> +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * >> ops, int opindex) > > format issue? > >> +{ >> + int i; >> + int len = VEC_length (operand_entry_t, ops); >> + gimple insert_stmt = stmt; >> + basic_block insert_bb = gimple_bb (stmt); >> + gimple_stmt_iterator gsi_insert, gsistmt; >> + for (i = opindex; i < len; i++) >> +{ >> + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); >> + gimple def_stmt; >> + basic_block def_bb; >> + /* Ignore constants and operands with default definitons. */ >> + if (TREE_CODE (oe->op) != SSA_NAME >> + || SSA_NAME_IS_DEFAULT_DEF (oe->op)) >> +continue; >> + def_stmt = SSA_NAME_DEF_STMT (oe->op); >> + def_bb = gimple_bb (def_stmt); >> + if (def_bb != insert_bb >> + && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) >> +{ >> + insert_bb = def_bb; >> + insert_stmt = def_stmt; >> +} >> + else if (def_bb == insert_bb >> + && gimple_uid (insert_stmt) < gimple_uid (def_stmt)) >> +insert_stmt = def_stmt; >> +} >> + if (insert_stmt == stmt) >> +return; >> + gsistmt = gsi_for_stmt (stmt); >> + /* If GSI_STMT is a phi node, then do not insert just after that >> statement. > > GSI_STMT --> INSERT_STMT? Ok. 
>> + Instead, find the first non-label gimple statement in BB and insert >> before >> + that.
Re: Minimize downward code motion during reassociation
On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener wrote: > On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman wrote: >> Hi, >> >> During expression reassociation, statements are conservatively moved >> downwards to ensure that dependences are correctly satisfied after >> reassocation. This could lead to lengthening of live ranges. This >> patch moves statements only to the extent necessary. Bootstraps and no >> test regression on x86_64/linux. OK for trunk? >> >> Thanks, >> Easwaran >> >> 2012-10-18 Easwaran Raman >> * tree-ssa-reassoc.c(assign_uids): New function. >> (assign_uids_in_relevant_bbs): Likewise. >> (ensure_ops_are_available): Likewise. >> (rewrite_expr_tree): Do not move statements beyond what is >> necessary. Remove call to swap_ops_for_binary_stmt... >> (reassociate_bb): ... and move it here. >> >> >> >> Index: gcc/tree-ssa-reassoc.c >> === >> --- gcc/tree-ssa-reassoc.c (revision 192487) >> +++ gcc/tree-ssa-reassoc.c (working copy) >> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea >> } >> } >> >> +/* Assign UIDs to statements in basic block BB. */ >> + >> +static void >> +assign_uids (basic_block bb) >> +{ >> + unsigned uid = 0; >> + gimple_stmt_iterator gsi; >> + /* First assign uids to phis. */ >> + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> +{ >> + gimple stmt = gsi_stmt (gsi); >> + gimple_set_uid (stmt, uid++); >> +} >> + >> + /* Then assign uids to stmts. */ >> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> +{ >> + gimple stmt = gsi_stmt (gsi); >> + gimple_set_uid (stmt, uid++); >> +} >> +} >> + >> +/* For each operand in OPS, find the basic block that contains the statement >> + which defines the operand. For all such basic blocks, assign UIDs. 
*/ >> + >> +static void >> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) >> +{ >> + operand_entry_t oe; >> + int i; >> + struct pointer_set_t *seen_bbs = pointer_set_create (); >> + >> + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) >> +{ >> + gimple def_stmt; >> + basic_block bb; >> + if (TREE_CODE (oe->op) != SSA_NAME) >> +continue; >> + def_stmt = SSA_NAME_DEF_STMT (oe->op); >> + bb = gimple_bb (def_stmt); >> + if (!pointer_set_contains (seen_bbs, bb)) >> +{ >> + assign_uids (bb); >> + pointer_set_insert (seen_bbs, bb); >> +} >> +} >> + pointer_set_destroy (seen_bbs); >> +} > > Please assign UIDs once using the existing renumber_gimple_stmt_uids (). > You seem to call the above multiple times and thus do work bigger than > O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb -> linearize_expr_tree -> linearize_expr -> gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? >> +/* Ensure that operands in the OPS vector starting from OPINDEXth >> entry are live >> + at STMT. This is accomplished by moving STMT if needed. */ >> + >> +static void >> +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * >> ops, int opindex) >> +{ >> + int i; >> + int len = VEC_length (operand_entry_t, ops); >> + gimple insert_stmt = stmt; >> + basic_block insert_bb = gimple_bb (stmt); >> + gimple_stmt_iterator gsi_insert, gsistmt; >> + for (i = opindex; i < len; i++) >> +{ > > Likewise you call this for each call to rewrite_expr_tree, so it seems to me > this is quadratic in the number of ops in the op vector. 
The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree. > > Why make this all so complicated? It seems to me that we should > fixup stmt order only after the whole ops vector has been materialized. > >> + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); >> + gim
Re: Fix bugs introduced by switch-case profile propagation
Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman wrote: > Hi, > This patch fixes bugs introduced by my previous patch to propagate > profiles during switch expansion. Bootstrap and profiledbootstrap > successful on x86_64. Confirmed that it fixes the crashes reported in > PR middle-end/54957. OK for trunk? > > - Easwaran > > 2012-10-17 Easwaran Raman > > PR target/54938 > PR middle-end/54957 > * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note > only if it doesn't already exist. > * except.c (sjlj_emit_function_enter): Remove unused variable. > * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. > (emit_case_dispatch_table): Handle the case where STMT_BB is > NULL. > (expand_sjlj_dispatch_table): Pass BB containing before_case > to emit_case_dispatch_table. > > Index: gcc/optabs.c > === > --- gcc/optabs.c (revision 192488) > +++ gcc/optabs.c (working copy) > @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m >&& profile_status != PROFILE_ABSENT >&& insn >&& JUMP_P (insn) > - && any_condjump_p (insn)) > -{ > - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); > - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); > -} > + && any_condjump_p (insn) > + && !find_reg_note (insn, REG_BR_PROB, 0)) > +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); > } > > /* Generate code to compare X with Y so that the condition codes are > Index: gcc/except.c > === > --- gcc/except.c (revision 192488) > +++ gcc/except.c (working copy) > @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) >if (dispatch_label) > { > #ifdef DONT_USE_BUILTIN_SETJMP > - rtx x, last; > + rtx x; >x = emit_library_call_value (setjmp_libfunc, NULL_RTX, > LCT_RETURNS_TWICE, > TYPE_MODE (integer_type_node), 1, > plus_constant (Pmode, XEXP (fc, 0), > Index: gcc/stmt.c > === > --- gcc/stmt.c (revision 192488) > +++ gcc/stmt.c (working copy) > @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) >edge e; >edge_iterator ei; >int prob_sum = 0; > 
+ if (!bb) > +return 0; >FOR_EACH_EDGE(e, ei, bb->succs) > prob_sum += e->probability; >return prob_sum; > @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in >rtx fallback_label = label_rtx (case_list->code_label); >rtx table_label = gen_label_rtx (); >bool has_gaps = false; > - edge default_edge = EDGE_SUCC(stmt_bb, 0); > - int default_prob = default_edge->probability; > + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; > + int default_prob = default_edge ? default_edge->probability : 0; >int base = get_outgoing_edge_probs (stmt_bb); >bool try_with_tablejump = false; > > @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in >default_prob = 0; > } > > - default_edge->probability = default_prob; > + if (default_edge) > +default_edge->probability = default_prob; > >/* We have altered the probability of the default edge. So the > probabilities > of all other edges need to be adjusted so that it sums up to > @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, > >emit_case_dispatch_table (index_expr, index_type, > case_list, default_label, > - minval, maxval, range, NULL); > + minval, maxval, range, > +BLOCK_FOR_INSN (before_case)); >emit_label (default_label); >free_alloc_pool (case_node_pool); > }
Re: Fix bugs introduced by switch-case profile propagation
On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka wrote: >> Ping. >> >> >> On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman wrote: >> > Hi, >> > This patch fixes bugs introduced by my previous patch to propagate >> > profiles during switch expansion. Bootstrap and profiledbootstrap >> > successful on x86_64. Confirmed that it fixes the crashes reported in >> > PR middle-end/54957. OK for trunk? >> > >> > - Easwaran >> > >> > 2012-10-17 Easwaran Raman >> > >> > PR target/54938 >> > PR middle-end/54957 >> > * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note >> > only if it doesn't already exist. >> > * except.c (sjlj_emit_function_enter): Remove unused variable. >> > * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. > > Seems fine, but under what conditions you get NULL here? When expand_sjlj_dispatch_table calls emit_case_dispatch_table, stmt_bb is NULL. - Easwaran > Honza >> > (emit_case_dispatch_table): Handle the case where STMT_BB is >> > NULL. >> > (expand_sjlj_dispatch_table): Pass BB containing before_case >> > to emit_case_dispatch_table. 
>> > >> > Index: gcc/optabs.c >> > === >> > --- gcc/optabs.c (revision 192488) >> > +++ gcc/optabs.c (working copy) >> > @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m >> >&& profile_status != PROFILE_ABSENT >> >&& insn >> >&& JUMP_P (insn) >> > - && any_condjump_p (insn)) >> > -{ >> > - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); >> > - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); >> > -} >> > + && any_condjump_p (insn) >> > + && !find_reg_note (insn, REG_BR_PROB, 0)) >> > +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); >> > } >> > >> > /* Generate code to compare X with Y so that the condition codes are >> > Index: gcc/except.c >> > === >> > --- gcc/except.c (revision 192488) >> > +++ gcc/except.c (working copy) >> > @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) >> >if (dispatch_label) >> > { >> > #ifdef DONT_USE_BUILTIN_SETJMP >> > - rtx x, last; >> > + rtx x; >> >x = emit_library_call_value (setjmp_libfunc, NULL_RTX, >> > LCT_RETURNS_TWICE, >> > TYPE_MODE (integer_type_node), 1, >> > plus_constant (Pmode, XEXP (fc, 0), >> > Index: gcc/stmt.c >> > === >> > --- gcc/stmt.c (revision 192488) >> > +++ gcc/stmt.c (working copy) >> > @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) >> >edge e; >> >edge_iterator ei; >> >int prob_sum = 0; >> > + if (!bb) >> > +return 0; >> >FOR_EACH_EDGE(e, ei, bb->succs) >> > prob_sum += e->probability; >> >return prob_sum; >> > @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in >> >rtx fallback_label = label_rtx (case_list->code_label); >> >rtx table_label = gen_label_rtx (); >> >bool has_gaps = false; >> > - edge default_edge = EDGE_SUCC(stmt_bb, 0); >> > - int default_prob = default_edge->probability; >> > + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; >> > + int default_prob = default_edge ? 
default_edge->probability : 0; >> >int base = get_outgoing_edge_probs (stmt_bb); >> >bool try_with_tablejump = false; >> > >> > @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in >> >default_prob = 0; >> > } >> > >> > - default_edge->probability = default_prob; >> > + if (default_edge) >> > +default_edge->probability = default_prob; >> > >> >/* We have altered the probability of the default edge. So the >> > probabilities >> > of all other edges need to be adjusted so that it sums up to >> > @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, >> > >> >emit_case_dispatch_table (index_expr, index_type, >> > case_list, default_label, >> > - minval, maxval, range, NULL); >> > + minval, maxval, range, >> > +BLOCK_FOR_INSN (before_case)); >> >emit_label (default_label); >> >free_alloc_pool (case_node_pool); >> > }
Re: Minimize downward code motion during reassociation
On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener wrote: > On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman wrote: >> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener >> wrote: >>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman wrote: >>>> Hi, >>>> >>>> During expression reassociation, statements are conservatively moved >>>> downwards to ensure that dependences are correctly satisfied after >>>> reassocation. This could lead to lengthening of live ranges. This >>>> patch moves statements only to the extent necessary. Bootstraps and no >>>> test regression on x86_64/linux. OK for trunk? >>>> >>>> Thanks, >>>> Easwaran >>>> >>>> 2012-10-18 Easwaran Raman >>>> * tree-ssa-reassoc.c(assign_uids): New function. >>>> (assign_uids_in_relevant_bbs): Likewise. >>>> (ensure_ops_are_available): Likewise. >>>> (rewrite_expr_tree): Do not move statements beyond what is >>>> necessary. Remove call to swap_ops_for_binary_stmt... >>>> (reassociate_bb): ... and move it here. >>>> >>>> >>>> >>>> Index: gcc/tree-ssa-reassoc.c >>>> === >>>> --- gcc/tree-ssa-reassoc.c (revision 192487) >>>> +++ gcc/tree-ssa-reassoc.c (working copy) >>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea >>>> } >>>> } >>>> >>>> +/* Assign UIDs to statements in basic block BB. */ >>>> + >>>> +static void >>>> +assign_uids (basic_block bb) >>>> +{ >>>> + unsigned uid = 0; >>>> + gimple_stmt_iterator gsi; >>>> + /* First assign uids to phis. */ >>>> + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>>> +{ >>>> + gimple stmt = gsi_stmt (gsi); >>>> + gimple_set_uid (stmt, uid++); >>>> +} >>>> + >>>> + /* Then assign uids to stmts. */ >>>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>>> +{ >>>> + gimple stmt = gsi_stmt (gsi); >>>> + gimple_set_uid (stmt, uid++); >>>> +} >>>> +} >>>> + >>>> +/* For each operand in OPS, find the basic block that contains the >>>> statement >>>> + which defines the operand. For all such basic blocks, assign UIDs. 
*/ >>>> + >>>> +static void >>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) >>>> +{ >>>> + operand_entry_t oe; >>>> + int i; >>>> + struct pointer_set_t *seen_bbs = pointer_set_create (); >>>> + >>>> + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) >>>> +{ >>>> + gimple def_stmt; >>>> + basic_block bb; >>>> + if (TREE_CODE (oe->op) != SSA_NAME) >>>> +continue; >>>> + def_stmt = SSA_NAME_DEF_STMT (oe->op); >>>> + bb = gimple_bb (def_stmt); >>>> + if (!pointer_set_contains (seen_bbs, bb)) >>>> +{ >>>> + assign_uids (bb); >>>> + pointer_set_insert (seen_bbs, bb); >>>> +} >>>> +} >>>> + pointer_set_destroy (seen_bbs); >>>> +} >>> >>> Please assign UIDs once using the existing renumber_gimple_stmt_uids (). >>> You seem to call the above multiple times and thus do work bigger than >>> O(number of basic blocks). >> >> The reason I call the above multiple times is that gsi_move_before >> might get called between two calls to the above. For instance, after >> rewrite_expr_tree is called once, the following sequence of calls >> could happen: >> reassociate_bb -> linearize_expr_tree -> linearize_expr -> >> gsi_move_before. So it is not sufficient to call >> renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to >> use renumber_gimple_stmt_uids_in_blocks instead of >> assign_uids_in_relevant_bbs? > > It's sufficient to call it once if you conservatively update UIDs at stmt > move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. I have made the change in the new patch. >
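[Editorial note: the scheme agreed on above — renumber UIDs once and keep them valid by having a moved or newly inserted statement adopt the UID of its neighbor — works because UIDs only need to be non-decreasing within a block for ordering queries. A toy model of that invariant follows; the names (Stmt, move_after) are illustrative, not GCC's.]

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Toy model of the UID-maintenance invariant: a statement moved within a
// "block" simply adopts the UID of the statement it is placed after, so
// UIDs stay monotonically non-decreasing without a full renumbering.
struct Stmt { int id; unsigned uid; };

void move_after(std::list<Stmt> &bb, std::list<Stmt>::iterator what,
                std::list<Stmt>::iterator where) {
  Stmt s = *what;
  s.uid = where->uid;              // conservative: same UID as new neighbor
  bb.erase(what);                  // list erase leaves `where` valid
  bb.insert(std::next(where), s);
}

bool uids_monotonic(const std::list<Stmt> &bb) {
  unsigned prev = 0;
  for (const Stmt &s : bb) {
    if (s.uid < prev) return false;
    prev = s.uid;
  }
  return true;
}
```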
Re: Fix bugs introduced by switch-case profile propagation
Hi, On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka wrote: >> Ping. >> >> >> On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman wrote: >> > Hi, >> > This patch fixes bugs introduced by my previous patch to propagate >> > profiles during switch expansion. Bootstrap and profiledbootstrap >> > successful on x86_64. Confirmed that it fixes the crashes reported in >> > PR middle-end/54957. OK for trunk? >> > >> > - Easwaran >> > >> > 2012-10-17 Easwaran Raman >> > >> > PR target/54938 >> > PR middle-end/54957 >> > * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note >> > only if it doesn't already exist. >> > * except.c (sjlj_emit_function_enter): Remove unused variable. >> > * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. > > Seems fine, but under what conditions you get NULL here? Wasn't sure if this is an OK for the patch or if I need to address anything else. Thanks, Easwaran > > Honza >> > (emit_case_dispatch_table): Handle the case where STMT_BB is >> > NULL. >> > (expand_sjlj_dispatch_table): Pass BB containing before_case >> > to emit_case_dispatch_table. 
>> > >> > Index: gcc/optabs.c >> > === >> > --- gcc/optabs.c (revision 192488) >> > +++ gcc/optabs.c (working copy) >> > @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m >> >&& profile_status != PROFILE_ABSENT >> >&& insn >> >&& JUMP_P (insn) >> > - && any_condjump_p (insn)) >> > -{ >> > - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); >> > - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); >> > -} >> > + && any_condjump_p (insn) >> > + && !find_reg_note (insn, REG_BR_PROB, 0)) >> > +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); >> > } >> > >> > /* Generate code to compare X with Y so that the condition codes are >> > Index: gcc/except.c >> > === >> > --- gcc/except.c (revision 192488) >> > +++ gcc/except.c (working copy) >> > @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) >> >if (dispatch_label) >> > { >> > #ifdef DONT_USE_BUILTIN_SETJMP >> > - rtx x, last; >> > + rtx x; >> >x = emit_library_call_value (setjmp_libfunc, NULL_RTX, >> > LCT_RETURNS_TWICE, >> > TYPE_MODE (integer_type_node), 1, >> > plus_constant (Pmode, XEXP (fc, 0), >> > Index: gcc/stmt.c >> > === >> > --- gcc/stmt.c (revision 192488) >> > +++ gcc/stmt.c (working copy) >> > @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) >> >edge e; >> >edge_iterator ei; >> >int prob_sum = 0; >> > + if (!bb) >> > +return 0; >> >FOR_EACH_EDGE(e, ei, bb->succs) >> > prob_sum += e->probability; >> >return prob_sum; >> > @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in >> >rtx fallback_label = label_rtx (case_list->code_label); >> >rtx table_label = gen_label_rtx (); >> >bool has_gaps = false; >> > - edge default_edge = EDGE_SUCC(stmt_bb, 0); >> > - int default_prob = default_edge->probability; >> > + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; >> > + int default_prob = default_edge ? 
default_edge->probability : 0; >> >int base = get_outgoing_edge_probs (stmt_bb); >> >bool try_with_tablejump = false; >> > >> > @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in >> >default_prob = 0; >> > } >> > >> > - default_edge->probability = default_prob; >> > + if (default_edge) >> > +default_edge->probability = default_prob; >> > >> >/* We have altered the probability of the default edge. So the >> > probabilities >> > of all other edges need to be adjusted so that it sums up to >> > @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, >> > >> >emit_case_dispatch_table (index_expr, index_type, >> > case_list, default_label, >> > - minval, maxval, range, NULL); >> > + minval, maxval, range, >> > +BLOCK_FOR_INSN (before_case)); >> >emit_label (default_label); >> >free_alloc_pool (case_node_pool); >> > }
Re: Fix bugs introduced by switch-case profile propagation
On Fri, Oct 26, 2012 at 8:05 AM, Jan Hubicka wrote: >> Hi, >> >> On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka wrote: >> >> Ping. >> >> >> >> >> >> On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman wrote: >> >> > Hi, >> >> > This patch fixes bugs introduced by my previous patch to propagate >> >> > profiles during switch expansion. Bootstrap and profiledbootstrap >> >> > successful on x86_64. Confirmed that it fixes the crashes reported in >> >> > PR middle-end/54957. OK for trunk? >> >> > >> >> > - Easwaran >> >> > >> >> > 2012-10-17 Easwaran Raman >> >> > >> >> > PR target/54938 >> >> > PR middle-end/54957 >> >> > * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note >> >> > only if it doesn't already exist. >> >> > * except.c (sjlj_emit_function_enter): Remove unused variable. >> >> > * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. >> > >> > Seems fine, but under what conditions you get NULL here? >> Wasn't sure if this is an OK for the patch or if I need to address >> anything else. > > Actually I think you should make the except.c to setcurrent_bb when expanding > the switch instead. In the current code, in sjlj_emit_dispatch_table (except.c), a sequence of instructions including the dispatch table jump are first generated and emitted to a BB before the first of the jump table targets. I think you are asking me to first emit the instructions before the indirect jump into a new bb so that BLOCK_FOR_INSN (before_case) in stmt.c is non NULL. This would need some refactoring inside except.c, but more importantly this BB won't have the correct control flow right? So, while I can avoid the check for BB being NULL in get_outgoing_edge_probs, I still need to guard for default_edge being NULL and default_prob will still be 0. Would it be ok if I remove the check for NULL inside get_outgoing_edge_probs and instead check it when I initialize BASE? Thanks, Easwaran > OK with this change. > > Honza
Don't drop attributes on template member functions
This attempts to address one issue reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33911 . Attributes at the end of a declaration of template member functions are discarded in cp_parser_init_declarator. The suggested workaround is to place the attributes into the declaration-specifiers. The code in cp_parser_init_declarator actually parses the attributes at the end, but only passes the prefix_attributes (from the decl-specifier-seq) to grokfield(). Appending the parsed attributes to prefix_attributes seems to work, but I don't know if this is sufficient in all cases. Bootstraps and all tests pass with the patch on x86_64/linux. Is this ok for trunk? 2013-10-02 Easwaran Raman PR c++/33911 * parser.c (cp_parser_init_declarator): Do not drop attributes of template member functions. 2013-10-02 Easwaran Raman PR c++/33911 * g++.dg/ext/attribute47.C: New. parser_attribute.patch Description: Binary data
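[Editorial note: a small example of the two attribute placements discussed above — the trailing form that PR c++/33911 reports as dropped, and the decl-specifier form suggested as the workaround. Illustrative only; not the attribute47.C test from the patch.]

```cpp
#include <cassert>

template <typename T>
struct Wrapper {
  // Trailing placement: the form that cp_parser_init_declarator parsed
  // but discarded before this patch.
  T twice(T v) __attribute__((noinline));

  // Workaround placement, inside the decl-specifiers, which was always
  // passed through as part of prefix_attributes.
  __attribute__((noinline)) T thrice(T v) { return v + v + v; }
};

template <typename T>
T Wrapper<T>::twice(T v) { return v + v; }
```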
Add a param to decide stack slot sharing at -O0
In cfgexpand.c, variables in non-overlapping lexical scopes are assigned the same stack locations at -O1 and above. At -O0, this is attempted only if the size of the stack objects is above a threshold (32). The rationale is that at -O0, more variables are going to be on the stack and the O(n^2) stack slot sharing algorithm will increase the compilation time. This patch replaces the constant with a param which is set to 32 by default. We ran into a case where the presence of the always_inline attribute triggered Wframe-larger-than warnings at -O0 but not at -O2, since the different inlined copies share the stack. We are ok with a slight increase in compilation time to get smaller stack frames even at -O0, and this patch would allow us to do that easily. Bootstraps on x86_64/linux. Is this ok for trunk? Thanks, Easwaran 2013-10-08 Easwaran Raman * params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param... * cfgexpand.c (defer_stack_allocation): ...use here cfgexpand.patch Description: Binary data
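[Editorial note: the kind of code the sharing applies to — two buffers in disjoint lexical scopes that are never live at the same time. Illustrative source only; the param name below is the one the patch introduces.]

```cpp
#include <cassert>
#include <cstring>

// buf_a and buf_b live in non-overlapping lexical scopes, so they may
// occupy the same stack slot.  At -O0 this sharing is attempted only for
// objects at least as large as the threshold that the patch turns into
// --param min-size-for-stack-sharing (default 32).
int pick(bool first) {
  int r;
  if (first) {
    char buf_a[64];
    std::memset(buf_a, 1, sizeof buf_a);
    r = buf_a[0];
  } else {
    char buf_b[64];
    std::memset(buf_b, 2, sizeof buf_b);
    r = buf_b[0];
  }
  return r;
}
```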
Re: Add a param to decide stack slot sharing at -O0
On Wed, Oct 9, 2013 at 4:11 AM, Richard Biener wrote: > On Tue, Oct 8, 2013 at 11:04 PM, Easwaran Raman wrote: >> In cfgexpand.c, variables in non-overlapping lexical scopes are >> assigned same stack locations at -O1 and above. At -O0, this is >> attempted only if the size of the stack objects is above a threshold >> (32). The rationale is at -O0, more variables are going to be in the >> stack and the O(n^2) stack slot sharing algorithm will increase the >> compilation time. This patch replaces the constant with a param which >> is set to 32 by default. We ran into a case where the presence of >> always_inline attribute triggered Wframe-larger-than warnings at -O0 >> but not at -O2 since the different inlined copies share the stack. We >> are ok with a slight increase in compilation time to get smaller stack >> frames even at -O0 and this patch would allow us do that easily. >> >> Bootstraps on x86_64/linux. Is this ok for trunk? > > Ok with > > +DEFPARAM (PARAM_MIN_SIZE_FOR_STACK_SHARING, > + "min-size-for-stack-sharing", > + "Attempt to share stack slots among variables in different > lexical blocks " > + "at O0 only if their sizes exceed this value", > + 32, 0, 0) > > changed to > >"The minimum size of variables taking part in stack slot sharing " >"when not optimizing" >32, 0, 0) > > And with adding documentation for that param in doc/invoke.texi. > > Btw, I'm not sure the sharing algorithm is still quadratic - can you > investigate on that? The partition_stack_vars is still quadratic in worst case (all live ranges overlap), but the expensive part is likely to be the add_scope_conflicts function. - Easwaran > Thanks, > Richard. > >> Thanks, >> Easwaran >> >> >> 2013-10-08 Easwaran Raman >> >> * params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param... >> * cfgexpand.c (defer_stack_allocation): ...use here
[4.9] PR 62146
This patch deletes REG_EQUAL note when a src register is replaced by a constant in an assignment. This is to prevent spurious equivalences between the constant and the expression in the REG_EQUAL note. In the bug reported in PR 62146, an assignment in one branch (which is actually dead) of an IF statement has a REG_EQUAL note equating a register with an expression. Conditional copy propagation replaces the register with 0. The instruction is hoisted above the branch subsequently and then the value 0 is equated with the expression in the REG_EQUAL. Is this ok for 4.9 branch if all tests pass? This patch looks applicable to trunk as well, but I don't have a test case to reproduce the issue in trunk. ChangeLog: 2014-08-25 Easwaran Raman PR rtl-optimization/62146 * cprop.c (try_replace_reg): Remove REG_EQUAL note when a constant is propagated into the src of an assignment. testsuite/ChangeLog 2014-08-25 Easwaran Raman PR rtl-optimization/62146 * testsuite/g++.dg/opt/pr62146.C: New. Index: gcc/testsuite/g++.dg/opt/pr62146.C === --- gcc/testsuite/g++.dg/opt/pr62146.C (revision 0) +++ gcc/testsuite/g++.dg/opt/pr62146.C (revision 0) @@ -0,0 +1,51 @@ +/* PR rtl-optimization/62146 */ +/* { dg-do compile } */ +/* { dg-options "-O2 " } */ +class F +{ +public: +virtual ~ F (); +}; +template < class CL > class G:public F +{ +int *member_; +public: +G ( int *b): member_ (0) +{ +} +}; + +class D +{ +public: +template < class CL > void RegisterNonTagCallback (int, +void (CL:: + *p3) ()) +{ +InternalRegisterNonTag (p3 ? 
new G < CL > ( 0) : 0); +} void InternalRegisterNonTag (F *); +}; + +void fn1 (); +class C1 +{ +void foo(); +class TokenType +{ +public: +void AddToken () +{ +} +}; +C1::TokenType bar_t; +}; +D a; +void C1::foo() +{ +if (&bar_t) +fn1 (); +for (int i = 0; i < sizeof 0; ++i) +a.RegisterNonTagCallback (0, &TokenType::AddToken); +} + +/* { dg-final { scan-assembler-not "mov.*_ZN2C19TokenType8AddTokenEv, .\\\(" } } */ Index: gcc/cprop.c === --- gcc/cprop.c (revision 214472) +++ gcc/cprop.c (working copy) @@ -790,8 +790,11 @@ try_replace_reg (rtx from, rtx to, rtx insn) /* REG_EQUAL may get simplified into register. We don't allow that. Remove that note. This code ought not to happen, because previous code ought to synthesize - reg-reg move, but be on the safe side. */ - if (note && REG_NOTE_KIND (note) == REG_EQUAL && REG_P (XEXP (note, 0))) + reg-reg move, but be on the safe side. The REG_EQUAL note is + also removed when the source is a constant. */ + if (note && REG_NOTE_KIND (note) == REG_EQUAL + && (REG_P (XEXP (note, 0)) + || (set && CONSTANT_P (SET_SRC (set))))) remove_note (insn, note); return success;
Re: [4.9] PR 62146
On Mon, Aug 25, 2014 at 3:42 PM, Easwaran Raman wrote: > This patch deletes REG_EQUAL note when a src register is replaced by a > constant in an assignment. This is to prevent spurious equivalences > between the constant and the expression in the REG_EQUAL note. In the > bug reported in PR 62146, an assignment in one branch (which is > actually dead) of an IF statement has a REG_EQUAL note equating a > register with an expression. Conditional copy propagation replaces the > register with 0. The instruction is hoisted above the branch > subsequently and then the value 0 is equated with the expression in > the REG_EQUAL. Is this ok for 4.9 branch if all tests pass? > > This patch looks applicable to trunk as well, but I don't have a test > case to reproduce the issue in trunk. > > > ChangeLog: > > 2014-08-25 Easwaran Raman > > PR rtl-optimization/62146 > * cprop.c (try_replace_reg): Remove REG_EQUAL note when a constant is > propagated into the src of an assignment. > > testsuite/ChangeLog > > 2014-08-25 Easwaran Raman > > PR rtl-optimization/62146 > * testsuite/g++.dg/opt/pr62146.C: New.
Re: [4.9] PR 62146
It turns out that the REG_EQUAL note is removed on a hoisted instruction (relevant code is in dead_or_predicable in ifcvt.c) if the source of the move instruction is not a function invariant. In this case, the source is a function invariant (constant) and so that doesn't kick in. I don't understand why this exemption for function invariant is there and the original thread in https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain either. Should I just remove the REG_EQUAL notes of all hoisted instructions or are there cases where it is safe to leave the note? Thanks, Easwaran On Fri, Aug 29, 2014 at 1:06 PM, Jeff Law wrote: > On 08/25/14 16:42, Easwaran Raman wrote: >> >> This patch deletes REG_EQUAL note when a src register is replaced by a >> constant in an assignment. This is to prevent spurious equivalences >> between the constant and the expression in the REG_EQUAL note. In the >> bug reported in PR 62146, an assignment in one branch (which is >> actually dead) of an IF statement has a REG_EQUAL note equating a >> register with an expression. Conditional copy propagation replaces the >> register with 0. The instruction is hoisted above the branch >> subsequently and then the value 0 is equated with the expression in >> the REG_EQUAL. Is this ok for 4.9 branch if all tests pass? >> >> This patch looks applicable to trunk as well, but I don't have a test >> case to reproduce the issue in trunk. > > Something doesn't feel right with this patch. It seems to me the real > problem is when when hoist the insn with the note. If the equivalence > implied by the note is no longer valid at the insn's new location, then the > note needs to be removed. > > Now determining if the note is no longer valid at the new location may prove > difficult ;-) You'd probably have to know why the note was created, how it > was changed, etc. So I suspect the right thing to do is just remove > REG_EQUAL notes on any insns we hoist in this manner. > > Jeff
Re: [4.9] PR 62146
I've attached the revised patch. Bootstrapped and no test regressions on x86_64/linux with 4.9 branch. Ok for 4.9 branch? While the bug doesn't show up in trunk, seems obvious that this should go to trunk as well. Is it ok for trunk if tests pass? Btw, is g++.dg/opt the right directory for the test to go? - ChangeLog 2014-09-04 Easwaran Raman PR rtl-optimization/62146 * ifcvt.c (dead_or_predicable): Make removal of REG_EQUAL note of hoisted instruction unconditional. 2014-09-04 Easwaran Raman PR rtl-optimization/62146 * testsuite/g++.dg/opt/pr62146.C: New. On Wed, Sep 3, 2014 at 11:36 AM, Jeff Law wrote: > On 09/02/14 12:52, Easwaran Raman wrote: >> >> It turns out that the REG_EQUAL note is removed on a hoisted >> instruction (relevant code is in dead_or_predicable in ifcvt.c) if the >> source of the move instruction is not a function invariant. In this >> case, the source is a function invariant (constant) and so that >> doesn't kick in. I don't understand why this exemption for function >> invariant is there and the original thread in >> https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain >> either. Should I just remove the REG_EQUAL notes of all hoisted >> instructions or are there cases where it is safe to leave the note? > > I suspect it was left in in an attempt to keep as many REG_EQUAL notes as > possible as the notes can help later optimizations. But even those > equivalences are not necessarily safe. > > I'm pretty sure the right thing to do here is just drop the note regardless > of whether or not its an invariant. 
> > jeff > Index: testsuite/g++.dg/opt/pr62146.C === --- testsuite/g++.dg/opt/pr62146.C (revision 0) +++ testsuite/g++.dg/opt/pr62146.C (revision 0) @@ -0,0 +1,51 @@ +/* PR rtl-optimization/62146 */ +/* { dg-do compile } */ +/* { dg-options "-O2 " } */ +class F +{ +public: +virtual ~ F (); +}; +template < class CL > class G:public F +{ +int *member_; +public: +G ( int *b): member_ (0) +{ +} +}; + +class D +{ +public: +template < class CL > void RegisterNonTagCallback (int, +void (CL:: + *p3) ()) +{ +InternalRegisterNonTag (p3 ? new G < CL > ( 0) : 0); +} void InternalRegisterNonTag (F *); +}; + +void fn1 (); +class C1 +{ +void foo(); +class TokenType +{ +public: +void AddToken () +{ +} +}; +C1::TokenType bar_t; +}; +D a; +void C1::foo() +{ +if (&bar_t) +fn1 (); +for (int i = 0; i < sizeof 0; ++i) +a.RegisterNonTagCallback (0, &TokenType::AddToken); +} + +/* { dg-final { scan-assembler-not "mov.*_ZN2C19TokenType8AddTokenEv, .\\\(" } } */ Index: ifcvt.c === --- ifcvt.c (revision 214472) +++ ifcvt.c (working copy) @@ -4387,17 +4387,14 @@ dead_or_predicable (basic_block test_bb, basic_blo insn = head; do { - rtx note, set; + rtx note; if (! INSN_P (insn)) continue; note = find_reg_note (insn, REG_EQUAL, NULL_RTX); if (! note) continue; - set = single_set (insn); - if (!set || !function_invariant_p (SET_SRC (set)) - || !function_invariant_p (XEXP (note, 0))) - remove_note (insn, note); + remove_note (insn, note); } while (insn != end && (insn = NEXT_INSN (insn))); /* PR46315: when moving insns above a conditional branch, the REG_EQUAL
Fix for PR c++/59031
Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL is set and the call is direct. The changes at r193504 (to fix PR c++/11750) caused a regression to this behavior. This patch attempts to fix that. Bootstraps and no test regressions on x86_64/linux. Is this a correct fix and is this ok for trunk? Thanks, Easwaran 2013-11-07 Easwaran Raman PR c++/59031 * call.c (build_new_method_call_1): Compare function context with BASELINK_BINFO type rather than instance type before marking the call with LOOKUP_NONVIRTUAL. testsuite/ChangeLog: 2013-11-07 Easwaran Raman PR c++/59031 * g++.dg/inherit/virtual11.C: New test. Index: testsuite/g++.dg/inherit/virtual11.C === --- testsuite/g++.dg/inherit/virtual11.C (revision 0) +++ testsuite/g++.dg/inherit/virtual11.C (revision 0) @@ -0,0 +1,17 @@ +// PR c++/59031 +// { dg-do compile } +// { dg-options "-fdump-tree-gimple " } +class B { + public: + virtual int add (int a, int b) {return a+ b;} +}; + +class D : public B { +}; + +int foo (int a, int b) { + D d; + return d.add(a, b); +} +// { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "gimple" } } +// { dg-final { cleanup-tree-dump "gimple" } } Index: cp/call.c === --- cp/call.c (revision 204466) +++ cp/call.c (working copy) @@ -7511,7 +7511,7 @@ build_new_method_call_1 (tree instance, tree fns, struct z_candidate *candidates = 0, *cand; tree explicit_targs = NULL_TREE; tree basetype = NULL_TREE; - tree access_binfo; + tree access_binfo, binfo; tree optype; tree first_mem_arg = NULL_TREE; tree name; @@ -7550,6 +7550,7 @@ build_new_method_call_1 (tree instance, tree fns, if (!conversion_path) conversion_path = BASELINK_BINFO (fns); access_binfo = BASELINK_ACCESS_BINFO (fns); + binfo = BASELINK_BINFO (fns); optype = BASELINK_OPTYPE (fns); fns = BASELINK_FUNCTIONS (fns); if (TREE_CODE (fns) == TEMPLATE_ID_EXPR) @@ -7802,7 +7803,7 @@ build_new_method_call_1 (tree instance, tree fns, do not want to perform a non-virtual call. */ if (DECL_VINDEX (fn) && ! 
(flags & LOOKUP_NONVIRTUAL) && same_type_ignoring_top_level_qualifiers_p - (DECL_CONTEXT (fn), TREE_TYPE (instance)) + (DECL_CONTEXT (fn), TREE_TYPE (binfo)) && resolves_to_fixed_type_p (instance, 0)) flags |= LOOKUP_NONVIRTUAL; if (explicit_targs)
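For readers less familiar with the front end, the shape of the code the patch re-enables devirtualization for mirrors the virtual11.C test above: D does not override add(), and the call is made on a local object whose dynamic type is known exactly, so the call can be marked LOOKUP_NONVIRTUAL and emitted as a direct call to B::add.

```cpp
#include <cassert>

struct B {
  virtual int add(int a, int b) { return a + b; }
  virtual ~B() {}
};

// D does not override add(); since the object below is a local variable,
// resolves_to_fixed_type_p holds and the call need not go through the vtable.
struct D : B {};

int foo(int a, int b) {
  D d;
  return d.add(a, b);  // devirtualizable: no OBJ_TYPE_REF should remain
}
```

The regression was that DECL_CONTEXT (fn) is B while TREE_TYPE (instance) is D, so the old same-type check failed; comparing against the BASELINK_BINFO type instead restores the direct call.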
Re: Fix for PR c++/59031
Ping. On Thu, Nov 7, 2013 at 11:14 AM, Easwaran Raman wrote: > Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL > is set and the call is direct. The changes at r193504 (to fix PR > c++/11750) caused a regression to this behavior. This patch attempts > to fix that. Bootstraps and no test regressions on x86_64/linux. Is > this a correct fix and is this ok for trunk? > > Thanks, > Easwaran > > 2013-11-07 Easwaran Raman > > PR c++/59031 > * call.c (build_new_method_call_1): Compare function context > with BASELINK_BINFO type rather than instance type before > marking the call with LOOKUP_NONVIRTUAL. > > testsuite/ChangeLog: > > 2013-11-07 Easwaran Raman > > PR c++/59031 > * g++.dg/inherit/virtual11.C: New test.
Re: Fix for PR c++/59031
Ping. On Mon, Nov 11, 2013 at 9:46 AM, Easwaran Raman wrote: > Ping. > > On Thu, Nov 7, 2013 at 11:14 AM, Easwaran Raman wrote: >> Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL >> is set and the call is direct. The changes at r193504 (to fix PR >> c++/11750) caused a regression to this behavior. This patch attempts >> to fix that. Bootstraps and no test regressions on x86_64/linux. Is >> this a correct fix and is this ok for trunk? >> >> Thanks, >> Easwaran >> >> 2013-11-07 Easwaran Raman >> >> PR c++/59031 >> * call.c (build_new_method_call_1): Compare function context >> with BASELINK_BINFO type rather than instance type before >> marking the call with LOOKUP_NONVIRTUAL. >> >> testsuite/ChangeLog: >> >> 2013-11-07 Easwaran Raman >> >> PR c++/59031 >> * g++.dg/inherit/virtual11.C: New test.
Re: Minimize downward code motion during reassociation
On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener wrote: > On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman wrote: >> On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener >> wrote: >>> On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman wrote: >>>> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener >>>> wrote: >>>>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman >>>>> wrote: >>>>>> Hi, >>>>>> >>>>>> During expression reassociation, statements are conservatively moved >>>>>> downwards to ensure that dependences are correctly satisfied after >>>>>> reassocation. This could lead to lengthening of live ranges. This >>>>>> patch moves statements only to the extent necessary. Bootstraps and no >>>>>> test regression on x86_64/linux. OK for trunk? >>>>>> >>>>>> Thanks, >>>>>> Easwaran >>>>>> >>>>>> 2012-10-18 Easwaran Raman >>>>>> * tree-ssa-reassoc.c(assign_uids): New function. >>>>>> (assign_uids_in_relevant_bbs): Likewise. >>>>>> (ensure_ops_are_available): Likewise. >>>>>> (rewrite_expr_tree): Do not move statements beyond what is >>>>>> necessary. Remove call to swap_ops_for_binary_stmt... >>>>>> (reassociate_bb): ... and move it here. >>>>>> >>>>>> >>>>>> >>>>>> Index: gcc/tree-ssa-reassoc.c >>>>>> === >>>>>> --- gcc/tree-ssa-reassoc.c (revision 192487) >>>>>> +++ gcc/tree-ssa-reassoc.c (working copy) >>>>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, >>>>>> hea >>>>>> } >>>>>> } >>>>>> >>>>>> +/* Assign UIDs to statements in basic block BB. */ >>>>>> + >>>>>> +static void >>>>>> +assign_uids (basic_block bb) >>>>>> +{ >>>>>> + unsigned uid = 0; >>>>>> + gimple_stmt_iterator gsi; >>>>>> + /* First assign uids to phis. */ >>>>>> + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>>>>> +{ >>>>>> + gimple stmt = gsi_stmt (gsi); >>>>>> + gimple_set_uid (stmt, uid++); >>>>>> +} >>>>>> + >>>>>> + /* Then assign uids to stmts. 
*/ >>>>>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>>>>> +{ >>>>>> + gimple stmt = gsi_stmt (gsi); >>>>>> + gimple_set_uid (stmt, uid++); >>>>>> +} >>>>>> +} >>>>>> + >>>>>> +/* For each operand in OPS, find the basic block that contains the >>>>>> statement >>>>>> + which defines the operand. For all such basic blocks, assign UIDs. >>>>>> */ >>>>>> + >>>>>> +static void >>>>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) >>>>>> +{ >>>>>> + operand_entry_t oe; >>>>>> + int i; >>>>>> + struct pointer_set_t *seen_bbs = pointer_set_create (); >>>>>> + >>>>>> + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) >>>>>> +{ >>>>>> + gimple def_stmt; >>>>>> + basic_block bb; >>>>>> + if (TREE_CODE (oe->op) != SSA_NAME) >>>>>> +continue; >>>>>> + def_stmt = SSA_NAME_DEF_STMT (oe->op); >>>>>> + bb = gimple_bb (def_stmt); >>>>>> + if (!pointer_set_contains (seen_bbs, bb)) >>>>>> +{ >>>>>> + assign_uids (bb); >>>>>> + pointer_set_insert (seen_bbs, bb); >>>>>> +} >>>>>> +} >>>>>> + pointer_set_destroy (seen_bbs); >>>>>> +} >>>>> >>>>> Please assign UIDs once using the existing renumber_gimple_stmt_uids (). >>>>> You seem to call the above multiple times and thus do work bigger than >>>>> O(number of basic blocks). >>>> >>&
Re: Minimize downward code motion during reassociation
Ping. - Easwaran On Wed, Oct 31, 2012 at 12:16 PM, Easwaran Raman wrote: > On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener > wrote: >> On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman wrote: >>> On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener >>> wrote: >>>> On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman wrote: >>>>> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener >>>>> wrote: >>>>>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman >>>>>> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> During expression reassociation, statements are conservatively moved >>>>>>> downwards to ensure that dependences are correctly satisfied after >>>>>>> reassocation. This could lead to lengthening of live ranges. This >>>>>>> patch moves statements only to the extent necessary. Bootstraps and no >>>>>>> test regression on x86_64/linux. OK for trunk? >>>>>>> >>>>>>> Thanks, >>>>>>> Easwaran >>>>>>> >>>>>>> 2012-10-18 Easwaran Raman >>>>>>> * tree-ssa-reassoc.c(assign_uids): New function. >>>>>>> (assign_uids_in_relevant_bbs): Likewise. >>>>>>> (ensure_ops_are_available): Likewise. >>>>>>> (rewrite_expr_tree): Do not move statements beyond what is >>>>>>> necessary. Remove call to swap_ops_for_binary_stmt... >>>>>>> (reassociate_bb): ... and move it here. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Index: gcc/tree-ssa-reassoc.c >>>>>>> === >>>>>>> --- gcc/tree-ssa-reassoc.c (revision 192487) >>>>>>> +++ gcc/tree-ssa-reassoc.c (working copy) >>>>>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, >>>>>>> hea >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> +/* Assign UIDs to statements in basic block BB. */ >>>>>>> + >>>>>>> +static void >>>>>>> +assign_uids (basic_block bb) >>>>>>> +{ >>>>>>> + unsigned uid = 0; >>>>>>> + gimple_stmt_iterator gsi; >>>>>>> + /* First assign uids to phis. 
*/ >>>>>>> + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>>>>>> +{ >>>>>>> + gimple stmt = gsi_stmt (gsi); >>>>>>> + gimple_set_uid (stmt, uid++); >>>>>>> +} >>>>>>> + >>>>>>> + /* Then assign uids to stmts. */ >>>>>>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>>>>>> +{ >>>>>>> + gimple stmt = gsi_stmt (gsi); >>>>>>> + gimple_set_uid (stmt, uid++); >>>>>>> +} >>>>>>> +} >>>>>>> + >>>>>>> +/* For each operand in OPS, find the basic block that contains the >>>>>>> statement >>>>>>> + which defines the operand. For all such basic blocks, assign UIDs. >>>>>>> */ >>>>>>> + >>>>>>> +static void >>>>>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) >>>>>>> +{ >>>>>>> + operand_entry_t oe; >>>>>>> + int i; >>>>>>> + struct pointer_set_t *seen_bbs = pointer_set_create (); >>>>>>> + >>>>>>> + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) >>>>>>> +{ >>>>>>> + gimple def_stmt; >>>>>>> + basic_block bb; >>>>>>> + if (TREE_CODE (oe->op) != SSA_NAME) >>>>>>> +continue; >>>>>>> + def_stmt = SSA_NAME_DEF_STMT (oe->op); >>>>>>> + bb = gimple_bb (def_stmt); >>>>>>> + if (!pointer_set_contains (seen_bbs, bb)) >>>>>>> +{ >>>>>>> + assign_uids (bb); >>>>>>> + pointer_set_insert (seen_bbs, bb); >>>>&
Re: Minimize downward code motion during reassociation
I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. Thanks, Easwaran On Sun, Nov 4, 2012 at 9:10 AM, Richard Biener wrote: > On Wed, Oct 31, 2012 at 8:16 PM, Easwaran Raman wrote: >> On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener >> wrote: >>> On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman wrote: >>>> On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener >>>> wrote: >>>>> On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman wrote: >>>>>> On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener >>>>>> wrote: >>>>>>> On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman >>>>>>> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> During expression reassociation, statements are conservatively moved >>>>>>>> downwards to ensure that dependences are correctly satisfied after >>>>>>>> reassocation. This could lead to lengthening of live ranges. This >>>>>>>> patch moves statements only to the extent necessary. Bootstraps and no >>>>>>>> test regression on x86_64/linux. OK for trunk? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Easwaran >>>>>>>> >>>>>>>> 2012-10-18 Easwaran Raman >>>>>>>> * tree-ssa-reassoc.c(assign_uids): New function. >>>>>>>> (assign_uids_in_relevant_bbs): Likewise. >>>>>>>> (ensure_ops_are_available): Likewise. >>>>>>>> (rewrite_expr_tree): Do not move statements beyond what is >>>>>>>> necessary. Remove call to swap_ops_for_binary_stmt... >>>>>>>> (reassociate_bb): ... and move it here. 
>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Index: gcc/tree-ssa-reassoc.c >>>>>>>> === >>>>>>>> --- gcc/tree-ssa-reassoc.c (revision 192487) >>>>>>>> +++ gcc/tree-ssa-reassoc.c (working copy) >>>>>>>> @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, >>>>>>>> hea >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> +/* Assign UIDs to statements in basic block BB. */ >>>>>>>> + >>>>>>>> +static void >>>>>>>> +assign_uids (basic_block bb) >>>>>>>> +{ >>>>>>>> + unsigned uid = 0; >>>>>>>> + gimple_stmt_iterator gsi; >>>>>>>> + /* First assign uids to phis. */ >>>>>>>> + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>>>>>>> +{ >>>>>>>> + gimple stmt = gsi_stmt (gsi); >>>>>>>> + gimple_set_uid (stmt, uid++); >>>>>>>> +} >>>>>>>> + >>>>>>>> + /* Then assign uids to stmts. */ >>>>>>>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>>>>>>> +{ >>>>>>>> + gimple stmt = gsi_stmt (gsi); >>>>>>>> + gimple_set_uid (stmt, uid++); >>>>>>>> +} >>>>>>>> +} >>>>>>>> + >>>>>>>> +/* For each operand in OPS, find the basic block that contains the >>>>>>>> statement >>>>>>>> + which defines the operand. For all such basic blocks, assign UIDs. >>>>>>>> */ >>>>>>>> + >>>>>>>> +static void >>>>>>>> +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) >>>>>>>&
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
I'd like to get a small patch to tree reassociation ( http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01761.html ) in. Thanks, Easwaran On Mon, Oct 29, 2012 at 10:56 AM, Jakub Jelinek wrote: > Status > == > > I'd like to close the stage 1 phase of GCC 4.8 development > on Monday, November 5th. If you have still patches for new features you'd > like to see in GCC 4.8, please post them for review soon. Patches > posted before the freeze, but reviewed shortly after the freeze, may > still go in, further changes should be just bugfixes and documentation > fixes. > > > Quality Data > > > Priority # Change from Last Report > --- --- > P1 23 + 23 > P2 77 + 8 > P3 85 + 84 > --- --- > Total 185 +115 > > > Previous Report > === > > http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html > > The next report will be sent by me again, announcing end of stage 1.
Re: Minimize downward code motion during reassociation
It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? Thanks, Easwaran 2012-12-07 Easwaran Raman * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener wrote: > On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman wrote: >> I am unable to figure out the right way to handle the debug >> statements. What I tried was to find debug statements that use the SSA >> name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) >> and then moved them as well at the right place. Thus, if I have to >> move t1 = a + b down (after the definition of 'd'), I also moved all >> debug statements that use t1 after the new position of t1. That still >> caused use-before-def problems in ssa_verify. I noticed that the debug >> statements got modified behind the scenes causing these issues. Any >> hints on what is the right way to handle the debug statements would be >> very helpful. > > I think you cannot (and should not) move debug statements. Instead you > have to invalidate them. Otherwise you'll introduce confusion as debug > info cannot handle overlapping live ranges. > > But maybe Alex can clarify. > > Richard. reassoc_revised.diff Description: Binary data
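A toy model of the placement rule the revised patch implements (illustrative only, not GCC's data structures): instead of conservatively sinking a rewritten statement to the bottom of the block, it only needs to sit just after the latest definition of any operand it uses. Positions here stand in for gimple UIDs within one basic block.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Given the UID positions of the definitions of a statement's operands,
// return the position the statement must be inserted after so that all
// its dependences are satisfied. Moving it any further down would only
// lengthen live ranges, which is what the patch avoids.
int required_insert_point(const std::vector<int> &operand_def_positions) {
  if (operand_def_positions.empty())
    return 0;  // no dependences: the statement may stay where it is
  return *std::max_element(operand_def_positions.begin(),
                           operand_def_positions.end());
}
```

For example, a statement whose operands are defined at positions 3, 7 and 2 only needs to move past position 7, not to the end of the block.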
Print column information in dump_loc
This patch dumps the column number as part of dump_loc making the output similar to inform(). This allows these messages to be pattern matched by dg_message. Bootstraps with this change. Ok for trunk? - Easwaran - 2013-05-14 Easwaran Raman * dumpfile.c (dump_loc): Print column number. Index: gcc/dumpfile.c === --- gcc/dumpfile.c (revision 198852) +++ gcc/dumpfile.c (working copy) @@ -261,12 +261,13 @@ dump_loc (int dump_kind, FILE *dfile, source_locat if (dump_kind) { if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION) -fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc), - LOCATION_LINE (loc)); +fprintf (dfile, "\n%s:%d:%d: note: ", LOCATION_FILE (loc), + LOCATION_LINE (loc), LOCATION_COLUMN (loc)); else if (current_function_decl) -fprintf (dfile, "\n%s:%d: note: ", +fprintf (dfile, "\n%s:%d:%d: note: ", DECL_SOURCE_FILE (current_function_decl), - DECL_SOURCE_LINE (current_function_decl)); + DECL_SOURCE_LINE (current_function_decl), + DECL_SOURCE_COLUMN (current_function_decl)); } }
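The formatting change is small enough to model directly (a sketch of the output format only, not GCC's actual code): the dump_loc note prefix gains a column field, so "file:line: note: " becomes "file:line:column: note: ", matching the format inform() uses and making the notes matchable with dg-message.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build a dump_loc-style prefix. After the patch the column number is
// printed too, e.g. "foo.c:12:5: note: " instead of "foo.c:12: note: ".
std::string note_prefix(const char *file, int line, int column) {
  char buf[256];
  std::snprintf(buf, sizeof buf, "%s:%d:%d: note: ", file, line, column);
  return buf;
}
```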
Re: Minimize downward code motion during reassociation
On Thu, Apr 25, 2013 at 4:42 AM, Richard Biener wrote: > On Fri, Dec 7, 2012 at 9:01 PM, Easwaran Raman wrote: >> It seems I need to reset the debug uses of a statement before moving >> the statement itself. The attached patch starts from the leaf to root >> of the tree to be reassociated and places them at the point where >> their dependences will be met after reassociation. This bootstraps and >> I am running the tests. Ok if there are no test failures? > > + if ((dep_bb != insert_bb > + && !dominated_by_p (CDI_DOMINATORS, insert_bb, dep_bb)) > + || (dep_bb == insert_bb > + && gimple_uid (insert_stmt) < gimple_uid (dep_stmt))) > +{ > + insert_stmt = dep_stmt; > +} > > superfluous {} Removed. > > + /* If there are any debug uses of LHS, reset them. */ > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) > +{ > + if (is_gimple_debug (use_stmt)) > +{ > + gimple_debug_bind_reset_value (use_stmt); > > that's only needed for debug stmts that are not dominated by insert_point, no? You are right. I have added the check. > > + /* Statements marked for throw can not be in the middle of a basic block. > So > + we can not insert a statement (not marked for throw) immediately > after. */ > + else if (stmt_can_throw_internal (insert_point)) > +{ > > use stmt_ends_bb_p (insert_point) instead > > + edge succ_edge = find_fallthru_edge (insert_bb->succs); > + insert_bb = succ_edge->dest; > + /* Insert STMT at the beginning of the successor basic block. */ > + gsi_insert = gsi_after_labels (insert_bb); > + gsi_move_before (&gsistmt, &gsi_insert); > > are you sure this is a proper insert location? How would you even arrive in > this situation given find_insert_point dominance checks? Let's say the last statement in a BB which can throw feeds into a statement that is to be reassociated. Then the insert point would be this statement. Instead, you could always insert at the beginning of its (lone) successor. This is independent of the the find_insert_point checks. 
> > Otherwise the patch looks ok - can you add one or two testcases please? I have added a test case and submitted at r199048. Thanks, Easwaran > > Thanks, > Richard. > >> Thanks, >> Easwaran >> >> 2012-12-07 Easwaran Raman >> * tree-ssa-reassoc.c(find_insert_point): New function. >> (insert_stmt_after): Likewise. >> (get_def_stmt): Likewise. >> (ensure_ops_are_available): Likewise. >> (rewrite_expr_tree): Do not move statements beyond what is >> necessary. Remove call to swap_ops_for_binary_stmt... >> (reassociate_bb): ... and move it here. >> (build_and_add_sum): Assign UIDs for new statements. >> (linearize_expr): Likewise. >> (do_reassoc): Renumber gimple statement UIDs. >> >> >> >> On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener >> wrote: >>> On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman wrote: >>>> I am unable to figure out the right way to handle the debug >>>> statements. What I tried was to find debug statements that use the SSA >>>> name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) >>>> and then moved them as well at the right place. Thus, if I have to >>>> move t1 = a + b down (after the definition of 'd'), I also moved all >>>> debug statements that use t1 after the new position of t1. That still >>>> caused use-before-def problems in ssa_verify. I noticed that the debug >>>> statements got modified behind the scenes causing these issues. Any >>>> hints on what is the right way to handle the debug statements would be >>>> very helpful. >>> >>> I think you cannot (and should not) move debug statements. Instead you >>> have to invalidate them. Otherwise you'll introduce confusion as debug >>> info cannot handle overlapping live ranges. >>> >>> But maybe Alex can clarify. >>> >>> Richard.
Fix PR tree-optimization/57322
The UID of a newly generated statement in build_and_add_sum is set to that of an adjacent statement in the BB. This is a problem in one case where the BB is empty. This fix sets the UID to be 1 if the BB is empty. Bootstraps and no test regressions on x86_64. OK for trunk? Thanks, Easwaran --- 2013-05-19 Easwaran Raman PR tree-optimization/57322 * tree-ssa-reassoc.c (build_and_add_sum): If a BB is empty, set the UID of the statement added to the BB to be 1. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 199048) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -1165,8 +1165,12 @@ build_and_add_sum (tree type, tree op1, tree op2, if ((!op1def || gimple_nop_p (op1def)) && (!op2def || gimple_nop_p (op2def))) { + gimple first_stmt; + unsigned uid; gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR)); - gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi))); + first_stmt = gsi_stmt (gsi); + uid = first_stmt ? gimple_uid (first_stmt) : 1; + gimple_set_uid (sum, uid); gsi_insert_before (&gsi, sum, GSI_NEW_STMT); } else if ((!op1def || gimple_nop_p (op1def))
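The fix reduces to a one-line rule, sketched here as a standalone function. A first-statement UID of 0 is a hypothetical stand-in for "the block is empty"; this is not the GCC code itself.

```c
#include <assert.h>

/* Sketch of the rule in the patch above: a statement inserted at the
   head of a block inherits the UID of the block's first statement, or
   gets UID 1 when the block is empty.  A first-statement UID of 0 is
   a hypothetical stand-in for "no statement" in this sketch.  */
static unsigned uid_for_head_insert (unsigned first_stmt_uid)
{
  return first_stmt_uid ? first_stmt_uid : 1;
}
```

Any nonzero constant would do for the empty-block case, since the only requirement is that later statements inserted into the block can copy a valid UID; the follow-up mail discusses inc_gimple_stmt_max_uid as an alternative.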
Re: Fix PR tree-optimization/57322
If you are suggesting using inc_gimple_stmt_max_uid(cfun) for all statements, that won't work because we want to use the UIDs to determine dominance within a BB. If your suggestion is to use that instead of 1 when BB == NULL, that would work (even though setting it to 1 is sufficient.) Thanks, Easwaran On Mon, May 20, 2013 at 4:43 AM, Steven Bosscher wrote: > On Mon, May 20, 2013 at 1:41 AM, Easwaran Raman wrote: >> PR tree-optimization/57322 >> * (build_and_add_sum): If a BB is empty, set the UID of the statement >> added to the BB to be 1. > > Maybe "gimple_set_uid (stmt, inc_gimple_stmt_max_uid (cfun));"? That > would keep the UIDs unique except for the copied UIDs. > > Ciao! > Steven
PR tree-optimization/57337
This addresses the case where UID alone is not sufficient to figure out which statement appears earlier in a BB. Bootstraps and no test regressions on x86_64/linux. Ok for trunk? Thanks, Easwaran 2013-05-23 Easwaran Raman PR tree-optimization/57337 * tree-ssa-reassoc.c (appears_later_in_bb): New function. (find_insert_point): Correctly identify the insertion point when two statements with the same UID are compared. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 199211) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b) } +/* Among STMT1 and STMT2, return the statement that appears later. Both + statements are in same BB and have the same UID. */ + +static gimple +appears_later_in_bb (gimple stmt1, gimple stmt2) +{ + unsigned uid = gimple_uid (stmt1); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt1); + gsi_next (&gsi); + if (gsi_end_p (gsi)) +return stmt1; + for (; !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + /* If STMT has a different UID than STMT1 and we haven't seen + STMT2 during traversal, we know STMT1 appears later. */ + if (gimple_uid (stmt) != uid) +return stmt1; + else if (stmt == stmt2) +return stmt2; +} + gcc_unreachable (); +} + /* Find the statement after which STMT must be moved so that the dependency from DEP_STMT to STMT is maintained. */ @@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt) gimple insert_stmt = stmt; if (dep_stmt == NULL) return stmt; - if (not_dominated_by (insert_stmt, dep_stmt) + if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt) + && gimple_bb (insert_stmt) == gimple_bb (dep_stmt) + && insert_stmt != dep_stmt) +insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt); + else if (not_dominated_by (insert_stmt, dep_stmt)) insert_stmt = dep_stmt; return insert_stmt; }
Re: PR tree-optimization/57337
In that case, if my insert_stmt immediately follows dep_stmt and both have the same UID, not_dominated_by would return true and I will end up updating insert_stmt to dep_stmt which is wrong. - Easwaran On Fri, May 24, 2013 at 1:07 AM, Richard Biener wrote: > On Thu, May 23, 2013 at 7:26 PM, Easwaran Raman wrote: >> This addresses the case where UID alone is not sufficient to figure >> out which statement appears earlier in a BB. Bootstraps and no test >> regressions in x86_64 on linux. Ok for trunk? > > Why not simply conservatively use gimple_uid (a) <= gimple_uid (b) > in not_dominated_by? > > Richard. > > > >> Thanks, >> Easwaran >> >> >> 2013-05-23 Easwaran Raman >> >> PR tree-optimization/57337 >> * tree-ssa-reassoc.c (appears_later_in_bb): New function. >> (find_insert_point): Correctly identify the insertion point >> when two statements with the same UID is compared. >> >> Index: gcc/tree-ssa-reassoc.c >> === >> --- gcc/tree-ssa-reassoc.c (revision 199211) >> +++ gcc/tree-ssa-reassoc.c (working copy) >> @@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b) >> >> } >> >> +/* Among STMT1 and STMT2, return the statement that appears later. Both >> + statements are in same BB and have the same UID. */ >> + >> +static gimple >> +appears_later_in_bb (gimple stmt1, gimple stmt2) >> +{ >> + unsigned uid = gimple_uid (stmt1); >> + gimple_stmt_iterator gsi = gsi_for_stmt (stmt1); >> + gsi_next (&gsi); >> + if (gsi_end_p (gsi)) >> +return stmt1; >> + for (; !gsi_end_p (gsi); gsi_next (&gsi)) >> +{ >> + gimple stmt = gsi_stmt (gsi); >> + >> + /* If STMT has a different UID than STMT1 and we haven't seen >> + STMT2 during traversal, we know STMT1 appears later. */ >> + if (gimple_uid (stmt) != uid) >> +return stmt1; >> + else if (stmt == stmt2) >> +return stmt2; >> +} >> + gcc_unreachable (); >> +} >> + >> /* Find the statement after which STMT must be moved so that the >> dependency from DEP_STMT to STMT is maintained. 
*/ >> >> @@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt) >>gimple insert_stmt = stmt; >>if (dep_stmt == NULL) >> return stmt; >> - if (not_dominated_by (insert_stmt, dep_stmt)) >> + if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt) >> + && gimple_bb (insert_stmt) == gimple_bb (dep_stmt) >> + && insert_stmt != dep_stmt) >> +insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt); >> + else if (not_dominated_by (insert_stmt, dep_stmt)) >> insert_stmt = dep_stmt; >>return insert_stmt; >> }
Re: PR tree-optimization/57337
On Sat, May 25, 2013 at 4:46 AM, Richard Biener wrote: > Easwaran Raman wrote: > >>In that case, if my insert_stmt immediately follows dep_stmt and both >>have the same UID, not_dominated_by would return true and I will end >>up updating insert_stmt to dep_stmt which is wrong. > > But there should be a safe default answer for > Equal uids. Unless we are asking different questions in different places. > Thus, I do not like the stmt walking but rather have a safe fallback. I am lost here. I don't see how we could avoid doing the stmt walking to resolve the equal-UID case. How can we ensure that not_dominated_by (a, b) returns true and not_dominated_by (b, a) returns false, if A and B have the same UID and A appears before B, without doing the statement walk? And I don't see why the statement walk is bad. It is not likely that there is a long sequence of statements with the same UID. Thanks, Easwaran > > Richard. > >>- Easwaran >> >>On Fri, May 24, 2013 at 1:07 AM, Richard Biener >> wrote: >>> On Thu, May 23, 2013 at 7:26 PM, Easwaran Raman >>wrote: >>>> This addresses the case where UID alone is not sufficient to figure >>>> out which statement appears earlier in a BB. Bootstraps and no test >>>> regressions in x86_64 on linux. Ok for trunk? >>> >>> Why not simply conservatively use gimple_uid (a) <= gimple_uid (b) >>> in not_dominated_by? >>> >>> Richard.
Re: PR tree-optimization/57337
On Tue, May 28, 2013 at 4:11 AM, Richard Biener wrote: > On Mon, May 27, 2013 at 10:20 AM, Richard Biener > wrote: >> On Sun, May 26, 2013 at 5:53 AM, Easwaran Raman wrote: >>> On Sat, May 25, 2013 at 4:46 AM, Richard Biener >>> wrote: >>>> Easwaran Raman wrote: >>>> >>>>>In that case, if my insert_stmt immediately follows dep_stmt and both >>>>>have the same UID, not_dominated_by would return true and I will end >>>>>up updating insert_stmt to dep_stmt which is wrong. >>>> >>>> But there should be a safe default answer for >>>> Equal uids. Unless we are asking different questions in different places. >>>> Thus, I do not like the stmt walking but rather have a safe fallback. >>> >>> I am lost here. I don't see how we could avoid doing the stmt walking >>> to resolve the equal uid case. How to ensure that not_dominated_by (a, >>> b) returns true and not_dominated_by (b, a) returns false if A and B >>> have the same UID and A appears before B without doing the statement >>> walk. And, I don't see why the statement walk is bad. It is not likely >>> that there is a long sequence of statements with the same UID. >> >> Sure, but if you are always asking a question like "is placing X before Y >> ok?" >> then you can conservatively answer "no" and code should handle that ok. >> If you are asking questions both way then of course no conservative answer is >> possible. Both current uses of not_dominated_by are of the same kind, >> if the placement is not ok then either the insert point needs adjustment or >> the debug stmt reset. > > Ok, thinking about this more I come to the conclusion that a safe default for > equal UIDs is not possible as we may be not able to order two dep_stmts. > I still dislike walking though, but to get the quite frequent regressions > fixed > the patch is ok as-is. > > Please install. > > Thanks, > Richard. Thanks. Submitted the patch at r199385. - Easwaran
Fix PR 57442
I made a thinko when I asserted gcc_unreachable outside the main loop of appears_later_in_bb in my previous fix to PR 57337. It is quite possible for STMT1 to be followed by one or more statements with the same UID till the end of the BB without encountering STMT2. The right fix is to return STMT1 outside the loop. Ok for trunk? Thanks, Easwaran 2013-05-28 Easwaran Raman PR tree-optimization/57442 * tree-ssa-reassoc.c (appears_later_in_bb): Return correct value when control exits the main loop. 2013-05-28 Easwaran Raman PR tree-optimization/57442 * gcc.dg/tree-ssa/reassoc-30.c: New testcase. Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c === --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0) @@ -0,0 +1,13 @@ +/* PR tree-optimization/57442 */ +/* { dg-do compile } */ +/* { dg-options "-O1" } */ +short a; +unsigned b; +long c; +int d; + +void f(void) +{ +b = a ? : (a = b) - c + (d - (b + b)); +} + Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 199385) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2888,7 +2888,7 @@ appears_later_in_bb (gimple stmt1, gimple stmt2) else if (stmt == stmt2) return stmt2; } - gcc_unreachable (); + return stmt1; } /* Find the statement after which STMT must be moved so that the
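The corrected walk can be modelled on a plain linked list. The sketch below is an illustrative reimplementation under toy assumptions (a `toy_stmt` struct standing in for a gimple statement), not the GCC code itself; the final `return s1` is the PR 57442 fix for the case where the walk runs off the end of the block.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the corrected appears_later_in_bb: statements of one
   basic block form a singly linked list, each with a UID.  Walk
   forward from S1: reaching S2 means S2 appears later; leaving the run
   of equal UIDs, or reaching the end of the block, means S1 does.
   Precondition, as in the original: S1 and S2 share a UID.  */

struct toy_stmt { unsigned uid; struct toy_stmt *next; };

static struct toy_stmt *
appears_later_in_bb (struct toy_stmt *s1, struct toy_stmt *s2)
{
  unsigned uid = s1->uid;
  for (struct toy_stmt *s = s1->next; s != NULL; s = s->next)
    {
      if (s->uid != uid)
        return s1;           /* left the run of equal UIDs: s1 is later */
      if (s == s2)
        return s2;           /* found s2 after s1: s2 is later */
    }
  return s1;                 /* ran off the block end: the PR 57442 fix */
}

/* Build a(5) -> b(5) -> c(6) and check the three cases; returns a
   bitmask of passing checks.  */
static int toy_demo (void)
{
  struct toy_stmt c = { 6, NULL };
  struct toy_stmt b = { 5, &c };
  struct toy_stmt a = { 5, &b };
  int ok = 0;
  if (appears_later_in_bb (&a, &b) == &b) ok |= 1;  /* b follows a */
  if (appears_later_in_bb (&b, &a) == &b) ok |= 2;  /* run ends at c */
  b.next = NULL;                                    /* b now ends the BB */
  if (appears_later_in_bb (&b, &a) == &b) ok |= 4;  /* PR 57442 case */
  return ok;
}
```

The third case is exactly the thinko described above: with `gcc_unreachable ()` in place of the final return, walking from the block's last equal-UID statement would abort instead of concluding that S1 appears later.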
[google gcc-4_8] Change size accounting during inlining in lipo mode
In lipo mode, this patch updates the overall unit size only when the eventual function to which the callee is inlined is in primary module. This is to avoid the situation where the module growth budget is used up by inlines into auxiliary module functions that never get inlined into some primary module function. Ok for google/gcc-4_8 branch? Thanks, Easwaran lipo_inline.patch Description: Binary data
Fix PR middle-end/57370
A newly generated statement in build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong uid (of an earlier phi statement) which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk? Thanks, Easwaran 2013-06-27 Easwaran Raman PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi node for a non-phi gimple statement. testsuite/ChangeLog: 2013-06-27 Easwaran Raman PR middle-end/57370 * gfortran.dg/reassoc_12.f90: New testcase. Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 === --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-options "-O2 -ffast-math" } +! PR middle-end/57370 + + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, & + grad_deriv,npoints, sx) +IMPLICIT REAL*8 (t) +INTEGER, PARAMETER :: dp=8 +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, & + e_ndrho_ndrho_rho + DO ii=1,npoints + IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN +t1425 = t233 * t557 +t1429 = beta * t225 +t1622 = t327 * t1621 +t1626 = t327 * t1625 +t1632 = t327 * t1631 +t1685 = t105 * t1684 +t2057 = t1636 + t8 * (t2635 + t3288) + END IF + IF( grad_deriv >= 3 .OR. 
grad_deriv == -3 ) THEN +t5469 = t5440 - t5443 - t5446 - t5449 - & +t5451 - t5454 - t5456 + t5459 - & +t5462 + t5466 - t5468 +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425 +t5489 = 0.16e2_dp * t1429 * t1658 +t5531 = 0.160e2_dp * t112 * t1626 +t5533 = 0.160e2_dp * t112 * t1632 +t5537 = 0.160e2_dp * t112 * t1622 +t5541 = t5472 - t5478 - t5523 + t5525 + & +t5531 + t5533 + t5535 + t5537 + & +t5540 +t5565 = t112 * t1685 +t5575 = t5545 - t5548 + t5551 + t5553 - & +t5558 + t5560 - t5562 + t5564 - & +0.80e1_dp * t5565 + t5568 + t5572 + & +t5574 +t5611 = t5579 - t5585 + t5590 - t5595 + & +t5597 - t5602 + t5604 + t5607 + & +t5610 +t5613 = t5469 + t5541 + t5575 + t5611 +t6223 = t6189 - & +0.36e0_dp * t83 * t84 * t5613 + & +t6222 +t6227 = - t8 * (t5305 + t6223) +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + & + t6227 * sx +t6352 = t5440 - t5443 - t5446 - t5449 - & +t5451 - t5454 + & +0.40e1_dp * t102 * t327 * t2057 * t557 - & +t5456 + t5459 - t5462 + t5466 - & +t5468 +t6363 = t5480 - t5489 + & +0.96e2_dp * t1054 * t640 * t3679 +t6367 = t5472 - t5474 - t5478 - t5523 + & +t5525 + t5531 + t5533 + t5535 + & +t5537 - 0.20e1_dp * t102 * t105 * t6363 + & +t5540 +t6370 = t5545 - t5548 + t5551 + t5553 - & +t5558 + t5560 - t5562 + t5564 - & +0.40e1_dp * t5565 + & +t5568 + t5572 + t5574 +t6373 = t5579 - t5585 + t5590 - t5595 + & +t5597 - t5602 + t5604 + t5607 + & +t5610 +t6375 = t6352 + t6367 + t6370 + t6373 +t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701 +t6669 = -t4704 - t8 * (t6344 + t6380 + t6665) +e_ndrho_ndrho_ndrho(ii) = e_ndrho_ndrho_ndrho(ii) + & + t6669 * sx + END IF + END DO + END SUBROUTINE xb88_lr_adiabatic_lda_calc + Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c(revision 200429) +++ gcc/tree-ssa-reassoc.c(working copy) @@ -1207,7 +1207,7 @@ build_and_add_sum (tree type, tree op1, tree op2, if (gimple_code (op1def) == GIMPLE_PHI) { gsi = gsi_after_labels (gimple_bb (op1def)); - gimple_set_uid (sum, gimple_uid (op1def)); + gimple_set_uid (sum, gimple_uid 
(gsi_stmt (gsi))); gsi_insert_before (&gsi, sum, GSI_NEW_STMT); } else
Re: Fix PR middle-end/57370
Ping. On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote: > A newly generated statement in build_and_add_sum function of > tree-ssa-reassoc.c has to be assigned the UID of its adjacent > statement. In one instance, it was assigned the wrong uid (of an > earlier phi statement) which messed up the IR and caused the test > program to hang. Bootstraps and no test regressions on x86_64/linux. > Ok for trunk? > > Thanks, > Easwaran > > > 2013-06-27 Easwaran Raman > > PR middle-end/57370 > * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi > node for a non-phi gimple statement. > > testsuite/ChangeLog: > 2013-06-27 Easwaran Raman > > PR middle-end/57370 > * gfortran.dg/reassoc_12.f90: New testcase.
Re: Fix PR middle-end/57370
I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regression on x86_64/linux. Ok for trunk? 2013-07-31 Easwaran Raman PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset of debug statements that cause inconsistent IR. testsuite/ChangeLog: 2013-07-31 Easwaran Raman PR middle-end/57370 PR tree-optimization/57393 PR tree-optimization/58011 * gfortran.dg/reassoc_12.f90: New testcase. * gcc.dg/tree-ssa/reassoc-31.c: New testcase. * gcc.dg/tree-ssa/reassoc-31.c: New testcase. On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote: > Ping. > > On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote: >> A newly generated statement in build_and_add_sum function of >> tree-ssa-reassoc.c has to be assigned the UID of its adjacent >> statement. In one instance, it was assigned the wrong uid (of an >> earlier phi statement) which messed up the IR and caused the test >> program to hang. Bootstraps and no test regressions on x86_64/linux. >> Ok for trunk? >> >> Thanks, >> Easwaran >> >> >> 2013-06-27 Easwaran Raman >> >> PR middle-end/57370 >> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi >> node for a non-phi gimple statement. >> >> testsuite/ChangeLog: >> 2013-06-27 Easwaran Raman >> >> PR middle-end/57370 >> * gfortran.dg/reassoc_12.f90: New testcase.
Re: Fix PR middle-end/57370
Ping. On Wed, Jul 31, 2013 at 4:34 PM, Easwaran Raman wrote: > I have a new patch that supersedes this. The new patch also fixes PR > tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and > no test regression on x86_64/linux. Ok for trunk? > > 2013-07-31 Easwaran Raman > > PR middle-end/57370 > * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset > of debug statements that cause inconsistent IR. > > > testsuite/ChangeLog: > 2013-07-31 Easwaran Raman > > PR middle-end/57370 > PR tree-optimization/57393 > PR tree-optimization/58011 > * gfortran.dg/reassoc_12.f90: New testcase. > * gcc.dg/tree-ssa/reassoc-31.c: New testcase. > * gcc.dg/tree-ssa/reassoc-31.c: New testcase. > > > On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote: >> Ping. >> >> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote: >>> A newly generated statement in build_and_add_sum function of >>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent >>> statement. In one instance, it was assigned the wrong uid (of an >>> earlier phi statement) which messed up the IR and caused the test >>> program to hang. Bootstraps and no test regressions on x86_64/linux. >>> Ok for trunk? >>> >>> Thanks, >>> Easwaran >>> >>> >>> 2013-06-27 Easwaran Raman >>> >>> PR middle-end/57370 >>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a >>> phi >>> node for a non-phi gimple statement. >>> >>> testsuite/ChangeLog: >>> 2013-06-27 Easwaran Raman >>> >>> PR middle-end/57370 >>> * gfortran.dg/reassoc_12.f90: New testcase.
Re: Minimize downward code motion during reassociation
I want to resend this patch for consideration. I applied the patch to trunk and confirmed that it bootstraps and doesn't cause test regressions. Is this ok for trunk? Thanks, Easwaran On Fri, Dec 7, 2012 at 12:01 PM, Easwaran Raman wrote: > It seems I need to reset the debug uses of a statement before moving > the statement itself. The attached patch starts from the leaf to root > of the tree to be reassociated and places them at the point where > their dependences will be met after reassociation. This bootstraps and > I am running the tests. Ok if there are no test failures? > > Thanks, > Easwaran > > 2012-12-07 Easwaran Raman > * tree-ssa-reassoc.c(find_insert_point): New function. > (insert_stmt_after): Likewise. > (get_def_stmt): Likewise. > (ensure_ops_are_available): Likewise. > (rewrite_expr_tree): Do not move statements beyond what is > necessary. Remove call to swap_ops_for_binary_stmt... > (reassociate_bb): ... and move it here. > (build_and_add_sum): Assign UIDs for new statements. > (linearize_expr): Likewise. > (do_reassoc): Renumber gimple statement UIDs. > > > > On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener > wrote: >> On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman wrote: >>> I am unable to figure out the right way to handle the debug >>> statements. What I tried was to find debug statements that use the SSA >>> name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) >>> and then moved them as well at the right place. Thus, if I have to >>> move t1 = a + b down (after the definition of 'd'), I also moved all >>> debug statements that use t1 after the new position of t1. That still >>> caused use-before-def problems in ssa_verify. I noticed that the debug >>> statements got modified behind the scenes causing these issues. Any >>> hints on what is the right way to handle the debug statements would be >>> very helpful. >> >> I think you cannot (and should not) move debug statements. Instead you >> have to invalidate them. 
Otherwise you'll introduce confusion as debug >> info cannot handle overlapping live ranges. >> >> But maybe Alex can clarify. >> >> Richard.
Change the ordering of cdce pass
The conditional dead call elimination pass shrink-wraps certain dead calls to math functions. It doesn't handle cases like this: D.142420_139 = powD.549 (D.142421_138, D.142419_132); fooD.120935.barD.113815 = D.142420_139; # foo.bar is dead here. This code gets cleaned up by DCE and leaves only pow, which can then be shrink-wrapped by cdce. So it seems reasonable to do this reordering. Bootstraps on x86_64/linux with no test regressions. OK for trunk? - Easwaran -- 2012-06-14 Easwaran Raman * gcc/passes.c (init_optimization_passes): Remove pass_call_cdce from its current position and insert after pass_dce. Index: gcc/passes.c === --- gcc/passes.c(revision 188535) +++ gcc/passes.c(working copy) @@ -1374,7 +1374,6 @@ init_optimization_passes (void) NEXT_PASS (pass_complete_unrolli); NEXT_PASS (pass_ccp); NEXT_PASS (pass_forwprop); - NEXT_PASS (pass_call_cdce); /* pass_build_alias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. Re-building alias information also rewrites no longer addressed @@ -1387,6 +1386,7 @@ init_optimization_passes (void) NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_vrp); NEXT_PASS (pass_dce); + NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); NEXT_PASS (pass_tree_ifcombine); NEXT_PASS (pass_phiopt);
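The ordering argument can be made concrete with a toy model. The flags below are purely illustrative, not GCC's pass machinery: they track only whether the dead store feeding off the pow result has been deleted yet.

```c
#include <assert.h>

/* Toy model of the pass-ordering argument above, for the snippet
   "tmp = pow (a, b); foo.bar = tmp;" where foo.bar is dead.  A
   cdce-style pass only shrink-wraps a call whose result is unused, so
   it finds work only if DCE has already deleted the dead store and
   thereby killed the call's last use.  Illustrative sketch only.  */
static int shrink_wrappable_calls (int run_dce_first)
{
  int call_result_used = 1;   /* tmp still feeds the store */
  if (run_dce_first)
    call_result_used = 0;     /* DCE deleted the dead store */
  /* cdce can shrink-wrap the call only once its result is dead.  */
  return call_result_used ? 0 : 1;
}
```

With cdce scheduled before DCE the call still has a (dead) use and is skipped; after DCE it is exposed, which is the point of moving pass_call_cdce after pass_dce in the patch.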
Re: Change the ordering of cdce pass
ChangeLog entry has a gcc/ prefix that shouldn't be there. Here is the revised entry: 2012-06-14 Easwaran Raman * passes.c (init_optimization_passes): Remove pass_call_cdce from its current position and insert after pass_dce. On Thu, Jun 14, 2012 at 6:38 PM, Easwaran Raman wrote: > The conditional dead call elimination pass shrink wraps certain dead > calls to math functions. It doesn't handle case like this: > > D.142420_139 = powD.549 (D.142421_138, D.142419_132); > fooD.120935.barD.113815 = D.142420_139; > # foo.bar is dead here. > > This code gets cleaned up by DCE and leaves only pow, which can then > be shrink-wrapped by cdce. So it seems reasonable to do this > reordering. Bootstraps on x86_64 on linux with no test regression. OK > for trunk? > > - Easwaran > > -- > > 2012-06-14 Easwaran Raman > > * gcc/passes.c (init_optimization_passes): Remove pass_call_cdce > from its current position and insert after pass_dce. > > Index: gcc/passes.c > === > --- gcc/passes.c (revision 188535) > +++ gcc/passes.c (working copy) > @@ -1374,7 +1374,6 @@ init_optimization_passes (void) > NEXT_PASS (pass_complete_unrolli); > NEXT_PASS (pass_ccp); > NEXT_PASS (pass_forwprop); > - NEXT_PASS (pass_call_cdce); > /* pass_build_alias is a dummy pass that ensures that we > execute TODO_rebuild_alias at this point. Re-building > alias information also rewrites no longer addressed > @@ -1387,6 +1386,7 @@ init_optimization_passes (void) > NEXT_PASS (pass_merge_phi); > NEXT_PASS (pass_vrp); > NEXT_PASS (pass_dce); > + NEXT_PASS (pass_call_cdce); > NEXT_PASS (pass_cselim); > NEXT_PASS (pass_tree_ifcombine); > NEXT_PASS (pass_phiopt);
Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)
+static inline hashval_t
+edge_hash_function (unsigned int id1, unsigned int id2)
+{
+  /* If the number of functions is less than 1000, this gives a unique value
+     for every function id combination. */
+  const int MULTIPLIER = 1000;
+  return id1 * MULTIPLIER + id2;

Change to id1 << 16 | id2.

+  if (edge_map == NULL)
+    {
+      edge_map = htab_create (25, edge_map_htab_hash_descriptor,
+                              edge_map_htab_eq_descriptor, NULL);

Use a larger size and don't hardcode the value.

+  if (function_map == NULL)
+    {
+      function_map = htab_create (25, function_map_htab_hash_descriptor,
+                                  function_map_htab_eq_descriptor, NULL);

Use a larger size and don't hardcode the value.

+  caller_node = get_function_node (caller);
+  /* Real nodes are nodes that correspond to .text sections found. These will
+     definitely be sorted. */
+  set_as_real_node (caller_node);

This is redundant, as set_node_type sets the type correctly.

+  if (*slot == NULL)
+    {
+      reset_functions (edge, new_node, kept_node);
+      *slot = edge;
+      add_edge_to_node (new_node, edge);

edge will still be in old_node's edge_list.

+static void set_node_type (Node *n)
+{
+  void **slot;
+  char *name = n->name;
+  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name),
+                                   INSERT);

Use htab_find_with_hash or pass NO_INSERT.

+  /* Allocate section_map. */
+  if (section_map == NULL)
+    {
+      section_map = htab_create (10, section_map_htab_hash_descriptor,
+                                 section_map_htab_eq_descriptor, NULL);

With -ffunction-sections, this should be roughly equal to the size of function_map.

+static void
+write_out_node (FILE *fp, char *name,
+                void **handles, unsigned int *shndx, int position)
+{
+  void **slot;
+  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name),
+                                   INSERT);

Use htab_find_with_hash or pass NO_INSERT.
Index: function_reordering_plugin/function_reordering_plugin.c === --- function_reordering_plugin/function_reordering_plugin.c (revision 0) +++ function_reordering_plugin/function_reordering_plugin.c (revision 0) + parse_callgraph_section_contents (section_contents, (unsigned int)length); +} + else if (name != NULL) +free (name); name should be freed in the body of then and else-if as well. -Easwaran On Mon, Sep 26, 2011 at 5:22 PM, Sriraman Tallam wrote: > > *Resending as plain text* > > I am attaching a patch of the updated files. This patch was meant for > the google gcc-4_6 branch. Sorry for not mentioning this upfront in my > original mail. > Thanks, > -Sri. > > > > > On Mon, Sep 26, 2011 at 9:53 AM, Sriraman Tallam > > wrote: > >> > >> On Mon, Sep 26, 2011 at 9:22 AM, Nathan Froyd wrote: > >> > On 9/23/2011 6:03 PM, Sriraman Tallam wrote: > >> >> > >> >> This patch adds a new linker plugin to re-order functions. > >> > > >> > This is great stuff. We were experimenting with using the coverage > >> > files to > >> > generate an ordering for --section-ordering-file, but this might be even > >> > better, will have to experiment with it. > >> > > >> > A couple of comments on the code itself: > >> > > >> >> Index: function_reordering_plugin/callgraph.h > >> >> +inline static Edge_list * > >> >> +make_edge_list (Edge *e) > >> >> +{ > >> >> + Edge_list *list = (Edge_list *)malloc (sizeof (Edge_list)); > >> > > >> > If you are going to use libiberty via hashtab.h, you might as well make > >> > use > >> > of the *ALLOC family of macros to make this and other allocations a > >> > little > >> > neater. > >> > > >> >> +/*Represents an edge in the call graph. */ > >> >> +struct __edge__ > >> > > >> > I think the usual convention is to call this edge_d or something similar, > >> > avoiding the profusion of underscores. 
> >> > > >> >> +void > >> >> +map_section_name_to_index (char *section_name, void *handle, int shndx) > >> >> +{ > >> >> + void **slot; > >> >> + char *function_name; > >> >> + > >> >> + if (is_prefix_of (".text.hot.", section_name)) > >> >> + function_name = section_name + 10 /* strlen (".text.hot.") */; > >> >> + else if (is_prefix_of (".text.unlikely.", section_name)) > >> >> + function_name = section_name + 15 /* strlen (".text.unlikely.") */; > >> >> + else if (is_prefix_of (".text.cold.", section_name)) > >> >> + function_name = section_name + 11 /* strlen (".text.cold.") */; > >> >> + else if (is_prefix_of (".text.startup.", section_name)) > >> >> + function_name = section_name + 14 /* strlen (".text.startup.") */; > >> >> + else > >> >> + function_name = section_name + 6 /*strlen (".text.") */; > >> > > >> > You don't handle plain .text; this is assuming -ffunction-sections, > >> > right? > >> > Can this limitation be easily removed? > >> > >> You need to compile with -ffunction-sections or the individual > >> sections cannot be re-ordered. That is why I assumed a .text.* prefix > >> before every function sec
Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)
OK for google/gcc-4_6 and google/main branches. -Easwaran > > On Tue, Sep 27, 2011 at 4:07 PM, Sriraman Tallam wrote: >> >> Made all the changes. Attaching new patch of updated files. >> >> On Tue, Sep 27, 2011 at 11:26 AM, Easwaran Raman wrote: >> > >> > +static inline hashval_t >> > +edge_hash_function (unsigned int id1, unsigned int id2) >> > +{ >> > + /* If the number of functions is less than 1000, this gives a unique >> > value >> > + for every function id combination. */ >> > + const int MULTIPLIER = 1000; >> > + return id1* MULTIPLIER + id2; >> > >> > Change to id1 << 16 | id2 >> > >> > + if (edge_map == NULL) >> > + { >> > + edge_map = htab_create (25, edge_map_htab_hash_descriptor, >> > + edge_map_htab_eq_descriptor , NULL); >> > >> > Use a larger size and don't hardcode the value. >> > >> > + if (function_map == NULL) >> > + { >> > + function_map = htab_create (25, function_map_htab_hash_descriptor, >> > + function_map_htab_eq_descriptor , NULL); >> > >> > Use a larger size and don't hardcode the value. >> > >> > + caller_node = get_function_node (caller); >> > + /* Real nodes are nodes that correspond to .text sections found. These >> > will >> > + definitely be sorted. */ >> > + set_as_real_node (caller_node); >> > >> > This is redundant as set_node_type sets the type correctly. >> > >> > + if (*slot == NULL) >> > + { >> > + reset_functions (edge, new_node, kept_node); >> > + *slot = edge; >> > + add_edge_to_node (new_node, edge); >> > >> > edge will still be in old_node's edge_list >> >> I do not have to explicitly delete this as the merged node's edge_list >> will never be traversed except when cleaning up. >> >> > >> > +static void set_node_type (Node *n) >> > +{ >> > + void **slot; >> > + char *name = n->name; >> > + slot = htab_find_slot_with_hash (section_map, name, htab_hash_string >> > (name), >> > + INSERT); >> > >> > Use htab_find_with_hash or pass NOINSERT. >> > >> > + /* Allocate section_map. 
*/ >> > + if (section_map == NULL) >> > + { >> > + section_map = htab_create (10, section_map_htab_hash_descriptor, >> > + section_map_htab_eq_descriptor , NULL); >> > >> > With -ffunction-sections, this should be roughly equal to size of >> > function_map. >> > >> > +static void >> > +write_out_node (FILE *fp, char *name, >> > + void **handles, unsigned int *shndx, int position) >> > +{ >> > + void **slot; >> > + slot = htab_find_slot_with_hash (section_map, name, htab_hash_string >> > (name), >> > + INSERT); >> > >> > Use htab_find_with_hash or pass NOINSERT. >> > >> > Index: function_reordering_plugin/function_reordering_plugin.c >> > === >> > --- function_reordering_plugin/function_reordering_plugin.c (revision >> > 0) >> > +++ function_reordering_plugin/function_reordering_plugin.c (revision >> > 0) >> > >> > + parse_callgraph_section_contents (section_contents, >> > (unsigned int)length); >> > + } >> > + else if (name != NULL) >> > + free (name); >> > >> > name should be freed in the body of then and else-if as well. >> > >> > -Easwaran >> > On Mon, Sep 26, 2011 at 5:22 PM, Sriraman Tallam >> > wrote: >> > > >> > > *Resending as plain text* >> > > >> > > I am attaching a patch of the updated files. This patch was meant for >> > > the google gcc-4_6 branch. Sorry for not mentioning this upfront in my >> > > original mail. >> > > Thanks, >> > > -Sri. >> > > >> > > >> > > >> > > > On Mon, Sep 26, 2011 at 9:53 AM, Sriraman Tallam >> > > > wrote: >> > > >> >> > > >> On Mon, Sep 26, 2011 at 9:22 AM, Nathan Froyd >> > > >> wrote: >> > > >> > On 9/23/2011 6:03 PM, Sriraman Tallam wrote: >> > > >> >> >> > > >> >> This patch adds a new li
[google] Fix bugs in sampled profile collection
This fixes two issues with sampled profile collection. It delays cleanup of instrumentation_to_be_sampled until after all callgraph nodes have been instrumented, and prevents gcov_sample_counter_decl and gcov_sampling_rate_decl from being garbage collected. Ok for google/gcc-4_6 and google/main branches?

-Easwaran

2011-09-30  Easwaran Raman

	* tree-profile.c (gcov_sample_counter_decl): Add GTY marker.
	(gcov_sampling_rate_decl): Likewise.
	(add_sampling_to_edge_counters): Do not free
	instrumentation_to_be_sampled.
	(cleanup_instrumentation_sampling): New function.
	(tree_profiling): Call cleanup_instrumentation_sampling at the end.

testsuite/ChangeLog.google-4_6:
2011-09-30  Easwaran Raman

	* gcc.dg/sample-profile-generate-1.c: New test.

Index: gcc/testsuite/gcc.dg/sample-profile-generate-1.c
===
--- gcc/testsuite/gcc.dg/sample-profile-generate-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/sample-profile-generate-1.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -fprofile-generate -fprofile-generate-sampling" } */
+
+void foobar(int);
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      foobar(i);
+    }
+}
+
+void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      foobar(i);
+    }
+}
+
+/* { dg-final { cleanup-coverage-files } } */

Index: tree-profile.c
===
--- tree-profile.c	(revision 178897)
+++ tree-profile.c	(working copy)
@@ -163,10 +163,10 @@ init_ic_make_global_vars (void)
 static struct pointer_set_t *instrumentation_to_be_sampled = NULL;

 /* extern __thread gcov_unsigned_t __gcov_sample_counter */
-static tree gcov_sample_counter_decl = NULL_TREE;
+static GTY(()) tree gcov_sample_counter_decl = NULL_TREE;

 /* extern gcov_unsigned_t __gcov_sampling_rate */
-static tree gcov_sampling_rate_decl = NULL_TREE;
+static GTY(()) tree gcov_sampling_rate_decl = NULL_TREE;

 /* forward declaration.
*/ void gimple_init_instrumentation_sampling (void); @@ -281,9 +281,13 @@ add_sampling_to_edge_counters (void) break; } } +} +static void +cleanup_instrumentation_sampling (void) +{ /* Free the bitmap. */ - if (instrumentation_to_be_sampled) + if (flag_profile_generate_sampling && instrumentation_to_be_sampled) { pointer_set_destroy (instrumentation_to_be_sampled); instrumentation_to_be_sampled = NULL; @@ -1452,6 +1456,7 @@ tree_profiling (void) } del_node_map(); + cleanup_instrumentation_sampling(); return 0; }
[google] Use delete with size parameter in STL deallocate (issue5794070)
This patch makes -fsized-delete define a macro __GXX_DELETE_WITH_SIZE__. If this macro is defined, the deallocate function in new_allocator.h invokes the two parameter delete operator with the size of the object. Tested by bootstrapping the compiler and ensuring no tests are broken. Also verified that compiling a C++ program that uses std::vector with -fsized-delete invokes the two parameter version of operator delete. OK for google/main and google/4_6 branches? c-family/ChangeLog.google-main: 2012-03-12 Easwaran Raman * c-cppbuiltin.c (c_cpp_builtins): Define __GXX_DELETE_WITH_SIZE__ for C++ files compiled with -fsized-delete. libstdc++-v3/ChangeLog.google-main: 2012-03-12 Easwaran Raman * include/ext/new_allocator.h (deallocate): Call delete with size if __GXX_DELETE_WITH_SIZE__ is defined. diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index ccf57fd..3c9a96b 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -970,6 +970,8 @@ c_cpp_builtins (cpp_reader *pfile) format. */ if (ENABLE_DECIMAL_FLOAT && ENABLE_DECIMAL_BID_FORMAT) cpp_define (pfile, "__DECIMAL_BID_FORMAT__"); + if (c_dialect_cxx () && flag_sized_delete) +cpp_define (pfile, "__GXX_DELETE_WITH_SIZE__"); } /* Pass an object-like macro. If it doesn't lie in the user's diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h index 0c82bd0..c972c1e 100644 --- a/libstdc++-v3/include/ext/new_allocator.h +++ b/libstdc++-v3/include/ext/new_allocator.h @@ -94,10 +94,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp))); } +#ifdef __GXX_DELETE_WITH_SIZE__ + // __p is not permitted to be a null pointer. + void + deallocate(pointer __p, size_type __t) + { ::operator delete(__p, __t * sizeof(_Tp)); } +#else // __p is not permitted to be a null pointer. 
void deallocate(pointer __p, size_type) { ::operator delete(__p); } +#endif size_type max_size() const _GLIBCXX_USE_NOEXCEPT -- This patch is available for review at http://codereview.appspot.com/5794070
Propagate profile counts after switch case expansion (issue5896043)
This patch propagates the execution counts of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

2012-03-23  Easwaran Raman

	* cfgbuild.c (non_zero_profile_counts): New function.
	(compute_outgoing_frequencies): If at least one successor of a
	BB has non-zero profile count, retain the counts.
	* expr.c (do_tablejump): Add a REG_BR_PROB note on the
	jump to default label.
	(try_tablejump): Add a parameter to specify the probability
	of jumping to the default label.
	* expr.h (try_tablejump): Add a new parameter.
	* stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
	(add_case_node): Pass execution count of the case node and use
	it to initialize COUNT field.
	(case_probability): New macro.
	(expand_case): Propagate execution counts to generated
	branches using REG_BR_PROB notes.
	(emit_case_nodes): Likewise.
	(do_jump_if_equal): Pass probability for REG_BR_PROB note.
	(compute_subtree_counts): New function to compute
	SUBTREE_COUNT fields of case nodes.
	(add_prob_note_to_last_insn): Add a REG_BR_PROB note with the
	given probability to the last generated instruction.

gcc/testsuite/ChangeLog:
2012-03-23  Easwaran Raman

	* gcc.dg/tree-prof/switch-case-1.c: New test case.
	* gcc.dg/tree-prof/switch-case-2.c: New test case.

diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
index 692fea8..d75fbda 100644
--- a/gcc/cfgbuild.c
+++ b/gcc/cfgbuild.c
@@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb)
   purge_dead_tablejump_edges (bb, table);
 }

+/* Check if there is at least one edge in EDGES with a non-zero count
+   field.  */
+
+static bool
+non_zero_profile_counts (VEC(edge,gc) *edges)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, edges)
+    {
+      if (e->count > 0)
+        return true;
+    }
+  return false;
+}
+
 /* Assume that frequency of basic block B is known.  Compute frequencies
    and probabilities of outgoing edges.
*/ @@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b) e->count = b->count; return; } + else if (non_zero_profile_counts (b->succs)){ +/*Profile counts already set, but REG_NOTE missing. Retain the counts. */ +return; + } guess_outgoing_edge_probabilities (b); if (b->count) FOR_EACH_EDGE (e, ei, b->succs) diff --git a/gcc/expr.c b/gcc/expr.c index f9de908..fb8eef9 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode); #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree minval, tree range, TABLE_LABEL is a CODE_LABEL rtx for the table itself. DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the - index value is out of range. */ + index value is out of range. + DEFAULT_PROBABILITY is the probability of jumping to the + DEFAULT_LABEL. */ static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) { rtx temp, vector; @@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, the maximum value of the range. */ if (default_label) -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, -default_label); +{ + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, + default_label); + if (default_probability != -1) +{ + rtx jump_insn = get_last_insn(); + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability)); +} +} + /* If index is in range, it must fit in Pmode. Convert to Pmode so we can index with it. 
*/ @@ -10758,7 +10768,7 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, int try_tablejump (tree index_type, tree index_expr, tree minval, tree range, - rtx table_label, rtx default_label) + rtx table_label, rtx default_label, int default_probability) { rtx index; @@ -10776,7 +10786,7 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range, TYPE_MODE (TREE_TYPE (range)), expand_normal (range), TYPE_UNSIGNED (TREE_TYPE (range))), - table_label, default_label); + table_label, default_label, default_prob
Re: Propagate profile counts after switch case expansion (issue5896043)
Some more background on this patch: Right now, while the execution counts of different case labels of a switch statement are obtained during profile collection, they are not propagated to RTL. Instead, counts are regenerated at the RTL level using static heuristics that tend to weigh branches equally which can cause poor optimization of hot code. This patch ensures that the counts collected during profile collection are correctly propagated allowing hot code to be better optimized by RTL optimizations. Patch tested on x86_64. - Easwaran On Fri, Mar 23, 2012 at 10:43 AM, Easwaran Raman wrote: > This patch propagates execution count of thee case labels of a > switch-case statement after its expansion. Bootstraps and all > tests pass. OK for trunk? > > 2012-03-23 Easwaran Raman > > * cfgbuild.c (non_zero_profile_counts): New function. > (compute_outgoing_frequencies): If at least one successor of a > BB has non-zero profile count, retain the counts. > * expr.c (do_tablejump): Add a REG_BR_PROB note on the > jump to default label. > (try_tablejump): Add a parameter to specify the probability > of jumping to the default label. > * expr.h (try_tablejump): Add a new parameter. > * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT. > (add_case_node): Pass execution count of the case node and use > it to initialize COUNT field. > (case_probability): New macro. > (expand_case): Propagate execution counts to generated > branches using REG_BR_PROB notes. > (emit_case_nodes): Likewise. > (do_jump_if_equal): Pass probability for REG_BR_PROB note. > (compute_subtree_counts): New function to compute > SUBTREE_COUNT fields of case nodes. > (add_prob_note_to_last_insn): Add a REG_BR_PROB note with the > given probability to the last generated instruction. > > gcc/testsuite/ChangeLog: > 2012-03-23 Easwaran Raman > * gcc.dg/tree-prof/switch-case-1.c: New test case. > * gcc.dg/tree-prof/switch-case-2.c: New test case. 
> > diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c > index 692fea8..d75fbda 100644 > --- a/gcc/cfgbuild.c > +++ b/gcc/cfgbuild.c > @@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb) > purge_dead_tablejump_edges (bb, table); > } > > +/* Check if there is at least one edge in EDGES with a non-zero count > + field. */ > + > +static bool > +non_zero_profile_counts ( VEC(edge,gc) *edges) { > + edge e; > + edge_iterator ei; > + FOR_EACH_EDGE(e, ei, edges) > + { > + if (e->count > 0) > + return true; > + } > + return false; > +} > + > /* Assume that frequency of basic block B is known. Compute frequencies > and probabilities of outgoing edges. */ > > @@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b) > e->count = b->count; > return; > } > + else if (non_zero_profile_counts (b->succs)){ > + /*Profile counts already set, but REG_NOTE missing. Retain the counts. > */ > + return; > + } > guess_outgoing_edge_probabilities (b); > if (b->count) > FOR_EACH_EDGE (e, ei, b->succs) > diff --git a/gcc/expr.c b/gcc/expr.c > index f9de908..fb8eef9 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode); > #ifdef PUSH_ROUNDING > static void emit_single_push_insn (enum machine_mode, rtx, tree); > #endif > -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); > +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); > static rtx const_vector_from_tree (tree); > static void write_complex_part (rtx, rtx, bool); > > @@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree > minval, tree range, > TABLE_LABEL is a CODE_LABEL rtx for the table itself. > > DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the > - index value is out of range. */ > + index value is out of range. > + DEFAULT_PROBABILITY is the probability of jumping to the > + DEFAULT_LABEL. 
*/ > > static void > do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, > - rtx default_label) > + rtx default_label, int default_probability) > { > rtx temp, vector; > > @@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx > range, rtx table_label, > the maximum value of the range. */ > > if (default_label) > - emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, > - default_label); > + { > + emit_cmp_and_jump_insns (i
Re: Propagate profile counts after switch case expansion (issue5896043)
On Fri, Mar 23, 2012 at 3:29 PM, Andi Kleen wrote: > Easwaran Raman writes: > >> Some more background on this patch: Right now, while the execution >> counts of different case labels of a switch statement are obtained >> during profile collection, they are not propagated to RTL. Instead, >> counts are regenerated at the RTL level using static heuristics that >> tend to weigh branches equally which can cause poor optimization of >> hot code. This patch ensures that the counts collected during profile >> collection are correctly propagated allowing hot code to be better >> optimized by RTL optimizations. Patch tested on x86_64. > > I think your patch doesn't use the probably to weight the decision > tree for non tablejump, right? I looked at this some time ago, > but the patch always had problems. Do you mean use the weights to decide the shape of the binary tree (similar to COST_TABLE heuristic)? I am planning to send a separate patch for that. This one just makes sure that the profile counts are propagated correctly. So you will still have a situation where a branch corresponding to an infrequently executed case dominates a frequently executed case, but the BB of the cases gets the right profile weight. - Easwaran > -Andi > > -- > a...@linux.intel.com -- Speaking for myself only
Re: Propagate profile counts after switch case expansion (issue5896043)
On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka wrote: >> This patch propagates execution count of thee case labels of a >> switch-case statement after its expansion. Bootstraps and all >> tests pass. OK for trunk? > > Hi, > while this is resonable thing to do, I belive it would make most sense > to make switch lowering at gimple level: i.e. reorganize the existing > code to produce gimple statement and change expansion code to produce > tablejump from every gimple switch statement that was left in the > code. > > This would be both cleaner and would allow gimple optimizers to improve the > generated code. Incrementally it would also allow us to improve switch > exansion > that is quite baroque and not realy producing very good code in some common > cases. > > If you would be interested working on this (i.e. reorganizing the expansion > code to produce gimple), I would be very happy. If not, I can review the > profile updating part for mainline, since in any case this is desriable thing > to do. I am planning to explore improvements to switch expansion (peeling some cases and using jump tables for the rest, for example) and I think the reorganization you suggest is the right way to do such improvements. But until I can spend time on it and get it done, I would like this patch to get reviewed and checked in. Thanks, Easwaran > Honza
Re: [google]Add support for sampled profile collection (issue4438083)
I want to revive this patch for mainline and have some questions on Honza's comments. On Fri, Apr 29, 2011 at 1:48 PM, Jan Hubicka wrote: >> >> A known limitation is that value profiling is not yet sampled, but it >> does not seem to cause problems. > > For the performance alone, we probably don't need to care that much > given the fact that we value porfile only relatively expensive operations. > But if we want to have the turn off/on feature, then i gueess we need to > guard everything. It is not much of pain to add the code generating > conditionals everywhere, after all. If we sample value profiling instrumentation as well, does it make sense to use a single counter and rate for all instrumentations. If not, does the additional complexity (and flags) justify the benefit of uniformity? >> > +/* Insert STMT_IF around given sequence of consecutive statements in the >> > + same basic block starting with STMT_START, ending with STMT_END. */ >> > + >> > +static void >> > +insert_if_then (gimple stmt_start, gimple stmt_end, gimple stmt_if) >> > +{ >> > + gimple_stmt_iterator gsi; >> > + basic_block bb_original, bb_before_if, bb_after_if; >> > + edge e_if_taken, e_then_join; >> > + >> > + gsi = gsi_for_stmt (stmt_start); >> > + gsi_insert_before (&gsi, stmt_if, GSI_SAME_STMT); >> > + bb_original = gsi_bb (gsi); >> > + e_if_taken = split_block (bb_original, stmt_if); >> > + e_if_taken->flags &= ~EDGE_FALLTHRU; >> > + e_if_taken->flags |= EDGE_TRUE_VALUE; >> > + e_then_join = split_block (e_if_taken->dest, stmt_end); >> > + bb_before_if = e_if_taken->src; >> > + bb_after_if = e_then_join->dest; > > On mainline when we do profile estimation before profiling instrumentaiton, > now, > you really want to update profile for performance here. I am not sure I understand this. 
>> > + make_edge (bb_before_if, bb_after_if, EDGE_FALSE_VALUE); >> > +} >> > + >> > +/* Transform: >> > + >> > + ORIGINAL CODE >> > + >> > + Into: >> > + >> > + __gcov_sample_counter++; >> > + if (__gcov_sample_counter >= __gcov_sampling_rate) >> > + { >> > + __gcov_sample_counter = 0; >> > + ORIGINAL CODE >> > + } > > Hmm, I think the probability that internal loop of program will interfere with > sampling rate is relatively high, but I see it is bit hard to do something > about > it. Can we think of some very basic randomization of sampling_rate? The predominant use case we have for this technique for server workloads is as follows: Have a high value for __gcov_sampling_rate (that should probably be named __gcov_sampling_interval) during server start up so that it reduces the overhead as well as ensures that the startup period doesn't pollute the rest of the counters. After startup, change the sampling interval to a small value (in many cases, to 1) using the external interface in libgcov.c. In this use case, randomization doesn't make sense during startup (since we want to skip profile collection) as well as the steady phase (since the interval is 1 or a small number). If you want to have randomization support, do you have any suggestions for how to make it work with low sampling intervals? Thanks, Easwaran > Honza
Re: Propagate profile counts after switch case expansion (issue5896043)
On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman wrote: > On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka wrote: >>> This patch propagates execution count of thee case labels of a >>> switch-case statement after its expansion. Bootstraps and all >>> tests pass. OK for trunk? >> >> Hi, >> while this is resonable thing to do, I belive it would make most sense >> to make switch lowering at gimple level: i.e. reorganize the existing >> code to produce gimple statement and change expansion code to produce >> tablejump from every gimple switch statement that was left in the >> code. >> >> This would be both cleaner and would allow gimple optimizers to improve the >> generated code. Incrementally it would also allow us to improve switch >> exansion >> that is quite baroque and not realy producing very good code in some common >> cases. >> >> If you would be interested working on this (i.e. reorganizing the expansion >> code to produce gimple), I would be very happy. If not, I can review the >> profile updating part for mainline, since in any case this is desriable thing >> to do. > > I am planning to explore improvements to switch expansion (peeling > some cases and using jump tables for the rest, for example) and I > think the reorganization you suggest is the right way to do such > improvements. But until I can spend time on it and get it done, I > would like this patch to get reviewed and checked in. > > Thanks, > Easwaran Ping.
Re: Propagate profile counts after switch case expansion (issue5896043)
Ping. On Mon, Apr 9, 2012 at 2:33 PM, Easwaran Raman wrote: > On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman wrote: >> On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka wrote: >>>> This patch propagates execution count of thee case labels of a >>>> switch-case statement after its expansion. Bootstraps and all >>>> tests pass. OK for trunk? >>> >>> Hi, >>> while this is resonable thing to do, I belive it would make most sense >>> to make switch lowering at gimple level: i.e. reorganize the existing >>> code to produce gimple statement and change expansion code to produce >>> tablejump from every gimple switch statement that was left in the >>> code. >>> >>> This would be both cleaner and would allow gimple optimizers to improve the >>> generated code. Incrementally it would also allow us to improve switch >>> exansion >>> that is quite baroque and not realy producing very good code in some common >>> cases. >>> >>> If you would be interested working on this (i.e. reorganizing the expansion >>> code to produce gimple), I would be very happy. If not, I can review the >>> profile updating part for mainline, since in any case this is desriable >>> thing >>> to do. >> >> I am planning to explore improvements to switch expansion (peeling >> some cases and using jump tables for the rest, for example) and I >> think the reorganization you suggest is the right way to do such >> improvements. But until I can spend time on it and get it done, I >> would like this patch to get reviewed and checked in. >> >> Thanks, >> Easwaran > > Ping.
[google] Add support for delete operator that takes the size of the object as a parameter
In the tcmalloc memory allocator (http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html), deallocation involves a costly lookup to get the size of the deleted object. The size is required to find the right free list to release the object back to. By passing the size of the object to the delete call, this lookup can be avoided. This patch adds support for operator delete(void*, size_t), guarded by a -fsized-delete flag. It also adds a default implementation of delete(void*, size_t) that ignores the size parameter. Bootstraps with no test regressions. OK for google/gcc-4_6 branch?

---

2011-12-11  Easwaran Raman

	* common.opt (fsized-delete): New option.

cp/ChangeLog.google-4_6:
2011-12-11  Easwaran Raman

	* decl.c (cxx_init_decl_processing): Specify a function that
	takes a void* and size_t for DELETE_EXPR.
	* call.c (build_op_delete_call): If fsized-delete is used, use
	the variant that takes size_t as the second parameter except
	when deleting a pointer of type void *.

testsuite/ChangeLog.google-4_6:
2011-12-11  Easwaran Raman

	* g++.dg/other/sized-delete-1.C: New test.

libstdc++-v3/ChangeLog.google-4_6:
2011-12-11  Easwaran Raman

	* libsupc++/del_op.cc (delete): Define a new variant
	void operator delete(void*, std::size_t).
	* libsupc++/new (delete): Declare
	void operator delete(void*, std::size_t) throw();
	* testsuite/util/testsuite_abi.cc (check_version): Add new known
	version GLIBCXX_3.4.17.
	* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Add new
	symbol _ZdlPvm@@GLIBCXX_3.4.17.
	* config/abi/pre/gnu.ver: Add new symbol _ZdlPvm at version
	GLIBCXX_3.4.17.
Index: libstdc++-v3/libsupc++/del_op.cc === --- libstdc++-v3/libsupc++/del_op.cc (revision 182213) +++ libstdc++-v3/libsupc++/del_op.cc (working copy) @@ -46,3 +46,11 @@ operator delete(void* ptr) throw () if (ptr) std::free(ptr); } + +_GLIBCXX_WEAK_DEFINITION void +operator delete(void* ptr, +std::size_t bytes __attribute__((__unused__))) throw () +{ + if (ptr) +std::free(ptr); +} Index: libstdc++-v3/libsupc++/new === --- libstdc++-v3/libsupc++/new (revision 182213) +++ libstdc++-v3/libsupc++/new (working copy) @@ -93,6 +93,7 @@ namespace std void* operator new(std::size_t) throw (std::bad_alloc); void* operator new[](std::size_t) throw (std::bad_alloc); void operator delete(void*) throw(); +void operator delete(void*, std::size_t) throw(); void operator delete[](void*) throw(); void* operator new(std::size_t, const std::nothrow_t&) throw(); void* operator new[](std::size_t, const std::nothrow_t&) throw(); Index: libstdc++-v3/testsuite/util/testsuite_abi.cc === --- libstdc++-v3/testsuite/util/testsuite_abi.cc (revision 182213) +++ libstdc++-v3/testsuite/util/testsuite_abi.cc (working copy) @@ -194,6 +194,7 @@ check_version(symbol& test, bool added) known_versions.push_back("GLIBCXX_3.4.14"); known_versions.push_back("GLIBCXX_3.4.15"); known_versions.push_back("GLIBCXX_3.4.16"); + known_versions.push_back("GLIBCXX_3.4.17"); known_versions.push_back("GLIBCXX_LDBL_3.4"); known_versions.push_back("GLIBCXX_LDBL_3.4.7"); known_versions.push_back("GLIBCXX_LDBL_3.4.10"); @@ -560,4 +561,3 @@ demangle(const std::string& mangled) } return name; } - Index: libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt === --- libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt (revision 182213) +++ libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt (working copy) @@ -2422,6 +2422,7 @@ FUNC:_ZTv0_n24_NSt9strstreamD1Ev@@GLIBCXX_3.4 FUNC:_ZdaPv@@GLIBCXX_3.4 FUNC:_ZdaPvRKSt9nothrow_t@@GLIBCXX_3.4 FUNC:_ZdlPv@@GLIBCXX_3.4 
+FUNC:_ZdlPvm@@GLIBCXX_3.4.17 FUNC:_ZdlPvRKSt9nothrow_t@@GLIBCXX_3.4 FUNC:_Znam@@GLIBCXX_3.4 FUNC:_ZnamRKSt9nothrow_t@@GLIBCXX_3.4 Index: libstdc++-v3/config/abi/pre/gnu.ver === --- libstdc++-v3/config/abi/pre/gnu.ver (revision 182213) +++ libstdc++-v3/config/abi/pre/gnu.ver (working copy) @@ -1279,6 +1279,13 @@ GLIBCXX_3.4.16 { } GLIBCXX_3.4.15; +GLIBCXX_3.4.17 { + +# operator delete(void*, , unsigned long) +_ZdlPvm; + +} GLIBCXX_3.4.16; + # Symbols in the support library (libsupc++) have their own tag. CXXABI_1.3 { Index: gcc/testsuite/g++.dg/other/sized-delete-1.C === --- gcc/testsuite/g++.dg/other/sized-delete-1.C (revision 0) +++ gcc/testsuite/
Re: [google] Add support for delete operator that takes the size of the object as a parameter
Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18 and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets. -Easwaran 2011-12-11 Easwaran Raman * common.opt (fsized-delete): New option. cp/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman * decl.c (cxx_init_decl_processing): Specify a function that takes a void* and size_t for DELETE_EXPR. * call.c (build_op_delete_call): If fsized-delete is used, use the variant that takes size_t as the second parameter except when deleting a pointer of type void *. testsuite/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman * g++.dg/other/sized-delete-1.C: New test. libstdc++-v3/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman * libsupc++/del_op.cc (delete): Define a new variant void operator delete(void*, std::size_t). * libsupc++/new (delete): Declare void operator delete(void*, std::size_t) throw(); * testsuite/util/testsuite_abi.cc (check_version): Add new known version GLIBCXX_3.4.18. * config/abi/post/i386-linux-gnu/baseline_symbols.txt: Add new symbol _ZdlPvm@@GLIBCXX_3.4.18. * config/abi/post/sparc-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/mips64-linux-gnu/64/baseline_symbols.txt: Likewise. * config/abi/post/mips64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/alpha-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/i486-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/s390-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt: Likewise. * config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/mips-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt: Likewise. 
* config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/amd64/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/sparcv9/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/baseline_symbols.txt: Likewise. * config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/hppa-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.8/sparcv9/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.8/baseline_symbols.txt: Likewise. * config/abi/pre/gnu.ver: Add new symbol _ZdlPv[jmy] at version GLIBCXX_3.4.18. On Mon, Dec 12, 2011 at 5:25 AM, Paolo Carlini wrote: > On 12/12/2011 02:21 PM, Diego Novillo wrote: >> >> Ah, right. I missed the ABI implications. > > For possible inclusion in mainline too, things don't seem completely ok: > nothing should be added to the baseline and very likely the export should be > adjusted to accommodate for different size_t on the various targets, by > using [] in the pattern. See, eg, the existing operator new[](size_t). > > Paolo. 
Index: libstdc++-v3/libsupc++/del_op.cc === --- libstdc++-v3/libsupc++/del_op.cc (revision 182251) +++ libstdc++-v3/libsupc++/del_op.cc (working copy) @@ -46,3 +46,11 @@ operator delete(void* ptr) throw () if (ptr) std::free(ptr); } + +_GLIBCXX_WEAK_DEFINITION void +operator delete(void* ptr, +std::size_t bytes __attribute__((__unused__))) throw () +{ + if (ptr) +std::free(ptr); +} Index: libstdc++-v3/libsupc++/new === --- libstdc++-v3/libsupc++/new (revision 182251) +++ libstdc++-v3/libsupc++/new (working copy) @@ -93,6 +93,7 @@ namespace std void* operator new(std::size_t) throw (std::bad_alloc); void* operator new[](std::size_t) throw (std::bad_alloc); void operator delete(void*) throw(); +void operator delete(void*, std::size_t) throw(); void operator delete[](void*) throw(); void* operator new(std::size_t, const std::nothrow_t&) throw(); void* operator new[](std::size_t, const std::nothrow_t&) throw(); Index: libstdc++-v3/testsuite/util/testsuite_abi.cc === --- libstdc++-v3/testsuite/util/testsuite_abi.cc (revision 182251) +++ libstdc++-v3/testsuite/util/testsuite_abi.cc (working copy) @@ -195,6 +195,7 @@ check_version(symbol& test, bool added) known_versions.push_back("GLIBCXX_3.4.15"); known_versions.push_back(&qu
Re: [google] Add support for delete operator that takes the size of the object as a parameter
On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini wrote: > On 12/12/2011 09:37 PM, Easwaran Raman wrote: >> >> Thanks for the comments Paolo. I have attached the new patch. I have >> bumped the version to 3.4.18 > > You shouldn't: 4.7 is not out yet, thus no reason to increase the minor > version beyond the current 17. Ok, I then don't understand your comment "Note that backporting the patch as-is to 4_6-branch would be very wrong in terms of ABI (in mainline we already have a 3.4.17)". My original patch added the new symbol in version 3.4.17. Since we don't want to add the symbol to 3.4.16 (if we have a situation where the new runtime is not available when running a program compiled with -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed I had to bump up the version. > >> and used _ZdlPv[jmy] in gnu.ver. I have >> also added the symbol to baseline_symbols.txt of other targets. > > You should not, just read again what I wrote. And you don't have to believe > me: just browse the libstdc++ ChangeLogs and see if somebody ever does that > when the linker map is touched. Sorry, I again misunderstood this as well (and still don't have a good picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? I added that by mimicking the symbol _Znw[jmy] found in the same file. From the log, it looks like the baseline_symbols.txt seems to be generated, but I am not sure how that is to be done. For example, r145437 says a bunch of these baseline_symbols.txt are regenerated, but I don't see any other change from which this might be generated. Thanks, Easwaran > Paolo.
Re: [google] Add support for delete operator that takes the size of the object as a parameter
On Mon, Dec 12, 2011 at 1:52 PM, Easwaran Raman wrote: > On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini > wrote: >> On 12/12/2011 09:37 PM, Easwaran Raman wrote: >>> >>> Thanks for the comments Paolo. I have attached the new patch. I have >>> bumped the version to 3.4.18 >> >> You shouldn't: 4.7 is not out yet, thus no reason to increase the minor >> version beyond the current 17. > Ok, I then don't understand your comment > "Note that backporting the patch as-is to 4_6-branch would be very > wrong in terms of ABI (in mainline we already have a 3.4.17)". > My original patch added the new symbol in version 3.4.17. Since we > don't want to add the symbol to 3.4.16 (if we have a situation where > the new runtime is not available when running a program compiled with > -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed > I had to bump up the version. > >> >>> and used _ZdlPv[jmy] in gnu.ver. I have >>> also added the symbol to baseline_symbols.txt of other targets. >> >> You should not, just read again what I wrote. And you don't have to believe >> me: just browse the libstdc++ ChangeLogs and see if somebody ever does that >> when the linker map is touched. > > Sorry, I again misunderstood this as well (and still don't have a good > picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? I added > that by mimicking the symbol _Znw[jmy] found in the same file. From > the log, it looks like the baseline_symbols.txt seems to be generated, > but I am not sure how that is to be done. For example, r145437 says a > bunch of these baseline_symbols.txt are regenerated, but I don't see > any other change from which this might be generated. > > Thanks, > Easwaran It looks like running 'make new-abi-baseline' under TARGET/libstdc++-v3/testsuite generates the baseline file. Should I check in that? What about other targets? Thanks, Easwaran > >> Paolo.
Re: [google] Add support for delete operator that takes the size of the object as a parameter
Based on Paolo's comments I am dropping the changes to baseline_symbols.txt. As for the minor version, I am bumping it up to 18. Assuming that this patch will be able to go into gcc 4.8 (with minor version 18), I want to keep it at the same version in google/main and google/gcc-4_6. Bootstraps and no test regression. OK for google/main and google/gcc-4_6 branches? - 2011-12-17 Easwaran Raman * common.opt (fsized-delete): New option. cp/ChangeLog.google-main: 2011-12-17 Easwaran Raman * decl.c (cxx_init_decl_processing): Specify a function that takes a void* and size_t for DELETE_EXPR. * call.c (build_op_delete_call): If fsized-delete is used, use the variant that takes size_t as the second parameter except when deleting a pointer of type void *. testsuite/ChangeLog.google-main: 2011-12-17 Easwaran Raman * g++.dg/other/sized-delete-1.C: New test. libstdc++-v3/ChangeLog.google-main: 2011-12-17 Easwaran Raman * libsupc++/del_op.cc (delete): Define a new variant void operator delete(void*, std::size_t). * libsupc++/new (delete): Declare void operator delete(void*, std::size_t) throw(); * testsuite/util/testsuite_abi.cc (check_version): Add new known version GLIBCXX_3.4.18. * config/abi/pre/gnu.ver: Add new symbol _ZdlPv[jmy] at version GLIBCXX_3.4.18. On Wed, Dec 14, 2011 at 3:31 AM, Paolo Carlini wrote: > Hi, > >> On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini >> wrote: >>> >>> On 12/12/2011 09:37 PM, Easwaran Raman wrote: >>>> >>>> Thanks for the comments Paolo. I have attached the new patch. I have >>>> bumped the version to 3.4.18 >>> >>> You shouldn't: 4.7 is not out yet, thus no reason to increase the minor >>> version beyond the current 17. >> >> Ok, I then don't understand your comment >> "Note that backporting the patch as-is to 4_6-branch would be very >> wrong in terms of ABI (in mainline we already have a 3.4.17)". >> My original patch added the new symbol in version 3.4.17. 
Since we >> don't want to add the symbol to 3.4.16 (if we have a situation where >> the new runtime is not available when running a program compiled with >> -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed >> I had to bump up the version. > > Note this is going to be an academic discussion, because it's too late for > 4.7 anyway, and I'm not even sure it's conforming to add overloads like this > for operators. > > Anyway. > > For 4.6 branch at this point the situation would be very difficult. We could > have a small time window before 4.7 is out to move **all** its new symbols > vs 4.6, currently in 3.4.17, to a new 3.4.18 and then we could bump 4.6 to > 3.4.17. You see, after a minor is delivered you cannot add anything to it, > and also you have the general requirement that the minor version is the > minor version, thus irrespective of whether we are talking about gcc4.6 or > gcc4.7, a specific minor version number defines which symbols are exported. > I hope now the issues are more clear. You must always be very careful. > >>>> and used _ZdlPv[jmy] in gnu.ver. I have >>>> also added the symbol to baseline_symbols.txt of other targets. >>> >>> You should not, just read again what I wrote. And you don't have to believe >>> me: just browse the libstdc++ ChangeLogs and see if somebody ever does that >>> when the linker map is touched. >> >> Sorry, I again misunderstood this as well (and still don't have a good >> picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? > > Looks fine yes. But I didn't really check the letters. General rule of thumb > when fiddling with linker scripts: double check what you are doing with a > multilib system, like x86_64, at least you can catch errors when you are > mistakenly not accounting for the 32-bit version of the symbols. In the case > at issue, size_t can boil down to unsigned int, unsigned long, unsigned long > long, make sure the pattern includes all three. 
> >> I added >> that by mimicking the symbol _Znw[jmy] found in the same file. From >> the log, it looks like the baseline_symbols.txt seems to be generated, >> but I am not sure how that is to be done. For example, r145437 says a >> bunch of these baseline_symbols.txt are regenerated, but I don't see >> any other change from which this might be generated. > > General rule of thumb: normally, if you aren't a release manager, or a > maintainer, **never** regenerate the baseline_symbols files. Just
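Putting Paolo's advice together, the resulting version-script entry would look roughly like this (a sketch assuming the new minor ends up as 3.4.18; the glob covers all three possible manglings of size_t, as in the existing _Znw[jmy] entry):

```
GLIBCXX_3.4.18 {

    # operator delete(void*, size_t).  size_t mangles as 'j' (unsigned
    # int), 'm' (unsigned long) or 'y' (unsigned long long) depending
    # on the target, so match all three.
    _ZdlPv[jmy];

} GLIBCXX_3.4.17;
```

On a multilib x86_64 build this matters directly: the 64-bit library exports _ZdlPvm while the 32-bit one exports _ZdlPvj, and a pattern listing only one letter would silently drop the other from the map.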
[google] Propagate profile information to RTL level during switch expansion
This patch propagates profile information to the RTL level when expanding switch statements using a jump table or a binary tree of branches. Ok for google/gcc-4_6 branch? I would like the patch to be considered for trunk when stage 1 opens again. -Easwaran 2012-01-31 Easwaran Raman * expr.c (do_tablejump): Add default_probability as the sixth parameter and use it to generate REG_BR_PROB note. (try_tablejump): Add default_probability as a parameter. * cfgbuild.c (non_zero_profile_counts): New function. (compute_outgoing_frequencies): If edges have profile counts set, don't replace them with guessed values. * expr.h (try_tablejump): Add additional parameter to the declaration. * stmt.c (tree-flow.h): Include. (case_node): Add new fields count and subtree_count. (add_case_node): Pass count as a parameter and initialize count field of case_node. (case_probability): New macro. (expand_case): Propagate profile information from the tree level to RTL during switch case expansion. (balance_case_nodes): Compute subtree_count for case nodes. (emit_case_nodes): Set branch probabilities for generated branches. testsuite/ChangeLog.google-4_6: 2012-01-31 Easwaran Raman * tree-prof/switch-case-1.c: New test. * tree-prof/switch-case-2.c: New test. 
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c === --- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-options "-O2 -fdump-rtl-expand-blocks -fdump-tree-optimized-blocks" } */ +int g; + +__attribute__((noinline)) void foo (int n) +{ + switch (n) +{ +case 1: + g++; break; +case 2: + g += 2; break; +case 3: + g += 1; break; +case 4: + g += 3; break; +case 5: + g += 4; break; +case 6: + g += 5; break; +case 7: + g += 6; break; +case 8: + g += 7; break; +case 9: + g += 8; break; +default: + g += 8; break; + } +} + +int main () +{ + int i; + for (i = 0; i < 1; i++) + foo ((i * i) % 5); + return 0; +} +/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:4000" 4 "expand"} } */ +/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:2000" 1 "expand"} } */ +/* { dg-final-use { cleanup-rtl-dump "expand" } } */ Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c === --- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-options "-O2 -fdump-rtl-expand-blocks" } */ +int g; + +__attribute__((noinline)) void foo (int n) +{ + switch (n) +{ +case 99: + g += 2; break; +case 1: + g++; break; +case 100: + g += 1; break; +case 4: + g += 3; break; +case 5: + g += 4; break; +case 6: + g += 5; break; +case 7: + g += 6; break; +case 8: + g += 7; break; +case 9: + g += 8; break; +default: + g += 8; break; + } +} + +int main () +{ + int i; + for (i = 0; i < 1; i++) + foo ((i * i) % 5); + return 0; +} +/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:4000" 4 "expand"} } */ +/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:2000" 1 "expand"} } */ +/* { dg-final-use { cleanup-rtl-dump "expand" } } */ Index: gcc/expr.c === --- gcc/expr.c (revision 183262) +++ gcc/expr.c (working copy) @@ -155,7 +155,7 @@ static rtx do_store_flag 
(sepops, rtx, enum machin #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10245,7 +10245,7 @@ try_casesi (tree index_type, tree index_expr, tree static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) { rtx temp, vector; @@ -10261,9 +10261,17 @@ do_tablejump (rtx index, enum machine_mode mode, r the maximum value of the range. */ if (default_label) -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, - default_label); +{ + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, + default_label); + if (default_probability != -1) +{ + rtx jump_insn = get_last_
Re: [google] Propagate profile information to RTL level during switch expansion
On Tue, Jan 31, 2012 at 10:16 PM, Xinliang David Li wrote: > Ok for google branch with minor changes below. > > thanks, > > David > >>> +#define case_probability(x, y) ((y) ? ((x) * REG_BR_PROB_BASE / (y)) : >>> -1) >>> + > > Using upper case for macro? From http://gcc.gnu.org/codingconventions.html, "Macros names should be in ALL_CAPS when it's important to be aware that it's a macro (e.g. accessors and simple predicates), but in lowercase (e.g., size_int) where the macro is a wrapper for efficiency that should be considered as a function" In this case case_probability is a wrapper for efficiency and hence I decided to go with the lower case name. >>> + count -= default_count; >>> + default_count = 0; > > Why resetting it to 0? If there are no gaps, then the indirect jump can't jump to the default label. Later I set default_edge->count = default_count; and this value should be 0 if there are no gaps. I have addressed the rest of your comments. -Easwaran > > >>> + >>> +static void >>> +add_prob_note_to_last_insn(int probability) >>> +{ > > Missing document.
Re: Ping: Re: Improve DSE in the presence of calls
Ping. On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman wrote: > http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html >
Re: Ping: Re: Improve DSE in the presence of calls
Ping. Diego, David, Is this patch OK for google/main? -Easwaran On Thu, Jun 2, 2011 at 4:48 PM, Easwaran Raman wrote: > Ping. > > On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman wrote: >> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html >> >
Re: Improve DSE in the presence of calls
On Tue, Jun 14, 2011 at 5:47 PM, H.J. Lu wrote: > On Tue, May 10, 2011 at 12:18 PM, Easwaran Raman wrote: >> On Tue, May 3, 2011 at 9:40 AM, Easwaran Raman wrote: >>> On Mon, May 2, 2011 at 8:37 PM, Jeff Law wrote: >>>> -BEGIN PGP SIGNED MESSAGE- >>>> Hash: SHA1 >>>> >>>> On 04/26/11 16:06, Easwaran Raman wrote: >>>> >>>>> >>>>>> You're right. The patch has correctness issues. It is not possible to >>>>>> simply not call add_wild_read because it also resets >>>>>> active_local_stores and frees read_recs. During the local phase it >>>>>> seems better to just treat calls as wild reads and reset >>>>>> active_local_stores and free read_recs. I've now refactored >>>>>> add_wild_read so that resetting active_local_stores and free read_recs >>>>>> are in separate methods that can be called on non-const/non-memset >>>>>> calls. In addition, I have added a non_frame_wild_read field in >>>>>> insn_info to mark non-const and non-memset calls. >>>>> >>>>>> I've attached the revised patch. >>>> Looking better. Just a few more things. >>>> >>>> Don't all locals escape if the callee has a static chain? Is that >>>> handled anywhere? >>>> >>>> Don't you still have the potential for wild reads in dse_step5_nospill >>>> (say from an asm)? if so, then the change can't be correct. >>>> >>>> Couldn't you just add a clause like this before the else-if? >>>> >>>> else if (insn_info->non_frame_wild_read) >>>> { >>>> if (dump_file) >>>> fprintf (dump_file, "non-frame wild read\n"); >>>> scan_reads_nospill (insn_info, v, NULL); >>>> } >>>> else if (insn_info->read_rec) >>>> { >>>> /* Leave this clause unchanged */ >>>> } >>>> >>>> Am I missing something? >>> >>> I am not sure I understand the problem here. If there is a wild read >>> from asm, the instruction has the wild_read flag set. The if statement >>> checks if that flag is set and if so it clears the bitmap - which was >>> the original behavior. Originally, only if read_rec is non NULL you >>> need to recompute the kill set. 
Now, even if read_rec is NULL, >>> non_frame_wild_read could be set requiring the kill set to be >>> modified, which is what this patch does. In fact, isn't what you have >>> written above the equivalent to what is in the patch as '/* Leave this >>> clause unchanged */' is the same as >>> >>> if (dump_file) >>> fprintf (dump_file, "regular read\n"); >>> scan_reads_nospill (insn_info, v, NULL); >>> >>> >>> -Easwaran >>> >> >> Ping. I have changed the test case to use int and added another test >> case that shows DSE doesn't happen when the struct instance is >> volatile (wild_read gets set in that case) >> >> > > One test failed on Linux/ia32: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49414 > > -- > H.J. > Sorry about the ia32 test failure. Looking into it now. -Easwaran
Re: Improve DSE in the presence of calls
This patch seems to break ia64 and some other targets. I have updated Bug 49429 with a test case triage. It looks like for some EXPR, may_be_aliased returns incorrect value and hence can_escape incorrectly concludes the variable can't escape resulting in removal of useful stores. (So can_escape() returns false. But later on, in the same BB, I see: In the following list of instructions, (insn 4 3 6 2 (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) (reg:DI 112 in0)) y.c:23 5 {movdi_internal} (expr_list:REG_DEAD (reg:DI 112 in0) (nil))) ... (insn 36 30 37 2 (set (reg:DI 120 out0) (reg/f:DI 357)) 5 {movdi_internal} (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) (const_int 62 [0x3e])) (nil))) (insn 37 36 38 2 (set (reg:DI 121 out1) (reg/f:DI 341)) 5 {movdi_internal} (expr_list:REG_DEAD (reg/f:DI 341) (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) (const_int 96 [0x60])) (nil (insn 38 37 39 2 (set (reg:DI 122 out2) (const_int 31 [0x1f])) 5 {movdi_internal} (nil)) (call_insn 39 38 42 2 (parallel [ (set (reg:DI 8 r8) (call (mem:DI (symbol_ref:DI ("memcpy") [flags 0x41] ) [0 memcpy S8 A64]) (const_int 1 [0x1]))) (clobber (reg:DI 320 b0)) (clobber (scratch:DI)) (clobber (scratch:DI)) ]) 332 {call_value_gp} (expr_list:REG_DEAD (reg:DI 122 out2) (expr_list:REG_DEAD (reg:DI 121 out1) (expr_list:REG_DEAD (reg:DI 120 out0) (expr_list:REG_UNUSED (reg:DI 8 r8) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil)) (expr_list:REG_DEP_TRUE (use (reg:DI 1 r1)) (expr_list:REG_DEP_TRUE (use (reg:DI 122 out2)) (expr_list:REG_DEP_TRUE (use (reg:DI 121 out1)) (expr_list:REG_DEP_TRUE (use (reg:DI 120 out0)) (nil)) for the memory expression (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) (reg:DI 112 in0)) may_be_aliased() returns false, but register 341 is passed as the second parameter to memcpy. I suspect this is a bug elsewhere which is exposed by this patch. If someone knows why this might be happening, I can tighten the can_escape() function appropriately. 
Thanks, Easwaran On Tue, Jun 14, 2011 at 9:34 AM, Jeff Law wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 05/10/11 13:18, Easwaran Raman wrote: > >>>> I am not sure I understand the problem here. If there is a wild read >>>> from asm, the instruction has the wild_read flag set. The if statement >>>> checks if that flag is set and if so it clears the bitmap - which was >>>> the original behavior. Originally, only if read_rec is non NULL you >>>> need to recompute the kill set. Now, even if read_rec is NULL, >>>> non_frame_wild_read could be set requiring the kill set to be >>>> modified, which is what this patch does. In fact, isn't what you have >>>> written above the equivalent to what is in the patch as '/* Leave this >>>> clause unchanged */' is the same as >>>> >>>> if (dump_file) >>>> fprintf (dump_file, "regular read\n"); >>>> scan_reads_nospill (insn_info, v, NULL); >>>> >>>> >>>> -Easwaran >>>> >> >>> Ping. I have changed the test case to use int and added another test >>> case that shows DSE doesn't happen when the struct instance is >>> volatile (wild_read gets set in that case) >> >> >> What's the purpose behind using unit64_t in the testcase? Somehow I >> suspect using int64_t means the test is unlikely not going to work >> across targets with different word sizes. > Sorry for the exceedingly long wait. Things have been a bit crazy the > last several weeks. > > On a positive note, re-reading things now I think my objection/comment > was mis-guided. > > Patch approved, and again, sorry for the absurdly long period of > non-responsiveness. 
> > jeff >
Re: Improve DSE in the presence of calls
The gimple for test2_31 before RTL expansion contains: # .MEMD.2034_2 = VDEF <.MEMD.2034_1(D)> s1D.2035 = s1D.1255; The RHS is a PARM_DECL; it doesn't have TREE_ADDRESSABLE set while the LHS does, which makes sense. But the move is then translated to a memcpy call while the source memory location is still considered not addressable. Setting TREE_ADDRESSABLE on the RHS should fix this bug, though I don't know the best place to do that. At the point where emit_block_move_hints decides to use memcpy to do the copying, we are dealing with temporaries and we don't have the src TREE lying around. Is passing the tree EXP to emit_block_move from store_expr a reasonable approach? -Easwaran On Fri, Jun 17, 2011 at 2:16 PM, Easwaran Raman wrote: > This patch seems to break ia64 and some other targets. I have updated > Bug 49429 with a test case triage. It looks like for some EXPR, > may_be_aliased returns incorrect value and hence can_escape > incorrectly concludes the variable can't escape resulting in removal > of useful stores. > > > (So can_escape() returns false. But later on, in the same BB, I see: > > In the following list of instructions, > > (insn 4 3 6 2 (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) > (reg:DI 112 in0)) y.c:23 5 {movdi_internal} > (expr_list:REG_DEAD (reg:DI 112 in0) > (nil))) > > ... 
> > (insn 36 30 37 2 (set (reg:DI 120 out0) > (reg/f:DI 357)) 5 {movdi_internal} > (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) > (const_int 62 [0x3e])) > (nil))) > (insn 37 36 38 2 (set (reg:DI 121 out1) > (reg/f:DI 341)) 5 {movdi_internal} > (expr_list:REG_DEAD (reg/f:DI 341) > (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) > (const_int 96 [0x60])) > (nil > (insn 38 37 39 2 (set (reg:DI 122 out2) > (const_int 31 [0x1f])) 5 {movdi_internal} > (nil)) > (call_insn 39 38 42 2 (parallel [ > (set (reg:DI 8 r8) > (call (mem:DI (symbol_ref:DI ("memcpy") [flags 0x41] > ) [0 memcpy S8 A64]) > (const_int 1 [0x1]))) > (clobber (reg:DI 320 b0)) > (clobber (scratch:DI)) > (clobber (scratch:DI)) > ]) 332 {call_value_gp} > (expr_list:REG_DEAD (reg:DI 122 out2) > (expr_list:REG_DEAD (reg:DI 121 out1) > (expr_list:REG_DEAD (reg:DI 120 out0) > (expr_list:REG_UNUSED (reg:DI 8 r8) > (expr_list:REG_EH_REGION (const_int 0 [0]) > (nil)) > (expr_list:REG_DEP_TRUE (use (reg:DI 1 r1)) > (expr_list:REG_DEP_TRUE (use (reg:DI 122 out2)) > (expr_list:REG_DEP_TRUE (use (reg:DI 121 out1)) > (expr_list:REG_DEP_TRUE (use (reg:DI 120 out0)) > (nil)) > > > for the memory expression > > (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) > (reg:DI 112 in0)) > > may_be_aliased() returns false, but register 341 is passed as the > second parameter to memcpy. I suspect this is a bug elsewhere which is > exposed by this patch. If someone knows why this might be happening, I > can tighten the can_escape() function appropriately. > > Thanks, > Easwaran > > > > On Tue, Jun 14, 2011 at 9:34 AM, Jeff Law wrote: >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA1 >> >> On 05/10/11 13:18, Easwaran Raman wrote: >> >>>>> I am not sure I understand the problem here. If there is a wild read >>>>> from asm, the instruction has the wild_read flag set. The if statement >>>>> checks if that flag is set and if so it clears the bitmap - which was >>>>> the original behavior. 
Originally, only if read_rec is non NULL you >>>>> need to recompute the kill set. Now, even if read_rec is NULL, >>>>> non_frame_wild_read could be set requiring the kill set to be >>>>> modified, which is what this patch does. In fact, isn't what you have >>>>> written above the equivalent to what is in the patch as '/* Leave this >>>>> clause unchanged */' is the same as >>>>> >>>>> if (dump_file) >>>>> fprintf (dump_file, "regular read\n"); >>>>> scan_reads_nospill (insn_info, v, NULL); >>>>> >>>>> >>>>> -Easwaran >>>>> >>> >>>> Ping. I have changed the test case to use int and added another test >>>> case that shows DSE doesn't happen when the struct instance is >>>> volatile (wild_read gets set in that ca
Mark variables addressable if they are copied using libcall in RTL expander
This fixes bugs introduced by r175063. OK for trunk if there are no test regressions?

-Easwaran

2011-06-20  Easwaran Raman

	PR rtl-optimization/49429
	* expr.c (emit_block_move_hints): Mark MEM_EXPR(x) and MEM_EXPR(y)
	addressable if emit_block_move_via_libcall is used to copy y into x.

Index: gcc/expr.c
===
--- gcc/expr.c (revision 175081)
+++ gcc/expr.c (working copy)
@@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu
   else if (may_use_call
           && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
           && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
-    retval = emit_block_move_via_libcall (x, y, size,
-                                          method == BLOCK_OP_TAILCALL);
+    {
+      /* Since x and y are passed to a libcall, mark the corresponding
+         tree EXPR as addressable. */
+      tree y_expr = MEM_EXPR (y);
+      tree x_expr = MEM_EXPR (x);
+      if (y_expr)
+        mark_addressable (y_expr);
+      if (x_expr)
+        mark_addressable (x_expr);
+      retval = emit_block_move_via_libcall (x, y, size,
+                                            method == BLOCK_OP_TAILCALL);
+    }
+
   else
     emit_block_move_via_loop (x, y, size, align);
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Wed, Jun 22, 2011 at 7:13 AM, Eric Botcazou wrote: >> I fear this isn't enough considering pass-by-value aggregates that >> are callee copied. > > It's indeed not sufficient for arguments passed by reference but > callee-copied. > > This is PR target/49454. For gcc.c-torture/execute/2717-1.c: > > typedef struct trio { int a, b, c; } trio; > > int > foo (trio t, int i) > { > return bar (i, t); > } > > yields in the .optimized dump: > > foo (struct trio t, int i) > { > int D.1968; > struct trio t.0; > > : > t.0 = t; > D.1968_2 = bar (i_1(D), t.0); > return D.1968_2; > } > > and the aggregate copy is elided by DSE because t.0 isn't may_be_aliased. > This > seems to be a pre-existing bug though: its address is passed to bar in RTL. > > -- > Eric Botcazou > Is the following patch a reasonable fix for this case? I assume I should add similar code inside emit_library_call_value_1. -Easwaran --- gcc/calls.c (revision 175081) +++ gcc/calls.c (working copy) @@ -1073,6 +1073,8 @@ initialize_argument_information (int num_actuals A callee_copies = reference_callee_copied (args_so_far, TYPE_MODE (type), type, argpos < n_named_args); + if (callee_copies) +mark_addressable (args[i].tree_value); /* If we're compiling a thunk, pass through invisible references instead of making a copy. */
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Thu, Jun 23, 2011 at 3:22 AM, Eric Botcazou wrote: >> So, what's the patch(es) that need approval now? > > Original expr.c patch for PR rtl-optimization/49429 + adjusted and augmented > calls.c patch for PR target/49454. Everything is in this thread. > > Easwaran, would you mind posting a consolidated patch? > > -- > Eric Botcazou > Here is the revised patch. Bootstraps and all tests pass on x86_64-unknown-linux. OK for trunk? 2011-06-23 Easwaran Raman PR rtl-optimization/49429 PR target/49454 * expr.c (emit_block_move_hints): Mark MEM_EXPR(x) and MEM_EXPR(y) addressable if emit_block_move_via_libcall is used to copy y into x. * calls.c (initialize_argument_information): Mark an argument addressable if it is passed by invisible reference. (emit_library_call_value_1): Mark MEM_EXPR (val) addressable if it is passed by reference. Index: gcc/expr.c === --- gcc/expr.c (revision 175346) +++ gcc/expr.c (working copy) @@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu else if (may_use_call && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) -retval = emit_block_move_via_libcall (x, y, size, - method == BLOCK_OP_TAILCALL); +{ + /* Since x and y are passed to a libcall, mark the corresponding + tree EXPR as addressable. */ + tree y_expr = MEM_EXPR (y); + tree x_expr = MEM_EXPR (x); + if (y_expr) +mark_addressable (y_expr); + if (x_expr) +mark_addressable (x_expr); + retval = emit_block_move_via_libcall (x, y, size, + method == BLOCK_OP_TAILCALL); +} + else emit_block_move_via_loop (x, y, size, align); Index: gcc/calls.c === --- gcc/calls.c (revision 175346) +++ gcc/calls.c (working copy) @@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals A && TREE_CODE (base) != SSA_NAME && (!DECL_P (base) || MEM_P (DECL_RTL (base) { + mark_addressable (args[i].tree_value); + /* We can't use sibcalls if a callee-copied argument is stored in the current function's frame. 
*/ if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base)) @@ -3524,7 +3526,12 @@ emit_library_call_value_1 (int retval, rtx orgfun, } if (MEM_P (val) && !must_copy) - slot = val; +{ + tree val_expr = MEM_EXPR (val); + if (val_expr) +mark_addressable (val_expr); + slot = val; +} else { slot = assign_temp (lang_hooks.types.type_for_mode (mode, 0),
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Thu, Jun 23, 2011 at 12:16 PM, Jakub Jelinek wrote: > On Thu, Jun 23, 2011 at 12:02:35PM -0700, Easwaran Raman wrote: >> + if (y_expr) >> + mark_addressable (y_expr); > > Please watch formatting, a tab should be used instead of 8 spaces. > >> + if (x_expr) >> + mark_addressable (x_expr); > > Ditto. > >> @@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals A >> && TREE_CODE (base) != SSA_NAME >> && (!DECL_P (base) || MEM_P (DECL_RTL (base) >> { >> + mark_addressable (args[i].tree_value); >> + > > Likewise, plus the line is indented too much, each level should be indented > by 2 chars. > > Jakub > I have attached a new patch that fixes the formatting issues. Thanks, Easwaran Index: gcc/expr.c === --- gcc/expr.c (revision 175346) +++ gcc/expr.c (working copy) @@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu else if (may_use_call && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) -retval = emit_block_move_via_libcall (x, y, size, - method == BLOCK_OP_TAILCALL); +{ + /* Since x and y are passed to a libcall, mark the corresponding + tree EXPR as addressable. */ + tree y_expr = MEM_EXPR (y); + tree x_expr = MEM_EXPR (x); + if (y_expr) + mark_addressable (y_expr); + if (x_expr) + mark_addressable (x_expr); + retval = emit_block_move_via_libcall (x, y, size, + method == BLOCK_OP_TAILCALL); +} + else emit_block_move_via_loop (x, y, size, align); Index: gcc/calls.c === --- gcc/calls.c (revision 175346) +++ gcc/calls.c (working copy) @@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals A && TREE_CODE (base) != SSA_NAME && (!DECL_P (base) || MEM_P (DECL_RTL (base) { + mark_addressable (args[i].tree_value); + /* We can't use sibcalls if a callee-copied argument is stored in the current function's frame. 
*/ if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base)) @@ -3524,7 +3526,12 @@ emit_library_call_value_1 (int retval, rtx orgfun, } if (MEM_P (val) && !must_copy) - slot = val; + { + tree val_expr = MEM_EXPR (val); + if (val_expr) + mark_addressable (val_expr); + slot = val; + } else { slot = assign_temp (lang_hooks.types.type_for_mode (mode, 0),
[google] Backport r175063, r175082 and r175384 from trunk to google/gcc-4_6 branch.
Committed.
Improve stack layout heuristic.
Hi, This patch improves the heuristic used in assigning stack locations to stack variables. Currently, if there are 3 variables A, B and C with their sizes in increasing order and A and C have a conflict, it will put A and B in a partition and C in a separate partition with a total frame size of sizeof(B) + sizeof(C). This patch puts B and C in the same partition and A in a separate partition, with a total size of sizeof(A) + sizeof(C). The other improvement is when -fno-strict-aliasing is used. In that case, it is ok to share a stack slot for a variable whose type transitively contains a union. The other change in this patch removes the field offset in struct stack_var. Right now, the field is always 0 due to the way it is set in partition_stack_vars. Bootstraps and no test regressions on x86_64-unknown-linux. OK for trunk? -Easwaran 2011-04-17 Easwaran Raman * gcc/testsuite/gcc.dg/stack-layout-1.c: New test. * gcc/testsuite/gcc.dg/stack-layout-2.c: New test. * gcc/cfgexpand.c (struct stack_var): Remove OFFSET... (add_stack_var): ...and its reference here... (expand_stack_vars): ...and here. (add_alias_set_conflicts): Add conflict to union variables only when strict aliasing is used. (stack_var_cmp): Sort by descending order of size. (partition_stack_vars): Change heuristic. (union_stack_vars): Fix to reflect changes in partition_stack_vars. (dump_stack_var_partition): Add newline after each partition. 
Index: gcc/testsuite/gcc.dg/stack-layout-1.c === --- gcc/testsuite/gcc.dg/stack-layout-1.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-1.c (revision 0) @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-strict-aliasing -fdump-rtl-expand" } */ +union U { + int a; + float b; +}; +struct A { + union U u1; + char a[100]; +}; +void bar (struct A *); +void foo () + { +{ + struct A a; + bar (&a); +} +{ + struct A a; + bar (&a); +} + } + +/* { dg-final { scan-rtl-dump-times "Partition" 1 "expand" } } */ +/* { dg-final { cleanup-rtl-dump "expand" } } */ Index: gcc/testsuite/gcc.dg/stack-layout-2.c === --- gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand" } */ +void bar( char *); +int foo() +{ + int i=0; + { +char a[8000]; +bar(a); +i += a[0]; + } + { +char a[8192]; +char b[32]; +bar(a); +i += a[0]; +bar(b); +i += a[0]; + } + return i; +} +/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */ +/* { dg-final { scan-rtl-dump "size 32" "expand" } } */ Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 171954) +++ gcc/cfgexpand.c (working copy) @@ -158,11 +158,6 @@ /* The Variable. */ tree decl; - /* The offset of the variable. During partitioning, this is the - offset relative to the partition. After partitioning, this - is relative to the stack frame. */ - HOST_WIDE_INT offset; - /* Initially, the size of the variable. Later, the size of the partition, if this variable becomes it's partition's representative. */ HOST_WIDE_INT size; @@ -267,7 +262,6 @@ v = &stack_vars[stack_vars_num]; v->decl = decl; - v->offset = 0; v->size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1); /* Ensure that all variables have size, so that &a != &b for any two variables that are simultaneously live. */ @@ -372,8 +366,9 @@ to elements will conflict. 
In case of unions we have to be careful as type based aliasing rules may say access to the same memory does not conflict. So play -safe and add a conflict in this case. */ - || contains_union) +safe and add a conflict in this case when -fstrict-aliasing + is used. */ + || (contains_union && flag_strict_aliasing)) add_stack_var_conflict (i, j); } } @@ -403,9 +398,9 @@ return (int)largeb - (int)largea; /* Secondary compare on size, decreasing */ - if (sizea < sizeb) -return -1; if (sizea > sizeb) +return -1; + if (sizea < sizeb) return 1; /* Tertiary compare on true alignment, decreasing. */ @@ -564,28 +559,18 @@ /* A subroutine of partition_stack_vars. The UNION portion of a UNION/FIND partitioning algorithm. Partitions A and B are known to be non-conflicting. - Merge them into a single partition A. + Merge them into a single
Re: Improve stack layout heuristic.
On Sun, Apr 17, 2011 at 1:39 PM, Steven Bosscher wrote: > On Sun, Apr 17, 2011 at 9:39 PM, Easwaran Raman wrote: >> @@ -372,8 +366,9 @@ >> to elements will conflict. In case of unions we have >> to be careful as type based aliasing rules may say >> access to the same memory does not conflict. So play >> - safe and add a conflict in this case. */ >> - || contains_union) >> + safe and add a conflict in this case when -fstrict-aliasing >> + is used. */ >> + || (contains_union && flag_strict_aliasing)) >> add_stack_var_conflict (i, j); >> } >> } > > Are you sure this change is safe? See http://gcc.gnu.org/PR25654 > > Ciao! > Steven > I tried the test case in PR25654 and with my patch and on -O2 -fno-strict-aliasing it puts the two variables in the same partition and the test passes. My understanding of that issue is that with -fstrict-aliasing, the short * and int * are assumed to point to different locations and hence they can't share stack slots, but with -fno-strict-aliasing that assumption is not valid. Thanks, Easwaran
Re: Improve stack layout heuristic.
On Mon, Apr 18, 2011 at 5:58 AM, Michael Matz wrote: > > Hi, > > [FWIW I can't approve patches, but some feedback nevertheless] > > On Sun, 17 Apr 2011, Easwaran Raman wrote: > > > This patch impoves the heuristic used in assigning stack location to > > stack variables. Currently, if there are 3 variables A, B and C with > > their sizes in increasing order and A and C have a conflict, it will > > put A and B in a partition and C in a separate partition with a total > > frame size of sizeof(B) + sizeof(C). This patch puts B and C in the > > same partition and A in a separate partition, with a total size of > > sizeof(A) + sizeof(C). > > That's the change in stack_var_cmp, to match the comment, right? Makes > sense. > > > The other change in this patch removes the field offset in struct > > stack_var. Right now, the field is always 0 due to the way it is set in > > partition_stack_vars. > > Huh, indeed, starting with the initial version of that code already. The > idea clearly was to pack multiple conflicting small objects into the space > of only one larger non-conflicting one by placing them at different > offsets, but that never seems to have been implemented and would require > different tracking of conflicts. I think removing the offset tracking > right now is okay, can be reintroduced once somebody gets motivated > enough. Yes, the intent seems to be what you describe above. I haven't thought about it much, but I am not sure how much a non-backtracking version will buy over the current heuristic. > > > - and merge them into partition A. */ > > - for (last = i = b; i != EOC; last = i, i = stack_vars[i].next) > > - { > > - stack_vars[i].offset += offset; > > - stack_vars[i].representative = a; > > - } > > - stack_vars[last].next = stack_vars[a].next; > > + /* Add B to A's partition. */ > > + stack_vars[b].next = stack_vars[a].next; > > + stack_vars[b].representative = a; > > Hmm. This seems fishy. 
You don't update the representatives of the > members of the partition that B is the leader of. Additionally you break > the chain of B's members. That is only a problem if B actually has more > than one member. That might always be the case with your changed sorting > order, but it's an important invariant, so please assert this in this > routine. > That B always has exactly one member is an invariant in my scheme. The object corresponding to the outer loop index is always the representative. If B were to have more than one member, it must have been visited in the outer loop in some earlier iteration. But then, its index in stack_vars_sorted must be greater than the current i. I have added an assertion that stack_vars[b].next == EOC in union_stack_vars, which is true only when B has one member. > > > @@ -596,7 +581,7 @@ > > if (vb->conflicts) > > { > > EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi) > > - add_stack_var_conflict (a, stack_vars[u].representative); > > + add_stack_var_conflict (a, u); > > Please don't. This uselessly bloats the conflict bitmaps. It is sufficient to add the conflicts of a variable only when it is not merged into some group. I am adding a check to that effect. I am attaching a new patch which excludes the strict aliasing check (which I will send separately) and the changes I have mentioned above. Bootstraps and no regressions in tests. Thanks, Easwaran > > Ciao, > Michael. 
Index: gcc/testsuite/gcc.dg/stack-layout-2.c === --- gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand" } */ +void bar( char *); +int foo() +{ + int i=0; + { +char a[8000]; +bar(a); +i += a[0]; + } + { +char a[8192]; +char b[32]; +bar(a); +i += a[0]; +bar(b); +i += a[0]; + } + return i; +} +/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */ +/* { dg-final { scan-rtl-dump "size 32" "expand" } } */ Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 171954) +++ gcc/cfgexpand.c (working copy) @@ -158,11 +158,6 @@ /* The Variable. */ tree decl; - /* The offset of the variable. During partitioning, this is the - offset relative to the partition. After partitioning, this - is relative to the stack frame. */ - HOST_WIDE_INT offset; - /* Initially, the size of the variable. Later, the size of the partition, if this variable becomes it's partition's re
Re: Improve stack layout heuristic.
On Tue, Apr 19, 2011 at 5:08 AM, Michael Matz wrote: > Hi, > > On Mon, 18 Apr 2011, Easwaran Raman wrote: > >> > > @@ -596,7 +581,7 @@ >> > > if (vb->conflicts) >> > > { >> > > EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi) >> > > - add_stack_var_conflict (a, stack_vars[u].representative); >> > > + add_stack_var_conflict (a, u); >> > >> > Please don't. This uselessly bloats the conflict bitmaps. >> >> It is sufficient to add the conflicts of a variable only when it is >> not merged into some group. > > That is correct but is also what the use of stack_vars[u].representative > achieves alone, ... > >> I am adding a check to that effect. > > ... without any check. > > @@ -596,7 +581,8 @@ > if (vb->conflicts) > { > EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi) > - add_stack_var_conflict (a, stack_vars[u].representative); > + if (stack_vars[u].next == EOC && stack_vars[u].representative == u) > + add_stack_var_conflict (a, u); > BITMAP_FREE (vb->conflicts); > } > } > > What's your objective with this change? I find the original code clearer. Let us say we try to merge 'a' to 'b' and 'a' has conflicts with many members of an existing partition C. It is not necessary to add all those conflicts to 'b' since they will never be considered in the call to union_stack_vars. I was motivated by your comment on bit-vector bloat to try this, but if this affects readability I'll happily revert back to what it was before. -Easwaran > > > Ciao, > Michael.
[google]Pass --save-temps to the assembler (issue4436049)
This makes the gcc driver pass the --save-temps option to the assembler or assembler wrapper so that post-assembly tools like MAO can be integrated. Bootstraps on x86_64. Ok for google/main?

2011-04-19  Easwaran Raman

	* gcc/gcc.c (static const char *asm_options): Pass
	--save-temps to the assembler.

Index: gcc/gcc.c
===
--- gcc/gcc.c (revision 172727)
+++ gcc/gcc.c (working copy)
@@ -780,7 +780,7 @@ static const char *asm_options =
 #if HAVE_GNU_AS
 /* If GNU AS is used, then convert -w (no warnings), -I, and -v to
    the assembler equivalents. */
-"%{v} %{w:-W} %{I*} "
+"%{v} %{w:-W} %{I*} %{save-temps:--save-temps} "
 #endif
 "%a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}";

--
This patch is available for review at http://codereview.appspot.com/4436049
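For context on the spec syntax used above: in GCC driver spec strings, `%{S:X}` substitutes `X` when option `-S` was given on the command line, and nothing otherwise. So the one-line change makes the driver behave roughly as below (the wrapper name is assumed for illustration; vanilla gas would reject the flag, as discussed later in the thread):

```
$ gcc -save-temps -c foo.c
  %{save-temps:--save-temps}  ->  --save-temps      (since -save-temps was given)
  assembler invocation: <as-or-wrapper> --save-temps ... -o foo.o foo.s

$ gcc -c foo.c
  %{save-temps:--save-temps}  ->  (empty)
  assembler invocation: <as-or-wrapper> ... -o foo.o foo.s
```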
Re: [google]Pass --save-temps to the assembler (issue4436049)
On Tue, Apr 19, 2011 at 12:19 PM, Diego Novillo wrote: > On Tue, Apr 19, 2011 at 15:02, Easwaran Raman wrote: >> This makes the gcc driver pass the --save-temps option to the assembler or >> assembler wrapper so that post-assembly tools like MAO can be integrated. >> Bootstraps on x86_64. Ok for google/main? >> >> 2011-04-19 Easwaran Raman >> >> * gcc/gcc.c (static const char *asm_options): Pass > > No need to specify 'static const char *'. Just 'asm_options'. > >> --save-temps to the assembler. > > Could you add a comment stating what version of binutils we need to use this? This is to work with the assembler wrapper that has to know whether to save the assembly file after running any post-assembly tools like MAO. > I will commit this patch for you. Given that you will be sending > other patches, please fill in the write access form at > http://sourceware.org/cgi-bin/pdw/ps_form.cgi > > > Diego. >
Re: [google]Pass --save-temps to the assembler (issue4436049)
On Tue, Apr 19, 2011 at 12:19 PM, Diego Novillo wrote: > On Tue, Apr 19, 2011 at 15:02, Easwaran Raman wrote: >> This makes the gcc driver pass the --save-temps option to the assembler or >> assembler wrapper so that post-assembly tools like MAO can be integrated. >> Bootstraps on x86_64. Ok for google/main? >> >> 2011-04-19 Easwaran Raman >> >> * gcc/gcc.c (static const char *asm_options): Pass > > No need to specify 'static const char *'. Just 'asm_options'. > >> --save-temps to the assembler. > > Could you add a comment stating what version of binutils we need to use this? > > I will commit this patch for you. Given that you will be sending > other patches, please fill in the write access form at > http://sourceware.org/cgi-bin/pdw/ps_form.cgi > > > Diego. > The revised patch has a comment that this should be used with an assembler wrapper that can recognize --save-temps. -Easwaran 2011-04-19 Easwaran Raman * gcc/gcc.c (asm_options): Pass --save-temps to assembler. Index: gcc/gcc.c === --- gcc/gcc.c (revision 172727) +++ gcc/gcc.c (working copy) @@ -775,12 +775,16 @@ static const char *cc1_options = %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants}\ %{coverage:-fprofile-arcs -ftest-coverage}"; +/* If an assembler wrapper is used to invoke post-assembly tools + like MAO, --save-temps needs to be passed to save the output of + such post-processing tools. This is not compatible with vanilla + binutils gas which does not recognize --save-temps. */ static const char *asm_options = "%{-target-help:%:print-asm-header()} " #if HAVE_GNU_AS /* If GNU AS is used, then convert -w (no warnings), -I, and -v to the assembler equivalents. */ -"%{v} %{w:-W} %{I*} " +"%{v} %{w:-W} %{I*} %{save-temps:--save-temps} " #endif "%a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}";
Allow union variables to share stack slots with -fno-strict-aliasing (issue4444051)
Hi, This patch allows variables whose type transitively contains a union to share stack slots if -fno-strict-aliasing is used. Bootstraps on x86_64 with no test regressions. Also tested by changing flag_strict_aliasing to 0 by default. Bootstraps and no test regressions when compared to flag_strict_aliasing=0 without this patch. OK for trunk? -Easwaran 2011-04-19 Easwaran Raman * gcc/testsuite/gcc.dg/stack-layout-1.c: New * gcc/cfgexpand.c (add_alias_set_conflicts): Add conflicts with a variable containing union type only with -fstrict-aliasing. Index: gcc/testsuite/gcc.dg/stack-layout-1.c === --- gcc/testsuite/gcc.dg/stack-layout-1.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-1.c (revision 0) @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-strict-aliasing -fdump-rtl-expand" } */ +union U { + int a; + float b; +}; +struct A { + union U u1; + char a[100]; +}; +void bar (struct A *); +void foo () + { +{ + struct A a; + bar (&a); +} +{ + struct A a; + bar (&a); +} + } + +/* { dg-final { scan-rtl-dump-times "Partition" 1 "expand" } } */ +/* { dg-final { cleanup-rtl-dump "expand" } } */ Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 172657) +++ gcc/cfgexpand.c (working copy) @@ -373,8 +373,9 @@ add_alias_set_conflicts (void) to elements will conflict. In case of unions we have to be careful as type based aliasing rules may say access to the same memory does not conflict. So play -safe and add a conflict in this case. */ - || contains_union) +safe and add a conflict in this case when + -fstrict-aliasing is used. */ + || (contains_union && flag_strict_aliasing)) add_stack_var_conflict (i, j); } } -- This patch is available for review at http://codereview.appspot.com/051
Re: Improve stack layout heuristic.
On Wed, Apr 20, 2011 at 6:53 AM, Michael Matz wrote: > Hi, > > On Tue, 19 Apr 2011, Easwaran Raman wrote: > >> > That is correct but is also what the use of stack_vars[u].representative >> > achieves alone, ... >> > >> >> I am adding a check to that effect. >> > >> > ... without any check. >> > >> > @@ -596,7 +581,8 @@ >> > if (vb->conflicts) >> > { >> > EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi) >> > - add_stack_var_conflict (a, stack_vars[u].representative); >> > + if (stack_vars[u].next == EOC && stack_vars[u].representative == >> > u) >> > + add_stack_var_conflict (a, u); >> > BITMAP_FREE (vb->conflicts); >> > } >> > } >> > >> > What's your objective with this change? I find the original code clearer. >> >> Let us say we try to merge 'a' to 'b' and 'a' has conflicts with many >> members of an existing partition C. It is not necessary to add all >> those conflicts to 'b' since they will never be considered in the call >> to union_stack_vars. > > Right, that's why I was objecting to your initial change. I agree that my initial change - adding a conflict with u - was wrong. > The original > code (adding stack_vars[u].representative to the conflicts of A) made sure > the target conflict bitmap only got representatives added. In my above example, it is not even necessary to add a conflict between 'b' and representative(C) since it is already in a partition. But you're right - not adding that conflict doesn't actually reduce the size of the bitmaps. Reverting back to what was there originally. Thanks, Easwaran That's why I > was asking why you changed this area at all. > >> I was motivated by your comment on bit-vector bloat to try this, but if >> this affects readability I'll happily revert back to what it was before. > > > Ciao, > Michael. 
Index: gcc/testsuite/gcc.dg/stack-layout-2.c === --- gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand" } */ +void bar( char *); +int foo() +{ + int i=0; + { +char a[8000]; +bar(a); +i += a[0]; + } + { +char a[8192]; +char b[32]; +bar(a); +i += a[0]; +bar(b); +i += a[0]; + } + return i; +} +/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */ +/* { dg-final { scan-rtl-dump "size 32" "expand" } } */ Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 171954) +++ gcc/cfgexpand.c (working copy) @@ -158,11 +158,6 @@ /* The Variable. */ tree decl; - /* The offset of the variable. During partitioning, this is the - offset relative to the partition. After partitioning, this - is relative to the stack frame. */ - HOST_WIDE_INT offset; - /* Initially, the size of the variable. Later, the size of the partition, if this variable becomes it's partition's representative. */ HOST_WIDE_INT size; @@ -267,7 +262,6 @@ v = &stack_vars[stack_vars_num]; v->decl = decl; - v->offset = 0; v->size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1); /* Ensure that all variables have size, so that &a != &b for any two variables that are simultaneously live. */ @@ -403,9 +397,9 @@ return (int)largeb - (int)largea; /* Secondary compare on size, decreasing */ - if (sizea < sizeb) -return -1; if (sizea > sizeb) +return -1; + if (sizea < sizeb) return 1; /* Tertiary compare on true alignment, decreasing. */ @@ -564,28 +558,19 @@ /* A subroutine of partition_stack_vars. The UNION portion of a UNION/FIND partitioning algorithm. Partitions A and B are known to be non-conflicting. - Merge them into a single partition A. + Merge them into a single partition A. */ - At the same time, add OFFSET to all variables in partition B. At the end - of the partitioning process we've have a nice block easy to lay out within - the stack frame. 
*/ - static void -union_stack_vars (size_t a, size_t b, HOST_WIDE_INT offset) +union_stack_vars (size_t a, size_t b) { - size_t i, last; struct stack_var *vb = &stack_vars[b]; bitmap_iterator bi; unsigned u; - /* Update each element of partition B with the given offset, - and merge them into partition A. */ - for (last = i = b; i != EOC; last = i, i = stack_vars[i].next) -{ - stack_vars[i].offset += offset;