Optimization for static local variables

2017-06-14 Thread Prachi Godbole
I'm developing a solution to optimize away intermediate stores (and loads) for
static local variables which are assigned to before being referenced on every
path through a function.

Currently GCC eliminates all such loads/stores in straight-line code but not
inside control structures. AFAIU, passes like sccvn, lim (store motion in
particular), sink, etc. try to propagate the assigned expressions through SSA
vars instead of through memory, thereby eliminating almost all loads and some
stores, but still not all of them.

For example:

void some_function()
{
  static some_type sum;
  ...
  for (i = 0; i < n; i++)
    {
      sum = matrix[i][n];

      for (j = 0; j < i; j++)
        {
          sum -= matrix[i][j] * matrix[j][n];
        }

      matrix[i][n] = sum;
    }
  ...
}

Resulting SSA:

...
   [2.25%]:
  _10 = matrix[0][n_13(D)];
  sum = _10;
  goto ; [100.00%]
...
   [12.75%]:
  _1 = matrix[i_16][n_13(D)];
  if (i_16 > 0)
goto ; [100.00%]
  else
goto ; [0.00%]
...
   [85.00%]:
  # j_24 = PHI 
  # sum_lsm.4_27 = PHI <_6(6), _1(4)>
  _3 = matrix[i_16][j_24];
  _4 = matrix[j_24][n_13(D)];
  _5 = _3 * _4;
  _6 = sum_lsm.4_27 - _5;
  j_18 = j_24 + 1;
  if (i_16 > j_18)
goto ; [85.00%]
  else
goto ; [15.00%]
...
   [12.75%]:
  # _40 = PHI <_6(6)>
  goto ; [100.00%]

   [12.75%]:
  # sum_lsm.4_19 = PHI <_40(7), _1(4)>

   [15.00%]:
  # i_20 = PHI 
  # sum_lsm.4_8 = PHI 
  # sum_lsm.5_22 = PHI <1(9), 0(14)>
  matrix[i_20][n_13(D)] = sum_lsm.4_8;
  i_16 = i_20 + 1;
  if (n_13(D) > i_16)
goto ; [85.00%]
  else
goto ; [15.00%]

   [2.25%]:
  # sum_lsm.4_39 = PHI 
  # sum_lsm.5_38 = PHI 
  if (sum_lsm.5_38 != 0)
goto ; [0.00%]
  else
goto ; [100.00%]

   [0.00%]:
  sum = sum_lsm.4_39;
...

Although all intermediate loads are eliminated here, the stores to sum before
and after the 'j' loop (see bb 14 and bb 12) still remain, and those could be
eliminated as well.
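
In other words, the code I would like to end up with is roughly the following
hand-written sketch (not actual GCC output; the element type and the matrix
declaration are made up for illustration), where sum lives entirely in a local
temporary and no loads or stores to the static object remain:

/* Hand-written sketch of the desired result, assuming 'sum' cannot escape
   the function: the static is replaced by a local temporary, so every
   load/store of the static object disappears.  'double' and the fixed-size
   'matrix' are illustrative only.  */
#define N 8
static double matrix[N][N + 1];

void some_function_optimized (int n)
{
  for (int i = 0; i < n; i++)
    {
      double tmp = matrix[i][n];          /* was: sum = matrix[i][n]; */

      for (int j = 0; j < i; j++)
        tmp -= matrix[i][j] * matrix[j][n];

      matrix[i][n] = tmp;                 /* was: matrix[i][n] = sum; */
    }
}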

My solution would reuse a lot of data structures and routines from the sink
pass. So my question is: can I add this optimization to the sink pass itself,
or would it require a new pass, and if so, what would be the ideal place for it?
Any other pointers are more than welcome.

Thanks,
Prachi


RE: Optimization for static local variables

2017-06-14 Thread Prachi Godbole
> On Wed, Jun 14, 2017 at 1:14 PM, Prachi Godbole wrote:
> > I'm developing a solution to optimize away intermediate stores (and loads)
> > for static local variables which are assigned to before being referenced on
> > every path through a function.
> >
> > [...]
> >
> > My solution would reuse a lot of data structures and routines from the sink
> > pass. So my question is: can I add this optimization to the sink pass itself,
> > or would it require a new pass, and if so, what would be the ideal place for
> > it?
> > Any other pointers are more than welcome.
> 
> So the idea is that for any store to such a variable that is dominated by a
> kill, the function exit doesn't count as a use, right?
> (Unless the variable escapes the scope of the function, for example via a
> nested function -- which might be the case that is very difficult to detect.)
> 
Yes, that is the general idea. I had initially thought of being conservative and
not optimizing at all when the variable could potentially escape the scope.
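
For concreteness, a minimal case of the kind I have in mind (a hypothetical
example, not an existing testcase), where the value left in the static at
function exit is dead under this rule:

/* 'state' is written before it is read on every path through the function,
   so whatever value it holds at function exit can never be observed by a
   later call -- the next call kills it first.  Every store to the static
   object is therefore removable and 'state' can stay in a register,
   provided it cannot escape (no nested functions, address never taken).  */
static int state;

int accumulate (const int *v, int n)
{
  state = 0;                      /* kill: written before any read */
  for (int i = 0; i < n; i++)
    state += v[i];
  return state;                   /* value used here, dead afterwards */
}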

> I'm not sure what part of the sink pass is suitable for this, but as sinking
> walks post-dominators backwards, which DSE also does, it would fit DSE as
> well, no?
> You'd pre-compute blocks containing kills (or do two backwards passes) and
> then special-case GIMPLE_RETURN in dse_classify_store, where previously
> ref_maybe_used_by_stmt_p covers those.
> 
I was looking for a one-time solution rather than doing it as and when the
opportunities present themselves. But I can see that doing it in DSE is more
advantageous, so I'll go ahead with DSE.

> nested fn case:
> 
> void foo (void)
> {
>   static int i = 0;
>   int __attribute__((noinline)) bar () { return i; }
>   int j = bar ();
>   i = 1;
> }
> 
> I think we don't mark 'i' TREE_ADDRESSABLE (escaped) here, so we simply
> treat 'i' as a global variable accessible from other scopes.
> 
> Richard.
> 
> > Thanks,
> > Prachi


Question about creating clones of ipcp clones

2024-09-11 Thread Prachi Godbole via Gcc
Hi,

I am trying to generate out-of-line clones of ipcp clones for an IPA pass that
runs after IPA inline, where the new clone has the same function body and the
same updated signature as the ipcp clone. This fails or asserts depending on
how the clone is created:

1. If param_adjustments and tree_map are not provided:
it fails during clone materialization (tree_function_versioning ()) when
updating the signature and remapping calls, because clone_of (i.e. the ipcp
clone) and the new clone differ in param information, or

2. If param_adjustments and tree_map are provided:
it asserts during fnsummary duplication.

What are the possible approaches to make it work?

Thanks,
Prachi