Hi,

On Fri, 26 Oct 2007, Tomash Brechko wrote:

> It was already said that instead of disallowing all optimization with
> volatile, the optimization itself may be made a bit differently.
> Besides, the concern that it will hurt performance at large is a bit
> far-stretched.  You still may speculatively store to an automatic var
> whose address was never taken, and this alone covers 50%--80% of
> cases.

Both the claim that the concern is far-stretched and these numbers seem
to be invented ad hoc.  The numbers are irrelevant: what matters is not
how many cases there are, but how important the cases that do occur are
for some metric, say performance.  And the concern is not far-stretched.
For 456.hmmer, for instance, it is crucial that this transformation
happens; the basic situation looks like this:

void f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp,
       int *tpdm, int xmb, int *bp, int *ms)
{
  int k, sc;
  for (k = 1; k <= M; k++)
    {
      mc[k] = mpp[k-1]   + tpmm[k-1];
      if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
      if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
      if ((sc = xmb  + bp[k])         > mc[k])  mc[k] = sc;
      mc[k] += ms[k];
    }
}

Here the conditional stores to mc[k] are better implemented as
conditional moves; otherwise you lose about 25% performance on some
platforms.  See PR27313, for which I implemented this transformation on
the tree level.  A similar transformation has been done by the RTL
if-conversion pass (if-cvt) for much longer.  All of these are currently
completely valid transformations, so they could only be redefined as
invalid by some other memory model.  Any such memory model has to take
the performance implications into account, and they do exist, contrary
to what some proponents of a different model claim.  Indeed, some
suggestions for another memory model look quite similar to treating all
non-automatic objects as volatile, at which point one may fairly ask why
not simply use 'volatile'.
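To make the transformation concrete, here is a sketch, assuming plain C99, of the loop above next to a hand-written equivalent of its if-converted form: each `if (sc > mc[k]) mc[k] = sc;` becomes an unconditional store of a selected value. The names `f_branchy` and `f_select` are hypothetical, and this is only the source-level shape; what GCC actually emits depends on the target and on the optimization flags.

```c
/* Hypothetical names: f_branchy mirrors the hmmer-style loop,
   f_select is a hand-written source-level equivalent of the
   if-converted form.  Plain C99, no library calls needed. */

/* Conditional stores: mc[k] is overwritten only when sc is larger. */
void f_branchy(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim,
               int *dpp, int *tpdm, int xmb, int *bp, int *ms)
{
  int k, sc;
  for (k = 1; k <= M; k++)
    {
      mc[k] = mpp[k-1] + tpmm[k-1];
      if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
      if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
      if ((sc = xmb + bp[k])          > mc[k])  mc[k] = sc;
      mc[k] += ms[k];
    }
}

/* If-converted shape: every store to mc[k] is unconditional and the
   comparison only selects the stored value, which a back end can emit
   as a conditional move instead of a branch. */
void f_select(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim,
              int *dpp, int *tpdm, int xmb, int *bp, int *ms)
{
  int k, sc;
  for (k = 1; k <= M; k++)
    {
      mc[k] = mpp[k-1] + tpmm[k-1];
      sc = ip[k-1] + tpim[k-1];
      mc[k] = sc > mc[k] ? sc : mc[k];
      sc = dpp[k-1] + tpdm[k-1];
      mc[k] = sc > mc[k] ? sc : mc[k];
      sc = xmb + bp[k];
      mc[k] = sc > mc[k] ? sc : mc[k];
      mc[k] += ms[k];
    }
}
```

In this particular loop mc[k] is already stored unconditionally at the top of every iteration, so the rewrite adds no store on a path that lacked one. In general, though, the transformation writes to the location on paths where the original code did not, which is exactly what the memory-model discussion is about.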

> Only globals, or locals whose address was passed to some
> function, should be treated specially.  Also, for the case
> 
>   void
>   f(int set_v, int *v)
>   {
>     if (set_v)
>       *v = 1;
>   }
> 
> there's no load-maybe_update-store optimization, so there won't be a
> slowdown for such cases either (BTW, how is this case different from
> when v is global?).

The difference is that 'v' might be a null pointer, so *v could trap,
and hence the store can't be moved out of its control region.  If the
compiler can somehow determine that *v can't trap (e.g. because a
dominating access to it already exists), then the transformation will
be done.
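A sketch of that distinction, assuming the same load-select-store rewrite as in the hmmer case: in the quoted `f`, `v` may be a null pointer whenever `set_v` is zero, so the store has to stay under the branch. Once a dominating access to `*v` exists, the location is dereferenced on every path anyway, so making the store unconditional cannot introduce a new trap. The names `g_guarded` and `g_select` are hypothetical, and the code shows the source-level equivalent only, not GCC's actual output.

```c
/* The function from the quoted mail: no dominating access, so the
   store must stay under the branch; v may be NULL when set_v == 0. */
void f(int set_v, int *v)
{
  if (set_v)
    *v = 1;
}

/* Hypothetical variant: *v is read unconditionally before the
   conditional store, i.e. there is a dominating access. */
int g_guarded(int set_v, int *v)
{
  int old = *v;          /* v is dereferenced here on every path */
  if (set_v)
    *v = 1;
  return old;
}

/* Source-level equivalent of g_guarded after the transformation: the
   store is unconditional and only the stored value depends on set_v. */
int g_select(int set_v, int *v)
{
  int old = *v;
  *v = set_v ? 1 : old;  /* candidate for a conditional move */
  return old;
}
```

The dominating load only settles the trapping question; it says nothing about a concurrent writer to `*v`, which is the separate memory-model issue.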


Ciao,
Michael.
