Hi,
On Fri, 26 Oct 2007, Tomash Brechko wrote:
> It was already said that instead of disallowing all optimization with
> volatile, the optimization itself may be made a bit differently.
> Besides, the concern that it will hurt performance at large is a bit
> far-stretched. You still may speculatively store to automatic var for
> which address was never taken, and this alone covers 50%--80% of
> cases.
Both the assessment of far-stretchedness and these numbers seem to be
invented ad hoc. The latter are irrelevant (what matters is not how many
cases there are, but how important the cases that do occur are, for some
metric, let's say performance). And the former isn't true, i.e. the
concern is not far-fetched. For 456.hmmer, for instance, it is crucial
that this transformation happens. The basic situation looks like this:
int f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp,
      int *tpdm, int xmb, int *bp, int *ms)
{
  int k, sc;
  for (k = 1; k <= M; k++)
    {
      mc[k] = mpp[k-1] + tpmm[k-1];
      if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
      if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
      if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc;
      mc[k] += ms[k];
    }
}
Here the conditional stores to mc[k] are better implemented as
conditional moves; otherwise you lose about 25% performance on some
platforms. See PR27313, for which I implemented this transformation on
the tree level. A similar transformation has been done for much longer
by the RTL if-conversion (if-cvt) pass. All of these are currently
completely valid transformations, so they could only be redefined as
invalid by some other memory model. Any such memory model has to take
into account the performance implications, which do exist, contrary to
what some proponents of a different model claim. Certainly some
suggestions for another memory model look quite similar to considering
all non-automatic objects as volatile, at which point one may fairly ask
why not simply use 'volatile'.
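
To make the transformation concrete, here is a rough source-level sketch
of what turning a conditional store into a conditional move amounts to.
The function names are made up for illustration, and the second function
is only a hand-written approximation of what the compiler does
internally, not its actual output. In single-threaded code both behave
identically; the second one always writes to *p, which is exactly what a
different memory model would have to forbid.

```c
/* Hypothetical helper: the store to *p is executed only when the
   comparison succeeds, so the compiler has to emit a branch unless it
   is allowed to rewrite it. */
void store_if_greater(int *p, int sc)
{
  if (sc > *p)
    *p = sc;
}

/* Hand-written approximation of the if-converted form: load once,
   select between the old and new value (a conditional move on many
   targets), then store unconditionally. */
void store_if_greater_converted(int *p, int sc)
{
  int old = *p;
  *p = (sc > old) ? sc : old;
}
```

Note that *p is read in the condition anyway, so the compiler already
knows the access cannot trap; that is why the unconditional store is
safe to introduce here.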
> Only globals, or locals which address was passed to some
> function, should be treated specially. Also, for the case
>
> void
> f(int set_v, int *v)
> {
> if (set_v)
> *v = 1;
> }
>
> there's no load-maybe_update-store optimization, so there won't be
> slowdown for such cases also (BTW, how this case is different from
> when v is global?).
The difference is that 'v' might be zero (a null pointer), hence *v
could trap, hence the store can't be moved out of its control region. If
you could somehow determine that *v can't trap (e.g. by having a
dominating access to it already), then the transformation will be done.
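
As a small illustration of the "dominating access" case (the names are
invented for this sketch, and whether a given compiler version and
target actually converts the second function is not something the
example can guarantee):

```c
/* The store must stay under the branch: when set_v is 0 the caller may
   legitimately pass a null pointer, and an unconditional access to *v
   would then trap. */
void set_guarded(int set_v, int *v)
{
  if (set_v)
    *v = 1;
}

/* Here *v is read before the branch on every path.  That load dominates
   the conditional store, so *v is already known not to trap, and the
   store becomes a candidate for the load / select / store rewrite. */
int set_after_access(int set_v, int *v)
{
  int old = *v;
  if (set_v)
    *v = 1;
  return old;
}
```

Calling set_guarded(0, NULL) is fine as written, whereas
set_after_access requires a valid pointer regardless of set_v.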
Ciao,
Michael.