On Mon, 24 Feb 2014, Richard Biener wrote:

> On Sun, Feb 23, 2014 at 12:09 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
>> Hello,
>>
>> a natural first step to optimize changes of rounding modes seems to be
>> making these 2 functions builtins. I don't know exactly how far
>> optimizations will be able to go (the fact that fesetround can fail
>> complicates things a lot). What is included here:
>>
>> 1) fegetround is pure.
>> 2) Neither function aliases (use or clobber) any memory. I expect this is
>> likely not true on all platforms, some probably store the rounding mode in a
>> global variable that is accessible through other means (though mixing direct
>> accesses with calls to fe*etround seems a questionable style). Any opinion
>> or advice here?
>>
>> Regtested on x86_64-linux-gnu, certainly not for 4.9.
>
> Hohumm ... before making any of these functions less of a barrier than they
> are (at least for loads and stores), shouldn't we think of, and fix, the lack of
> any dependences between FP status word changes and actual arithmetic
> instructions?

I'd welcome such change, but it is beyond my gcc-foo (and my free time)
for now.

> In fact, using 'pure' or 'not use/clobber memory' here is exactly walking
> on shaky ground.

I have a hard time seeing how making fegetround pure can break anything
that accidentally works now. fegetround really is pure, it is fine to move
it across float operations.

> Simply because we lack a way to say that this stmt uses/clobbers the
> FP state (fegetround would be 'const' when following your logic in 2)).

Not exactly, the logic in 2 is to say that the FP rounding mode is still a
global variable, but not one that is accessible directly, so the alias
oracle can never be called on a ref to it.
(note that we probably don't want a single FP state but separate rounding
mode on one side and exception flags on the other, since they are
preserved/modified very differently)

> Otherwise, what is it worth optimizing^breaking things even more than
> we do now?

With just my patch, probably not much. For someone interested, the kind of
thing that I would like:
#include <fenv.h>
double protect(double x){asm volatile("":"+mx"(x));return x;}
double add(double x,double y){
int old=fegetround();
fesetround(FE_UPWARD);
double res = protect(protect(x)+protect(y));
fesetround(old);
return res;
}
double f(double x,double y,double z){
return add(add(x,y),z);
}
(in practice I might add: if(old!=FE_UPWARD) in front of both fesetround)
becomes:
old_9 = fegetround ();
fesetround (2048);
__asm__ __volatile__("" : "=mx" x_10 : "0" x_2(D));
__asm__ __volatile__("" : "=mx" x_11 : "0" y_3(D));
_12 = x_10 + x_11;
__asm__ __volatile__("" : "=mx" res_13 : "0" _12);
fesetround (old_9);
old_14 = fegetround ();
fesetround (2048);
__asm__ __volatile__("" : "=mx" x_15 : "0" res_13);
__asm__ __volatile__("" : "=mx" x_16 : "0" z_6(D));
_17 = x_15 + x_16;
__asm__ __volatile__("" : "=mx" res_18 : "0" _17);
fesetround (old_14);
return res_18;
The interesting part is the 3 statements in the middle: fesetround (old_9),
old_14 = fegetround () and fesetround (2048). It is "easy" to replace
old_14 with old_9: the vuse of the second fegetround has a def_stmt which
is an fesetround, and that fesetround must have succeeded because its
argument has a def_stmt which is an fegetround. We are left with:
fesetround (old_9);
fesetround (2048);
If we know somehow that the second fesetround can't fail (hardcode a list
of safe values per platform?), we can remove fesetround (old_9). If we
also assume that only fesetround can modify the rounding mode, we can
prove that the second fesetround is redundant and remove it. We could also
imagine saying that in both blocks the rounding mode is what you get when
it was old_9 and you try to set it to 2048, and thus remove both
middle fesetround calls at once. In any case, that gives the desired state
of both additions sharing a single pre/post fesetround.
Obviously that's wrong since the inline asm can modify the rounding mode
(though why would you mix that with calls to fe*etround?), so that would
probably require a nicer "protect", or even the special additions you
mention in the next email.
--
Marc Glisse