uweigand added a comment.

OK, let me try to expand on my point 3 above, which appears to have confused 
everybody :-)

First, let's distinguish two separate requirements: those on floating-point 
operations that explicitly run in regions with non-default control modes, and 
those on floating-point operations that run outside such regions.  (Note that 
those regions are by definition a subset of the regions marked as FENV_ACCESS 
ON, but do not necessarily coincide with them.)

For the first class of operations, there are various requirements that
constrain optimization.  All of those are handled in the current design simply
by representing those operations via constrained intrinsics, which we can do by
always using constrained intrinsics whenever we are inside any FENV_ACCESS ON
region.  (My comments above were not intended to refer to those at all.)

Now, for the second class of operations, most of these requirements do not
apply.  However, there is one critical requirement that still **does** apply:
such operations may never be moved **into** a region with non-default control
modes.  (Where it makes a difference, such an operation could in principle even
be moved into a FENV_ACCESS ON region, as long as the control modes have not
yet actually changed from the defaults.)

As @andrew.w.kaylor mentioned above, to fulfil this requirement it suffices to
prevent moving any such floating-point operation across any statement that
modifies the control modes (or inspects the exception flags).  Only a small
number of statements do so directly (those would all be inline asm or
platform-specific builtins), but the action can be hidden behind a function
call as well.  This applies in particular to the related C library calls with
well-known names (which the compiler could detect).  In addition, any
arbitrary function call might **indirectly** contain such a statement, which
the compiler in general cannot know.  Since LLVM today freely moves normal
floating-point operations across normal function calls, this is a problem that
needs to be addressed.

Now, one way to do that would be to ensure that any floating-point operation
outside FENV_ACCESS ON regions is **also** implemented via constrained
intrinsics, as long as there is any FENV_ACCESS ON region in the function at
all.  As @cameron.mcinally mentioned above, that would prevent reordering
(even across function calls) and therefore solve the problem.  However, I
believe there were concerns about whether we can reliably ensure that property
in the presence of optimizations like inlining, in particular if LTO is also
used.  Even if this can be resolved, I believe there were additional concerns
that this might overly constrain optimization and lead to suboptimal
performance.

This finally gets to the intent of my point 3 above, where I was trying to see
if there is any way we can do better, that is, correctly handle floating-point
operations outside FENV_ACCESS ON regions **without** having to turn them all
into constrained intrinsics.  One way might be to add a new code-motion
barrier in the IR optimizers that would prevent movement of floating-point
operations across the location of any FENV_ACCESS pragma.  But (and this is,
I think, what @kpn referred to above) those locations don't really correspond
to anything in the IR that could serve as such a barrier.  My observation above
was simply that if we want to go that way, it isn't actually necessary to put
those barriers at exactly these places.  Instead, we can put the barriers at
the places that actually modify (or inspect) the control words.  Those are
either special operations we can explicitly recognize, or general function
calls, but only those calls whose call site was originally inside a
FENV_ACCESS ON region (and was marked as such by the front-end).  This has the
advantage that a function call site is a natural place to act as a barrier; in
fact, it already acts as one, e.g. for global memory accesses.  (It would also
be easy to implement as a barrier at the MI / back-end level.)  In addition,
this approach would not lead to any performance penalty outside of FENV_ACCESS
ON regions at all.
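The proposed rule is simple enough to state as a predicate. What follows is a toy model in C, for illustration only. The `inst_kind` enum, the `in_fenv_access_on` flag and `fp_op_may_cross` are made-up names and do not correspond to any existing LLVM class or API. The model answers a single question: may an ordinary (non-constrained) floating-point operation be moved across a given instruction?

```c
#include <stdbool.h>

/* Toy instruction kinds; hypothetical, not LLVM's. */
enum inst_kind {
    INST_FP_OP,    /* ordinary floating-point operation */
    INST_FENV_OP,  /* explicit control-mode access (inline asm, builtin) */
    INST_CALL,     /* any function call, fesetround and friends included */
    INST_OTHER
};

struct inst {
    enum inst_kind kind;
    /* Set by the front-end on call sites that were originally
       inside a FENV_ACCESS ON region. */
    bool in_fenv_access_on;
};

/* May an ordinary FP operation be moved across instruction i? */
bool fp_op_may_cross(const struct inst *i)
{
    switch (i->kind) {
    case INST_FENV_OP:
        return false;                  /* always a barrier */
    case INST_CALL:
        return !i->in_fenv_access_on;  /* barrier only if marked */
    default:
        return true;                   /* no constraint from this rule */
    }
}
```

Under this model, a call that the front-end marked as being inside a FENV_ACCESS ON region blocks the motion, an unmarked call does not, and explicit control-mode operations always do.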

But given that there is still infrastructure missing in the IR optimizers, I
also think that, at least in the first implementation, we probably should go
with the original approach and just use constrained intrinsics everywhere in
the function, and possibly add some function attribute that prevents any
cross-inlining between functions built with constrained intrinsics and
functions built with regular floating-point operations.
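The inlining restriction could look roughly like the following toy check. It is again a hypothetical sketch: `struct func`, `uses_constrained_fp` and `may_inline` are invented names, not an existing LLVM attribute or API. It forbids inlining in both directions. Either direction would mix constrained and regular floating-point operations in one function, and that mix is exactly where the regular operations could be reordered across a mode change.

```c
#include <stdbool.h>

/* Hypothetical per-function flag: true if the body was emitted with
   constrained intrinsics (because the function contains FENV_ACCESS ON). */
struct func {
    bool uses_constrained_fp;
};

/* Allow inlining only between functions of the same kind. */
bool may_inline(const struct func *caller, const struct func *callee)
{
    return caller->uses_constrained_fp == callee->uses_constrained_fp;
}
```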


https://reviews.llvm.org/D53157



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
