[Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 08 Dec 2020 01:25:39 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-12-08

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongyu Wang from comment #4)
> (In reply to Richard Biener from comment #3)
>  
> > I see ret[0] has store-motion applied.  You don't see it vectorized
> > because GCC doesn't know how to vectorize sincos (or cexpi which is
> > what it lowers it to).
> 
> I doubt so, after manually store motion
> 
> #include <cmath>
> 
> float foo(                
>                 int *x,                      
>                 int n,                 
>                 float tx
>                 )   
> {
>         float ret[n];
>         float tmp;
> 
>         #pragma omp simd
>         for (int i = 0; i < n; i++)
>         {
>             float s, c;                    
> 
>             sincosf( tx * x[i] , &s, &c );  
> 
>             tmp += s*c; 
>         }
> 
>         ret[0] += tmp; 
> 
>         return ret[0];
> }
> 
> with -Ofast -fopenmp-simd -std=c++11 it could be vectorized to call   
> _ZGVbN4vvv_sincosf
> 
> ret[0] is moved for sinf() case, but not sincosf() with above options.

What target are you targeting?  Can you provide the sincosf prototype
from your math.h?  (please attach preprocessed source).

I cannot reproduce sincosf _not_ being lowered to cexpif and thus
no longer having memory writes.

> > 
> > If you replace sincosf with a random call then you'll hit the issue
> > that LIMs dependence analysis doesn't handle it at all since it cannot
> > represent it.  That will block further optimization in the loop.
> > 
> > That can possibly be improved.
> > 
> 
> So could LIMs dependence analysis handle known library function and just
> analyze their memory parameter? Random call may have unknown behavior.

Well, in gather_mem_refs_stmt we have

  mem = simple_mem_ref_in_stmt (stmt, &is_stored);
  if (!mem)
    {
      /* We use the shared mem_ref for all unanalyzable refs.  */
      id = UNANALYZABLE_MEM_ID;

so in this path you could in theory special-case sincos{,f,l} and
handle it by adding two refs.  Note this will likely break down once
LIM decides to apply some motion to one of the refs since it doesn't
know how to re-materialize them (they'll be treated as independent
but are really tied together).

So the other option is to improve things by having multiple
"UNANALYZABLE_MEM_ID"s and while not handling them as candidates for
invariant or store motion handle them in the dependence analysis part,
possibly by just using the classical stmt-based dependence tests.
But this requires quite some refactoring.

> > > if (nonpure_call_p (stmt))
> > >   {                                        
> > >      maybe_never = true; 
> > >      outermost = NULL;                      
> > >   }
> > > 
> > > So no store-motion chance for any future statement in such block.
> > 
> > That's another issue - the call may not return.  Here the granularity
> > is per BB and thus loads/stores in the same BB are not considered for
> > sinking.
> > 
> 
> IMHO the condition may be too strict for known library calls.

Yes.

For a LIM testcase an example with a memcpy might be more practically
relevant.

For refactoring I'd start with classifying the unanalyzable refs as
separate ref ID, marking it with another bit like ref_unanalyzed in
in_mem_ref and asserting there's a single access of such refs.
The mem_refs_may_alias_p code then needs to use stmt-based alias
queries instead of refs_may_alias_p_1 using accesses_in_loop[0]->stmt.

And code testing for UNANALYZABLE_MEM_ID now needs to look at the
ref_unanalyzed flag to not consider those refs for transforms.

Note this may blow up the memory requirements for testcases with lots
of "unanalyzable" refs.

The nonpure-call code is more difficult to improve, even sincos can not return
when the access to s or c traps.  Analyzing the arguments might help here.
If you disregard that detail I think all ECF_LEAF|ECF_NOTHROW functions
return normally.

[Bug tree-optimization/98176] Loop invariant memory could not be hoisted when nonpure_call in loop body

Reply via email to