https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98176

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongyu Wang from comment #2)
> >> I doubt the call is the issue btw.
> 
> The aliasing could be removed by 
> 
> float foo(int *x, int n, float tx)   
> {
>         float ret[n];
> 
>         #pragma omp simd
>         for (int i = 0; i < n; i++)
>         {
>             float s, c;                    
> 
>             s = c = tx * x[i];
>  
>             ret[0] += s*c;
>         }
> 
>         return ret[0];
> }
> 
> This is successfully vectorized, and the dump from lim2 has:
> 
> Moving statement
> ret.1__I_lsm.7 = (*ret.1_18)[0];
> 
> But for 
> 
> float foo(int *x, int n, float tx)   
> {
>         float ret[n];
> 
>         #pragma omp simd
>         for (int i = 0; i < n; i++)
>         {
>             float s, c;                    
> 
>             sincosf( tx * x[i] , &s, &c );
>  
>             ret[0] += s*c;
>         }
> 
>         return ret[0];
> }
> 
> It still could not be vectorized. I did initial debugging and see
> tree-ssa-loop-im.c has

I see ret[0] has store-motion applied.  You don't see it vectorized
because GCC doesn't know how to vectorize sincos (or cexpi which is
what it lowers it to).

If you replace sincosf with a random call then you'll hit the issue
that LIM's dependence analysis doesn't handle it at all, since it cannot
represent the call.  That will block further optimization in the loop.

That can possibly be improved.
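
As a rough illustration of what store motion does to ret[0], and of the
kind of call that gets in its way, here is a hand-written sketch.  The
names opaque, sum_in_memory and sum_in_register are made up for this
example, and the second function is only the transformation done by hand,
not what LIM actually emits.

```c
/* Hypothetical stand-in for "a random call".  It is defined here only so
   the example links; imagine it lives in another translation unit, so the
   optimizer cannot tell which memory it reads or writes. */
float opaque(float v)
{
    return v * 0.5f;
}

/* The accumulator is loaded from and stored to memory on every iteration,
   like ret[0] in the testcases before store motion. */
float sum_in_memory(const int *x, int n, float tx, float *acc)
{
    for (int i = 0; i < n; i++) {
        float s, c;

        s = c = opaque(tx * x[i]);
        *acc += s * c;
    }
    return *acc;
}

/* Store motion by hand: one load before the loop, one store after it.
   This is only valid if opaque() neither reads nor writes *acc, which is
   exactly the dependence the optimizer has to be able to rule out. */
float sum_in_register(const int *x, int n, float tx, float *acc)
{
    float r = *acc;              /* load moved out of the loop */

    for (int i = 0; i < n; i++) {
        float s, c;

        s = c = opaque(tx * x[i]);
        r += s * c;
    }
    *acc = r;                    /* store sunk below the loop */
    return r;
}
```

Both functions compute the same sum here because opaque() does not touch
the accumulator; the point is that the second form has to be proven safe
before it can be used.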

> if (nonpure_call_p (stmt))
>   {                                        
>      maybe_never = true; 
>      outermost = NULL;                      
>   }
> 
> So no store-motion chance for any future statement in such block.

That's another issue - the call may not return.  Here the granularity
is per BB and thus loads/stores in the same BB are not considered for
sinking.
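
One way to see why a call that may not return matters for sinking a store
is the sketch below.  It is an illustration only: may_not_return, total,
bail and run are hypothetical names, and longjmp is just one of several
ways a call can fail to return (exit or an infinite loop are others).

```c
#include <setjmp.h>

static jmp_buf bail;     /* hypothetical escape point for the example */
static float total;      /* memory location updated inside the loop */

/* A call that does not return for negative input. */
static void may_not_return(int v)
{
    if (v < 0)
        longjmp(bail, 1);
}

/* The store to total happens in every iteration, after the call. */
static void accumulate_in_memory(const int *x, int n)
{
    for (int i = 0; i < n; i++) {
        may_not_return(x[i]);
        total += (float)x[i];
    }
}

/* Hand-sunk variant: the store only happens if the loop finishes. */
static void accumulate_sunk(const int *x, int n)
{
    float r = total;

    for (int i = 0; i < n; i++) {
        may_not_return(x[i]);
        r += (float)x[i];
    }
    total = r;
}

/* Runs one variant and reports what ended up in memory, whether the
   loop completed or bailed out through longjmp. */
float run(void (*body)(const int *, int), const int *x, int n)
{
    total = 0.0f;
    if (setjmp(bail) == 0)
        body(x, n);
    return total;
}
```

With an input such as {1, 2, -1, 4} the first variant leaves the partial
sum of the first two elements in total, while the hand-sunk variant leaves
it untouched, so the two are observably different once the call bails out.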

> As a comparison, this could also be vectorized with simd clone:
> 
> float foo(int *x, int n, float tx)   
> {
>         float ret[n];
> 
>         #pragma omp simd
>         for (int i = 0; i < n; i++)
>         {
>             float s, c;                    
> 
>             s = c = sinf( tx * x[i]);
>  
>             ret[0] += s*c;
>         }
> 
>         return ret[0];
> }
