On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote:
> extern void vf1()
> {
>    #pragma vectorize enable
>    for ( int i = 0 ; i < 32768 ; i++ )
>      data [ i ] = std::sqrt ( data [ i ] ) ;
> }
> 
> Compiling on this x86_64 box with -fopt-info-vec-missed shows the

>   _7 = .SQRT (_1);
>   if (_1 u>= 0.0)
>     goto <bb 8>; [99.95%]
>   else
>     goto <bb 4>; [0.05%]
> 
>   <bb 8> [local count: 1062472912]:
>   goto <bb 5>; [100.00%]
> 
>   <bb 4> [local count: 531495]:
>   __builtin_sqrtf (_1);
> 
> I'm not sure where that control flow came from: it isn't in
>   sqrt-test.cc.104t.stdarg
> but is in
>   sqrt-test.cc.105t.cdce
> so I think it's coming from the argument-range code in cdce.
> 
> Arguably the location on the statement is wrong: it's on the loop
> header, when it presumably should be on the std::sqrt call.

See my other mail, it is the result of the -fmath-errno default:
the inline emitted sqrt doesn't handle errno setting, so we emit
essentially
  x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg);
where the former sqrt is inline using HW instructions and the latter is
the library call.
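
A minimal scalar sketch of that shape, for illustration only.  The names
hw_sqrt and lib_sqrtf are hypothetical stand-ins (not GCC internals) for
the inline instruction and the out-of-line library call; lib_sqrtf sets
errno explicitly so the behaviour does not depend on the libm in use, and
__builtin_expect assumes GCC or Clang.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical stand-in for the inline hardware sqrt (fast path).
static inline float hw_sqrt(float x) { return std::sqrt(x); }

// Hypothetical stand-in for the library sqrtf: same value, but it also
// reports a domain error through errno.
static float lib_sqrtf(float x) {
    if (x < 0.0f)
        errno = EDOM;
    return std::sqrt(x);
}

// Mirrors "x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg);"
float sqrt_with_errno(float arg) {
    float x = hw_sqrt(arg);              // result always comes from the fast path
    if (__builtin_expect(arg < 0.0f, 0)) // rare: negative argument
        lib_sqrtf(arg);                  // called only for its errno side effect
    return x;
}
```

A NaN argument compares false against 0.0 and so stays on the fast path,
which matches the unordered u>= test in the dump.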

With some extra work we could vectorize it; e.g. if we make it handle
OpenMP #pragma omp ordered simd efficiently, it would be the same thing:
allow non-vectorizable portions of vectorized loops by running a scalar
loop from 0 to vf-1 there for the non-vectorizable stuff, and drop the
limitation that the vectorized loop is a single bb.  Essentially, in this
case it would be
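  vec1 = vec_load (data + i);
  vec2 = vec_sqrt (vec1);
  /* Test the argument, as in the scalar case; the result of a negative
     input is NaN and would never compare < 0.0.  */
  if (__builtin_expect (any (vec1 < 0.0), 0))
    {
      for (int j = 0; j < vf; j++)
        sqrt (vec1[j]);
    }
  vec_store (data + i, vec2);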
If that turns out to be way too hard, we could, for vectorization
purposes, hide that in the .SQRT internal fn, say add a fndecl argument to
it when it should treat the exceptional cases in some way, so that the
control flow isn't visible in the vectorized loop.
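
For illustration, the blocked scheme described above can be emulated in
plain C++ as follows.  This is only a sketch under stated assumptions: VF,
sqrt_blocked and lib_sqrtf_for_errno are hypothetical names, the "vector"
operations are written as short fixed-length loops rather than real SIMD,
there is no epilogue loop, and __builtin_expect assumes GCC or Clang.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstddef>

constexpr std::size_t VF = 4;  // hypothetical vectorization factor

// Hypothetical stand-in for the library sqrtf call; only its errno side
// effect matters on the slow path.
static void lib_sqrtf_for_errno(float x) {
    if (x < 0.0f)
        errno = EDOM;
}

void sqrt_blocked(float *data, std::size_t n) {
    assert(n % VF == 0);  // this sketch has no scalar epilogue
    for (std::size_t i = 0; i < n; i += VF) {
        float in[VF], out[VF];
        bool any_negative = false;
        // Emulates vec_load, vec_sqrt and the any (vec1 < 0.0) test.
        for (std::size_t j = 0; j < VF; ++j) {
            in[j] = data[i + j];
            out[j] = std::sqrt(in[j]);
            any_negative |= (in[j] < 0.0f);
        }
        // Rare non-vectorizable part: scalar loop over the lanes.
        if (__builtin_expect(any_negative, 0)) {
            for (std::size_t j = 0; j < VF; ++j)
                lib_sqrtf_for_errno(in[j]);
        }
        // Emulates vec_store.
        for (std::size_t j = 0; j < VF; ++j)
            data[i + j] = out[j];
    }
}
```

The stored results always come from the fast path; the scalar loop runs
only for blocks that contain a negative input.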

        Jakub
