On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote:
> extern void vf1()
> {
>    #pragma vectorize enable
>    for ( int i = 0 ; i < 32768 ; i++ )
>      data [ i ] = std::sqrt ( data [ i ] ) ;
> }
>
> Compiling on this x86_64 box with -fopt-info-vec-missed shows the
>   _7 = .SQRT (_1);
>   if (_1 u>= 0.0)
>     goto <bb 8>; [99.95%]
>   else
>     goto <bb 4>; [0.05%]
>
>   <bb 8> [local count: 1062472912]:
>   goto <bb 5>; [100.00%]
>
>   <bb 4> [local count: 531495]:
>   __builtin_sqrtf (_1);
>
> I'm not sure where that control flow came from: it isn't in
>   sqrt-test.cc.104t.stdarg
> but is in
>   sqrt-test.cc.105t.cdce
> so I think it's coming from the argument-range code in cdce.
>
> Arguably the location on the statement is wrong: it's on the loop
> header, when it presumably should be on the std::sqrt call.

See my other mail: it is the result of the -fmath-errno default.  The
inline emitted sqrt doesn't handle errno setting, so we emit essentially
  x = sqrt (arg);
  if (__builtin_expect (arg < 0.0, 0))
    sqrt (arg);
where the former sqrt is inline using HW instructions and the latter is
the library call.

With some extra work we could vectorize it; e.g. if we make it handle
OpenMP #pragma omp ordered simd efficiently, it would be the same thing:
allow non-vectorizable portions of vectorized loops by running a scalar
loop from 0 to vf-1 over the non-vectorizable stuff, and drop the
limitation that the vectorized loop is a single bb.
Essentially, in this case it would be
  vec1 = vec_load (data + i);
  vec2 = vec_sqrt (vec1);
  if (__builtin_expect (any (vec1 < 0.0), 0))
    {
      for (int i = 0; i < vf; i++)
        sqrt (vec1[i]);
    }
  vec_store (data + i, vec2);

If that turns out to be way too hard, we could for vectorization
purposes hide that in the .SQRT internal fn, say by adding a fndecl
argument to it that tells it whether it should handle the exceptional
cases in some way, so that the control flow isn't visible in the
vectorized loop.

	Jakub
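
For illustration only, the blockwise scheme described above can be emulated in portable C++. This is a rough sketch under stated assumptions, not what the vectorizer emits: kVF, hw_sqrt_standin and sqrt_blocks are hypothetical names, the "vector" is just a small array, and hw_sqrt_standin merely imitates a hardware sqrt instruction that yields NaN for negative inputs without touching errno.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical vectorization factor; a real target would pick this.
constexpr std::size_t kVF = 4;

// Stand-in for the inline HW sqrt: NaN for negative input, errno untouched.
// (For non-negative or NaN input std::sqrt reports no domain error.)
static float hw_sqrt_standin(float x) {
    return (x < 0.0f) ? std::numeric_limits<float>::quiet_NaN() : std::sqrt(x);
}

void sqrt_blocks(float *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
    for (; i + kVF <= n; i += kVF) {
        float in[kVF];
        float out[kVF];
        bool any_negative = false;
        // "Vector" part: load, sqrt, and an any() reduction over the inputs.
        for (std::size_t lane = 0; lane < kVF; ++lane) {
            in[lane] = data[i + lane];
            out[lane] = hw_sqrt_standin(in[lane]);
            any_negative = any_negative || (in[lane] < 0.0f);
        }
        // Rare scalar fallback: redo the block with the library call, whose
        // result is ignored; it is there only for the errno side effect
        // (when the C library and compiler flags provide one).
        if (any_negative) {
            for (std::size_t lane = 0; lane < kVF; ++lane)
                (void) std::sqrt(in[lane]);
        }
        // Store the "vector" result.
        for (std::size_t lane = 0; lane < kVF; ++lane)
            data[i + lane] = out[lane];
    }
    // Scalar epilogue for the elements that do not fill a whole block.
    for (; i < n; ++i)
        data[i] = std::sqrt(data[i]);
}
```

The point of the shape is that the common path stays branch-free per element, and the single rarely-taken branch per block is the only place where control flow appears.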