https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91861

            Bug ID: 91861
           Summary: invalid vectorization of isless, islessequal, etc.
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Test case (cf. https://godbolt.org/z/z3TH9F):

#include <cmath>

using V [[gnu::vector_size(16)]] = float;

V f(V x, V y) {
    int r [[gnu::vector_size(16)]];
    for (int i = 0; i < 4; ++i) {
        r[i] = -std::isless(x[i], y[i]);
    }
    return reinterpret_cast<V>(r);
}

Using `-O3`, the `std::isless` calls are vectorized to a cmpnltps instruction,
which raises FE_INVALID if either argument is a NaN, quiet NaNs included.
However, the math.h comparison macros (isless, islessequal, etc.) must not
raise the invalid exception when their operands are unordered.
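
The expected behaviour can be checked on the scalar path with a minimal sketch
along the following lines. The helper name `isless_raises_invalid` is made up
for illustration, and the `volatile` copies are only an assumption about how to
keep the compiler from folding the comparison at compile time. This checks the
scalar call only, not the vectorized loop from the test case.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <limits>

// Hypothetical helper: reports whether a scalar std::isless(x, y)
// left FE_INVALID set in the floating-point environment.
bool isless_raises_invalid(float x, float y) {
    volatile float a = x, b = y;        // keep the compare out of constant folding
    std::feclearexcept(FE_ALL_EXCEPT);  // start from a clean exception state
    volatile bool r = std::isless(a, b);
    (void)r;
    return std::fetestexcept(FE_INVALID) != 0;
}

// Hypothetical usage: a quiet NaN operand must not set FE_INVALID.
void check_isless_is_quiet() {
    const float qnan = std::numeric_limits<float>::quiet_NaN();
    assert(!isless_raises_invalid(qnan, 1.0f));
    assert(!isless_raises_invalid(1.0f, qnan));
}
```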

There's also a missed optimization here:

With AVX, one of the quiet compare predicates can be used directly, e.g.
isless can be translated to vcmpp[sd] with predicate LT_OQ (0x11).

Without AVX, it's possible to use CMPORDPS and PCMPGTD:

isless(x, y) =>
xi = reinterpret as int vector(x)
yi = reinterpret as int vector(y)
xp = xi < 0 ? -(xi & 0x7fffffff) : xi
yp = yi < 0 ? -(yi & 0x7fffffff) : yi
cmpord(x, y) && (xp < yp)
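
A scalar sketch of the recipe above, for illustration only: the names
`ordered_key`, `is_ordered` and `isless_int` are hypothetical, and the ordered
check is done on the bit pattern here instead of with CMPORDPS so that the
example needs no intrinsics. It assumes IEEE 754 binary32 `float`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map the sign-magnitude bit pattern of a float to a
// two's-complement integer with the same ordering; -0.0 and +0.0 both map to 0.
static std::int32_t ordered_key(float v) {
    std::int32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    return bits < 0 ? -(bits & 0x7fffffff) : bits;
}

// Hypothetical helper: true if v is not a NaN, decided on the bits alone
// (stands in for the per-lane result of CMPORDPS in this sketch).
static bool is_ordered(float v) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    return (bits & 0x7fffffffu) <= 0x7f800000u;
}

// isless(x, y) using only integer operations, so no FP exception can be raised.
static bool isless_int(float x, float y) {
    return is_ordered(x) && is_ordered(y) && ordered_key(x) < ordered_key(y);
}

// Hypothetical usage: compare against std::isless on a few edge cases.
static void check_isless_int() {
    const float qnan = std::numeric_limits<float>::quiet_NaN();
    const float inf = std::numeric_limits<float>::infinity();
    const float vals[] = {-inf, -1.5f, -0.0f, 0.0f, 1.0f, inf, qnan};
    for (float a : vals)
        for (float b : vals)
            assert(isless_int(a, b) == std::isless(a, b));
}
```

In a vectorized version the final integer comparison would map to PCMPGTD with
the operands swapped (yp > xp), as suggested above.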

Whether that's faster than a loop over ucomiss is still to be shown.
