On Wed, Sep 25, 2024 at 11:36:33AM +0200, Richard Biener wrote:
> I'll note it would be much simpler if we could write x > y ? x : y in
> the intrinsic header.

Unfortunately, not so much. We can do that only for simple cases like _mm_min_p{s,d} or similar, which aren't masked, aren't the "scalar" ones and aren't the "rounding" cases (rounding for these intrinsics is solely about disabling exceptions).

The "scalar" cases are actually vector ops, but the x < y ? x : y is only in the first lane; the remaining lanes come from x. One really can't represent that as, say, __builtin_shuffle with x < y ? x : y and x as operands, because the full-vector comparison also compares the lanes that are then discarded and so can raise different exceptions.

The masked cases are similarly a permutation, though this time with yet another operand rather than one of the provided ones. Again, because of exceptions it can't be a simple shuffle. And one can actually mix the masked and scalar cases together.

The round case verifies that the argument is the constant 4 or 8: if 4, it acts as the non-rounded one (but possibly masked etc.); if 8, it acts the same but with exceptions disabled.

For disabled exceptions, I'm afraid using UNLE/UNGE rather than GT/LT doesn't help, because they still raise exceptions on sNaNs.

	Jakub
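
For illustration only, here is a rough sketch in portable C of the lane selection described above for the packed, scalar and masked min variants. The type vec4f and the model_* functions are hypothetical names invented for this sketch; they are not the real intrinsics and not what GCC's headers contain. The sketch models result values only. It says nothing about which floating-point exceptions get raised, which is exactly the part that a plain compare-plus-shuffle cannot get right.

```c
#include <stddef.h>

/* Hypothetical 4-lane float vector; a stand-in for __m128, not the real type. */
typedef struct { float lane[4]; } vec4f;

/* Per-lane x < y ? x : y.  When either input is a NaN the comparison
   is false, so y is returned. */
static float min_lane(float x, float y) { return x < y ? x : y; }

/* Packed form: every lane is compared. */
static vec4f model_min_ps(vec4f x, vec4f y) {
    vec4f r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = min_lane(x.lane[i], y.lane[i]);
    return r;
}

/* "Scalar" form: only lane 0 is compared; lanes 1..3 are copied from x
   without any comparison being performed on them. */
static vec4f model_min_ss(vec4f x, vec4f y) {
    vec4f r = x;
    r.lane[0] = min_lane(x.lane[0], y.lane[0]);
    return r;
}

/* Masked form: lanes whose mask bit is set are compared; the other lanes
   are copied from a separate operand src, again without comparing. */
static vec4f model_mask_min_ps(vec4f src, unsigned mask, vec4f x, vec4f y) {
    vec4f r = src;
    for (size_t i = 0; i < 4; i++)
        if (mask & (1u << i))
            r.lane[i] = min_lane(x.lane[i], y.lane[i]);
    return r;
}
```

In the scalar and masked models the skipped lanes never reach min_lane, whereas a full-vector x < y ? x : y followed by a shuffle would compare them too.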