https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58790

Matthias Kretz <kretz at kde dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|4.9.0                       |10.0

--- Comment #2 from Matthias Kretz <kretz at kde dot org> ---
A completely different idea for how to handle mask reduction and create more
potential for optimization:

Add a new builtin "__builtin_is_zero(x)" which takes any __vector(N) type and
returns true if all bits of x are 0.

none_equal(a, b) { return __builtin_is_zero(a == b); }
all_equal(a, b)  { return __builtin_is_zero(~(a == b)); }
any_equal(a, b)  { return !__builtin_is_zero(a == b); }
some_equal(a, b) {
  return !__builtin_is_zero(a == b) && !__builtin_is_zero(~(a == b));
}

The x86 backend could then translate those to movmsk or ptest/vtestp[sd].

Examples:

with SSE4:
  __builtin_is_zero(x)                  -> ptest(x, x); return ZF
  __builtin_is_zero(~x)                 -> ptest(x, -1); return CF
  __builtin_is_zero(integer < 0)        -> ptest(integer, signmask); return ZF
  __builtin_is_zero(x & k)              -> ptest(x, k); return ZF
  __builtin_is_zero(~x & k)             -> ptest(x, k); return CF
  __builtin_is_zero((integer < 0) & k)  -> ptest(integer, signmask & k);
                                           return ZF

without SSE4:
  __builtin_is_zero(x)         -> movmsk(x == 0) == "full bitmask"
  __builtin_is_zero(mask)      -> movmsk(mask) == 0
      // i.e. when the argument is known to have only 0 or -1 values
  __builtin_is_zero(a == b)    -> movmsk(a == b) == 0
  __builtin_is_zero(~(a == b)) -> movmsk(a == b) == "full bitmask"
      // "full bitmask" is 0x3, 0xf, 0xff, 0xffff, or 0xffffffff, depending
      // on the actual movmsk instruction used.

I assume this would make PR90483 a lot more natural to implement.
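
To make the intended semantics of the four reductions concrete, here is a minimal sketch in C using GCC's generic vector extensions. It is only an illustration under stated assumptions: the proposed builtin is emulated by a hypothetical helper `is_zero` that ORs all bytes together, and `v4si`, `is_zero` and `check_examples` are names invented for this example. It says nothing about how a backend would actually lower the builtin to ptest or movmsk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Four 32-bit ints in one 16-byte vector (GCC/Clang vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the proposed __builtin_is_zero(x):
   returns true if all bits of x are 0. */
static bool is_zero(v4si x)
{
    unsigned char bytes[sizeof x];
    unsigned char acc = 0;
    memcpy(bytes, &x, sizeof x);
    for (size_t i = 0; i < sizeof bytes; ++i)
        acc |= bytes[i];
    return acc == 0;
}

/* a == b yields a vector whose elements are -1 (equal) or 0 (not equal). */
static bool none_equal(v4si a, v4si b) { return is_zero(a == b); }
static bool all_equal(v4si a, v4si b)  { return is_zero(~(a == b)); }
static bool any_equal(v4si a, v4si b)  { return !is_zero(a == b); }
static bool some_equal(v4si a, v4si b)
{
    return !is_zero(a == b) && !is_zero(~(a == b));
}

/* Small self-check covering the three interesting cases. */
static void check_examples(void)
{
    v4si a = {1, 2, 3, 4};
    v4si same = {1, 2, 3, 4};
    v4si mixed = {1, 0, 3, 0};
    v4si other = {5, 6, 7, 8};

    assert(all_equal(a, same) && any_equal(a, same));
    assert(!none_equal(a, same) && !some_equal(a, same));

    assert(some_equal(a, mixed) && any_equal(a, mixed));
    assert(!all_equal(a, mixed) && !none_equal(a, mixed));

    assert(none_equal(a, other) && !any_equal(a, other));
    assert(!all_equal(a, other) && !some_equal(a, other));
}
```

Calling `check_examples()` exercises the all-equal, partially-equal and nothing-equal cases. Note that `some_equal` is true only in the partially-equal case, which is why it needs both the mask and its complement to be nonzero.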