https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118994
--- Comment #5 from John Platts <john_platts at hotmail dot com> ---
GCC also fails to optimize (a | b) - ((a ^ b) >> 1) down to a single rounding-average instruction where the target supports one: SSE2 PAVGB/PAVGW, NEON/SVE2 SRHADD/URHADD, or AltiVec vavgsb/vavgsh/vavgsw/vavgub/vavguh/vavguw. Clang, by contrast, does optimize this expression down to PAVGB/PAVGW/SRHADD/URHADD where available, as shown by the snippet at https://godbolt.org/z/Yz8fEW46f.
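For readers without the Godbolt link at hand, here is a minimal C sketch of the kind of code in question. It is not the snippet from the link. The function names are hypothetical, and it covers only unsigned 8-bit elements, the case that would correspond to PAVGB or URHADD. It shows that the expression equals the rounding-up average (a + b + 1) >> 1 computed in a wider type, and wraps it in an element-wise loop of the sort a vectorizer might turn into one averaging instruction per vector. Whether a given compiler version actually does so for this exact loop is not claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounding-up average without widening: the expression from the comment.
   Hypothetical helper name, unsigned 8-bit elements only. */
uint8_t avg_round_up_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* Reference form: widen to int so that a + b + 1 cannot overflow. */
uint8_t avg_reference_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((int)a + (int)b + 1) >> 1);
}

/* Element-wise loop; a candidate for vectorization into averaging
   instructions on targets that provide them. */
void avg_round_up_u8_array(uint8_t *dst, const uint8_t *a,
                           const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = avg_round_up_u8(a[i], b[i]);
}

/* Exhaustively compare both forms over all 65536 input pairs. */
void check_all_u8_pairs(void)
{
    for (int a = 0; a < 256; ++a)
        for (int b = 0; b < 256; ++b)
            assert(avg_round_up_u8((uint8_t)a, (uint8_t)b) ==
                   avg_reference_u8((uint8_t)a, (uint8_t)b));
}
```

The identity holds because a + b = 2 * (a | b) - (a ^ b), so rounding the halved sum up is the same as subtracting the floor of (a ^ b) / 2 from (a | b). The signed instructions (SRHADD, vavgsb and friends) would need signed element types and an arithmetic shift, which this sketch does not cover.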