https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401
--- Comment #6 from andysem at mail dot ru ---
(In reply to Andrew Pinski from comment #1)
> >and gcc 12 generates a worse code:
>
> it is not worse really; depending on the how fast moving between the
> register sets is.

I meant "worse" compared to the vpcmpeq+vpsrlw pair.

(Side note about the broadcast version: it could have been smaller if it used a
32-bit constant and vpbroadcastd. vpcmpeq+vpsrlw would still be better in this
particular case, but if a broadcast is needed, smaller-footprint code is
preferred.)
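
To illustrate the side note, here is a rough portable C sketch (no intrinsics) of why a 32-bit constant would suffice for the broadcast. The shift count of 8 and the resulting 0x00FF pattern are assumptions made for illustration only, and the helper names are hypothetical. The sketch models the vector as a plain byte array and says nothing about what any compiler actually emits.

```c
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 32 };  /* one 256-bit vector, modeled as a byte array */

/* Models vpcmpeq + vpsrlw: all-ones 16-bit lanes, each shifted right.
 * Assumes shift < 16. */
static void ones_shifted_u16(uint8_t out[VEC_BYTES], unsigned shift)
{
    uint16_t lane = (uint16_t)(0xFFFFu >> shift);
    for (size_t i = 0; i < VEC_BYTES / sizeof lane; ++i)
        memcpy(out + i * sizeof lane, &lane, sizeof lane);
}

/* Models replicating a 32-bit constant into every dword (vpbroadcastd-style). */
static void broadcast_u32(uint8_t out[VEC_BYTES], uint32_t value)
{
    for (size_t i = 0; i < VEC_BYTES / sizeof value; ++i)
        memcpy(out + i * sizeof value, &value, sizeof value);
}

/* Returns 1 if the 4-byte constant reproduces the same 32-byte vector. */
static int dword_broadcast_matches(void)
{
    uint8_t a[VEC_BYTES], b[VEC_BYTES];
    ones_shifted_u16(a, 8);          /* assumed example: 0x00FF in every word */
    broadcast_u32(b, 0x00FF00FFu);   /* same pattern from a 4-byte constant */
    return memcmp(a, b, VEC_BYTES) == 0;
}
```

Any pattern that repeats every 16 bits also repeats every 32 bits, so the in-memory constant only needs to be 4 bytes wide when a broadcast is used at all.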