https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796
--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
In GCC 5-8 we produced:

        vpcmpeqd        %ymm2, %ymm2, %ymm2
        vpsllq          $63, %ymm2, %ymm2
        vandnpd         %ymm1, %ymm2, %ymm1
        vandpd          %ymm2, %ymm0, %ymm0
        vorpd           %ymm1, %ymm0, %ymm0

GCC 9+ is when we started producing the constant load. On the trunk we produce a vbroadcastsd/load instead of a full load of the vector.

Note that clang/LLVM do a poor job too:

        vbroadcastsd    .LCPI0_0(%rip), %ymm2   # ymm2 = upper bit set
        vandps          %ymm2, %ymm0, %ymm0
        vbroadcastsd    .LCPI0_1(%rip), %ymm2   # ymm2 = every bit except upper bit set
        vandps          %ymm2, %ymm1, %ymm1
        vorps           %ymm0, %ymm1, %ymm0

clang produces two vbroadcastsd instead of one vbroadcastsd and a vandnpd.

ICC produces what GCC used to produce before GCC 9. MSVC produces almost what GCC used to produce, except that it uses a load for the -1 vector (that itself is a regression for them; they used to produce vpcmpeqd for -1).
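The test case source is not shown in this comment, so the following is only a sketch inferred from the instruction sequences above: each 64-bit lane appears to compute a bit select that takes the sign bit from %ymm0 and every other bit from %ymm1, the same bit operation as C's copysign(y, x). The portable C below models one lane; the function names `lane_gcc5_8` and `lane_clang` are hypothetical. It illustrates why the GCC 5-8 sequence needs no memory constant: comparing a register with itself yields all-ones, and shifting left by 63 leaves only the sign bit, whereas the clang sequence loads both masks from the constant pool.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its IEEE-754 bit pattern and back. */
static uint64_t bits_of(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double double_of(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* One lane of the GCC 5-8 sequence; x models %ymm0, y models %ymm1. */
static double lane_gcc5_8(double x, double y) {
    uint64_t mask = ~UINT64_C(0);        /* vpcmpeqd %ymm2,%ymm2,%ymm2: all-ones, no load */
    mask <<= 63;                         /* vpsllq $63: only the sign bit remains */
    uint64_t mag = ~mask & bits_of(y);   /* vandnpd: clear the sign bit of y */
    uint64_t sgn = bits_of(x) & mask;    /* vandpd: keep only the sign bit of x */
    return double_of(sgn | mag);         /* vorpd: combine */
}

/* One lane of the clang sequence: both masks come from memory constants. */
static double lane_clang(double x, double y) {
    const uint64_t sign_only    = UINT64_C(0x8000000000000000); /* .LCPI0_0 (assumed value) */
    const uint64_t all_but_sign = UINT64_C(0x7FFFFFFFFFFFFFFF); /* .LCPI0_1 (assumed value) */
    return double_of((bits_of(x) & sign_only) | (bits_of(y) & all_but_sign));
}
```

Both functions compute the same bits; the difference the comment complains about is only in how the masks are materialized (a register idiom versus one or two constant-pool loads).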