https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
In GCC 5-8 we produced:
        vpcmpeqd        %ymm2, %ymm2, %ymm2
        vpsllq  $63, %ymm2, %ymm2
        vandnpd %ymm1, %ymm2, %ymm1
        vandpd  %ymm2, %ymm0, %ymm0
        vorpd   %ymm1, %ymm0, %ymm0
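
For readers following along, here is a minimal scalar sketch of what the five instructions above compute for a single 64-bit lane. It is only an illustration: the function name blend_sign_lane and the mapping of a/b to %ymm0/%ymm1 are assumptions made for this example, not code taken from the bug report or from GCC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models one 64-bit lane of the sequence above.
   a stands for a lane of %ymm0, b for a lane of %ymm1. */
static double blend_sign_lane(double a, double b)
{
    uint64_t ua, ub, ur;
    double r;

    /* Reinterpret the doubles as raw bits without violating aliasing rules. */
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    uint64_t mask = ~UINT64_C(0);  /* vpcmpeqd: all bits set            */
    mask <<= 63;                   /* vpsllq $63: only the sign bit set */
    ub = ~mask & ub;               /* vandnpd: keep everything but the sign of b */
    ua = mask & ua;                /* vandpd: keep only the sign of a   */
    ur = ua | ub;                  /* vorpd: combine the two halves     */

    memcpy(&r, &ur, sizeof r);
    return r;
}
```

The point of the sequence is that the mask is built in registers (compare, then shift) and its complement comes for free from vandnpd, so no constant has to be loaded from memory.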

Starting with GCC 9 we produce the constant load instead.
On the trunk we produce vbroadcastsd/load instead of the full load of the
vector.

Note clang/LLVM do a poor job too:

        vbroadcastsd    .LCPI0_0(%rip), %ymm2   # ymm2 = upperbit set
        vandps  %ymm2, %ymm0, %ymm0
        vbroadcastsd    .LCPI0_1(%rip), %ymm2   # ymm2 = every bit except upper bit set
        vandps  %ymm2, %ymm1, %ymm1
        vorps   %ymm0, %ymm1, %ymm0

clang produces two vbroadcastsd loads instead of one load and a vandnpd.
ICC produces what GCC used to produce before GCC 9.

MSVC produces almost what GCC used to produce except uses a load for the -1
vector (that itself is a regression for them, they used to produce vpcmpeqd for
-1).
