https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117251

--- Comment #10 from Michael Meissner <meissner at gcc dot gnu.org> ---
Power10 added an instruction (XXEVAL) that provides fusion of logical
operations on VSX vectors, including ANDC->XOR and XOR->XOR fusion.  I have
coded up patches to support this and will be submitting them shortly.
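
To illustrate how a single XXEVAL could stand in for an ANDC->XOR or XOR->XOR
pair, here is a minimal sketch in portable C that derives the 8-bit truth
table for a three-input boolean function.  It is not taken from the patches.
The helper names (ternary_fn, andc_xor, xor_xor, truth_table_imm) are
hypothetical.  The bit ordering is an assumption based on my reading of Power
ISA 3.1: the entry for inputs (a, b, c) sits at index 4*a + 2*b + c, counted
from the most significant bit of the immediate.  Which source register plays
the role of a, b, or c is also an assumption.

```c
#include <stdint.h>

/* A three-input boolean function on single bits (each argument is 0 or 1). */
typedef unsigned (*ternary_fn)(unsigned a, unsigned b, unsigned c);

/* ANDC -> XOR: (a & ~b) ^ c, reduced to one bit. */
static unsigned andc_xor(unsigned a, unsigned b, unsigned c)
{
    return ((a & ~b) ^ c) & 1u;
}

/* XOR -> XOR: (a ^ b) ^ c, reduced to one bit. */
static unsigned xor_xor(unsigned a, unsigned b, unsigned c)
{
    return (a ^ b ^ c) & 1u;
}

/* Build the 8-bit truth table for f.
   ASSUMPTION: entry 4*a + 2*b + c is counted from the most significant bit,
   so entry 0 is 0x80 and entry 7 is 0x01.  */
static uint8_t truth_table_imm(ternary_fn f)
{
    uint8_t imm = 0;

    for (unsigned idx = 0; idx < 8; idx++) {
        unsigned a = (idx >> 2) & 1u;
        unsigned b = (idx >> 1) & 1u;
        unsigned c = idx & 1u;

        if (f(a, b, c))
            imm |= (uint8_t)(0x80u >> idx);
    }
    return imm;
}
```

Calling truth_table_imm(andc_xor) or truth_table_imm(xor_xor) yields the
candidate immediate under the assumptions above.  The value should be checked
against the ISA document before being relied on.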

      XXEVAL  Trunk   GCC14   GCC13   GCC12   GCC11
      ------  -----   -----   -----   -----   -----
-O3:    5.53   6.15    6.28    5.57    5.61    9.56

The latency of XXEVAL is slightly higher than that of the fused VANDC/VXOR or
VXOR/VXOR pairs, so I have written the patch to prefer the Altivec
instructions when they don't need a temporary register.

                                        XXEVAL Trunk   GCC14   GCC13   GCC12
                                        ------ -----   -----   -----   -----
Fuse VANDC -> VXOR                         209   600     600     600     600
Fuse VXOR -> VXOR                          ---   240     240     120     120
XXEVAL to fuse ANDC -> XOR                 391   ---     ---     ---     ---
XXEVAL to fuse XOR -> XOR                  240   ---     ---     ---     ---

Spill vector to stack                       78   364     364     172     184
Load spilled vector from stack             431   962     962     713     723
Vector moves                                10   100     100      70      72

Vector rotate right                        696   696     696     696     696
XXLANDC or VANDC                           209   600     600     600     600
XXLXOR or VXOR                             953 1,824   1,824   1,824   1,824
XXEVAL                                     631   ---     ---     ---     ---

XXSPLTIB and VEXTSB2D to load constants     24    24      24      24      24
