https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117487

            Bug ID: 117487
           Summary: Power8 optimizations for math library aren't done in
                    power9 or power10 (PR target/71977)
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I was answering an email about something else, and I wanted to look up code
that I added on January 4th, 2017 (PR target/71977, PR target/70568, PR
target/78823).  I noticed that while this code is optimized on power8, it is
not optimized on power9 or power10.

The code (gcc.target/pr71977-1.c) is:

#include <stdint.h>

typedef union
{
  float value;
  uint32_t word;
} ieee_float_shape_type;

float
mask_and_float_var (float f, uint32_t mask)
{ 
  ieee_float_shape_type u;

  u.value = f;
  u.word &= mask;

  return u.value;
}

The initial code generated before the January 4th, 2017 changes was:

        xscvdpspn 0,1
        mfvsrwz 9,0
        and 9,9,4
        sldi 9,9,32
        mtvsrd 1,9
        xscvspdpn 1,1
        blr

Note that there is a direct move from the FPR/vector registers to the GPR
registers, the logical operation is done in the GPR registers, and then there
is a direct move back to the FPR/vector registers.

After the changes, the code for power8 is:

        xscvdpspn 0,1
        sldi 9,4,32
        mtvsrd 32,9
        xxland 1,0,32
        xscvspdpn 1,1
        blr

In this case, we avoid a direct register move from the FPR/vector registers to
the GPR registers, and we do the logical operation in the vector registers.
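
As a rough illustration of why the two sequences are interchangeable, here is
a small portable C model.  It is only a sketch: it assumes the single-precision
bit pattern sits in the upper 32 bits of a 64-bit doubleword, and the function
names (and_in_gpr, and_in_vsr, check_and_paths) are hypothetical, not anything
in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model: a 64-bit doubleword whose upper 32 bits hold the
   single-precision bit pattern.  The lower 32 bits are don't-care.  */

/* GPR path: move the word to a GPR, AND it there, shift it back up.  */
uint64_t
and_in_gpr (uint64_t vsr_dword, uint32_t mask)
{
  uint32_t word = (uint32_t) (vsr_dword >> 32); /* models the move to a GPR */
  word &= mask;                                 /* and */
  return (uint64_t) word << 32;                 /* sldi + mtvsrd */
}

/* Vector path: only the mask is moved; the AND is done on the doubleword.  */
uint64_t
and_in_vsr (uint64_t vsr_dword, uint32_t mask)
{
  uint64_t vmask = (uint64_t) mask << 32;       /* sldi + mtvsrd */
  return vsr_dword & vmask;                     /* xxland */
}

/* Both paths leave the same bits in the upper word.  */
void
check_and_paths (void)
{
  uint64_t dword = 0xBFC0000012345678ULL; /* upper word is -1.5f, rest is junk */
  uint32_t mask = 0x7FFFFFFFu;            /* clears the sign bit */

  assert ((and_in_gpr (dword, mask) >> 32) == (and_in_vsr (dword, mask) >> 32));
  assert ((and_in_vsr (dword, mask) >> 32) == 0x3FC00000u); /* 1.5f */
}
```

In this model the vector path needs only the one move of the mask, which is the
saving the power8 sequence shows.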

If we look at the power10/power9 code, it is:

        xscvdpspn 0,1
        mfvsrwz 2,0
        and 2,2,4
        mtvsrws 1,2
        xscvspdpn 1,1
        blr

I.e. we do 2 direct moves between the GPR registers and the FPR/vector
registers and do the logical operation in the GPR registers.

The reason for this is that power9/power10 have the MTVSRWS instruction (splat
the bottom 32 bits of a GPR register into a FPR/vector register), so the
compiler uses that to move the value back.  Power8 does not have MTVSRWS, so
instead we need to do a shift left by 32 bits (SLDI) and then a direct move to
the FPR/vector registers before we can do XSCVSPDPN.

The XSCVSPDPN instruction wants the value in the upper 32 bits.  We get it
there either by a left shift or by a splat operation.
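
To make the shift versus splat point concrete, here is a minimal C sketch.  It
models only one 64-bit doubleword, and the helper names (place_by_shift,
place_by_splat, upper_word, check_placement) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the power8 sequence (SLDI + MTVSRD): the word ends up in the
   upper 32 bits, with zeros below it.  */
uint64_t
place_by_shift (uint32_t word)
{
  return (uint64_t) word << 32;
}

/* Model of a splat, restricted to one doubleword: the word is copied
   into both 32-bit halves.  */
uint64_t
place_by_splat (uint32_t word)
{
  return ((uint64_t) word << 32) | word;
}

/* The part of the doubleword that matters in this model.  */
uint32_t
upper_word (uint64_t dword)
{
  return (uint32_t) (dword >> 32);
}

/* Either placement gives the same upper 32 bits.  */
void
check_placement (void)
{
  uint32_t bits = 0x3FC00000u;  /* 1.5f */

  assert (upper_word (place_by_shift (bits)) == bits);
  assert (upper_word (place_by_splat (bits)) == bits);
}
```

The two results differ only in the lower 32 bits, which this model treats as
don't-care.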

To fix this, we would need a define_peephole2 similar to the one around line
6318 of vsx.md, but matching the splat operation instead of the shift and
64-bit move.
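
A possible test for such a fix might look roughly like the sketch below.  This
is an assumption on my part, not an existing test: the options and the scan
patterns are guesses at what a power9 variant of the test above would check,
and the exact instructions expected would depend on how the peephole is
written.

```c
/* Hypothetical power9 variant of the test; directives are a sketch.  */
/* { dg-do compile { target lp64 } } */
/* { dg-options "-mdejagnu-cpu=power9 -O2" } */

#include <stdint.h>

typedef union
{
  float value;
  uint32_t word;
} ieee_float_shape_type;

float
mask_and_float_var (float f, uint32_t mask)
{
  ieee_float_shape_type u;

  u.value = f;
  u.word &= mask;

  return u.value;
}

/* Assumed expectation: the AND is done in the vector registers, and there
   is no direct move from the FPR/vector registers to the GPR registers.  */
/* { dg-final { scan-assembler     {\mxxland\M}  } } */
/* { dg-final { scan-assembler-not {\mmfvsrwz\M} } } */
```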
