Hello All:

This patch replaces the fmr instruction (6 cycles) with the xxlor instruction (2 cycles) on the p7 and p8 architectures.
I have implemented this with a switch statement over which_alternative; otherwise it is difficult to emit xxlor for p7 and p8 while keeping fmr for other architectures.

Bootstrapped and regtested.

Thanks & Regards
Ajit

	rs6000: fmr gets used instead of faster xxlor [PR93571]

	This patch replaces the 6-cycle fmr instruction with the 2-cycle
	xxlor instruction on the p8 and p7 architectures.

	2023-02-21  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>

gcc/ChangeLog:

	* config/rs6000/rs6000.md (*movdf_hardfloat64): Replace fmr with
	xxlor instruction.
---
 gcc/config/rs6000/rs6000.md | 49 ++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index dfd6c73ffcb..ef587033367 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8433,26 +8433,35 @@ (define_insn "*mov<mode>_hardfloat64"
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
-  "@
-   stfd%U0%X0 %1,%0
-   lfd%U1%X1 %0,%1
-   xxlor %0,%1,%1
-   lxsd %0,%1
-   stxsd %1,%0
-   lxsdx %x0,%y1
-   stxsdx %x1,%y0
-   xxlor %x0,%x1,%x1
-   xxlxor %x0,%x0,%x0
-   li %0,0
-   std%U0%X0 %1,%0
-   ld%U1%X1 %0,%1
-   mr %0,%1
-   mt%0 %1
-   mf%1 %0
-   nop
-   mfvsrd %0,%x1
-   mtvsrd %x0,%1
-   #"
+{
+  switch (which_alternative) {
+    case 0 : return "stfd%U0%X0 %1,%0";
+    case 1 : return "lfd%U1%X1 %0,%1";
+    case 2 : if ((TARGET_VSX || TARGET_P8_VECTOR)
+                 && !TARGET_P9_VECTOR
+                 && !TARGET_POWER10)
+               return "xxlor %0,%1,%1";
+             else
+               return "fmr %0,%1";
+
+    case 3 : return "lxsd %0,%1";
+    case 4 : return "stxsd %1,%0";
+    case 5 : return "lxsdx %x0,%y1";
+    case 6 : return "stxsdx %x1,%y0";
+    case 7 : return "xxlor %x0,%x1,%x1";
+    case 8 : return "xxlxor %x0,%x0,%x0";
+    case 9 : return "li %0,0";
+    case 10 : return "std%U0%X0 %1,%0";
+    case 11 : return "ld%U1%X1 %0,%1";
+    case 12 : return "mr %0,%1";
+    case 13 : return "mt%0 %1";
+    case 14 : return "mf%1 %0";
+    case 15 : return "nop";
+    case 16: return "mfvsrd %0,%x1";
+    case 17 : return "mtvsrd %x0,%1";
+  }
+  return "unreachable";
+}
   [(set_attr "type"
             "fpstore, fpload, fpsimple, fpload, fpstore,
              fpload, fpstore, veclogical, veclogical, integer,
-- 
2.31.1

On 17/02/23 10:53 pm, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Feb 17, 2023 at 10:28:41PM +0530, Ajit Agarwal wrote:
>> This patch replaces fmr instruction (6 cycles) with xxlor instruction (2 cycles)
>> Bootstrapped and regtested on powerpc64-linux-gnu.
>
> You tested this on a CPU that does have VSX.  It is incorrect on other
> (older) CPUs.
>
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -8436,7 +8436,7 @@
>>    "@
>>     stfd%U0%X0 %1,%0
>>     lfd%U1%X1 %0,%1
>> -   fmr %0,%1
>> +   xxlor %0,%1,%1
>>     lxsd %0,%1
>>     stxsd %1,%0
>>     lxsdx %x0,%y1
>
> This is the *mov<mode>_hardfloat64 pattern.  You can add some magic to
> your Git config so that will show in the patch: in .git/config:
>
> [diff "md"]
>         xfuncname = "^\\(define.*$"
>
> (As it says in .gitattributes:
> # Make diff on MD files use "(define" as a function marker.
> # Use together with git config diff.md.xfuncname '^\(define.*$'
> # which is run by contrib/gcc-git-customization.sh too.
> *.md diff=md
> )
>
> The third alternative to this insn, the fmr one, has "d" as both input
> and output constraint, and has "*" as isa attribute, so it will be used
> on any CPU that has floating point registers.  The eighth alternative
> (the existing xxlor one) has "wa" constraints (via <f64_vsx>) so it
> implicitly requires VSX to be enabled.  You need to do something similar
> for what you want, but you also need to still allow fmr.
>
>
> Segher
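
For reference, the condition used in case 2 of the patch above can be modelled outside GCC as a small standalone C function. This is only a sketch for reasoning about which template gets chosen: struct target_flags, fpr_move_template and check_fpr_move_template are hypothetical names, the booleans merely stand in for the TARGET_* macros, and the flag combinations in the self-check are assumed to resemble a non-VSX, a power8-like and a power9-like target.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the TARGET_* macros tested in the patch;
   inside GCC these are not fields of a struct.  */
struct target_flags
{
  bool vsx;        /* models TARGET_VSX */
  bool p8_vector;  /* models TARGET_P8_VECTOR */
  bool p9_vector;  /* models TARGET_P9_VECTOR */
  bool power10;    /* models TARGET_POWER10 */
};

/* Return the output template that case 2 of the patch would pick.  */
static const char *
fpr_move_template (const struct target_flags *t)
{
  if ((t->vsx || t->p8_vector) && !t->p9_vector && !t->power10)
    return "xxlor %0,%1,%1";
  return "fmr %0,%1";
}

/* Self-check over three assumed flag combinations.  */
static void
check_fpr_move_template (void)
{
  const struct target_flags no_vsx = { false, false, false, false };
  const struct target_flags p8_like = { true, true, false, false };
  const struct target_flags p9_like = { true, true, true, false };

  /* Without VSX the move must stay fmr.  */
  assert (strcmp (fpr_move_template (&no_vsx), "fmr %0,%1") == 0);
  /* VSX without P9 vector support selects xxlor.  */
  assert (strcmp (fpr_move_template (&p8_like), "xxlor %0,%1,%1") == 0);
  /* The patch falls back to fmr once P9 vector support is on.  */
  assert (strcmp (fpr_move_template (&p9_like), "fmr %0,%1") == 0);
}
```

Calling check_fpr_move_template () from a test driver exercises all three paths. The sketch covers only the run-time choice made in the C output block; it does not model the constraint or isa-attribute mechanism that the quoted review describes.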