Hello All:
This patch replaces the fmr instruction (6 cycles) with the xxlor instruction
(2 cycles) on the p7 and p8 architectures.
I have implemented this with a switch statement; otherwise it is difficult to
emit xxlor on p7 and p8 while keeping fmr for the other architectures.
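To illustrate the choice made for alternative 2, here is a minimal standalone
C sketch of the condition used in the patch. It is only a model: struct
target_flags and fpr_move_template are hypothetical names, and the boolean
fields merely stand in for the TARGET_* macros. None of this is code from GCC
itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target flags tested in the patch.
   These fields only model the TARGET_* macros; they are not GCC's.  */
struct target_flags
{
  bool vsx;        /* models TARGET_VSX */
  bool p8_vector;  /* models TARGET_P8_VECTOR */
  bool p9_vector;  /* models TARGET_P9_VECTOR */
  bool power10;    /* models TARGET_POWER10 */
};

/* Return the template for the FPR-to-FPR move (alternative 2),
   mirroring the condition in the patch: xxlor only when VSX or
   P8 vector is enabled and neither P9 vector nor Power10 is.  */
static const char *
fpr_move_template (const struct target_flags *t)
{
  if ((t->vsx || t->p8_vector) && !t->p9_vector && !t->power10)
    return "xxlor %0,%1,%1";
  return "fmr %0,%1";
}
```

With all four flags false (no VSX) the function returns the fmr template; with
vsx and p8_vector set and the p9 and p10 flags clear it returns the xxlor
template.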
Bootstrapped and regtested.
Thanks & Regards
Ajit
rs6000: fmr gets used instead of faster xxlor [PR93571]
This patch replaces the 6-cycle fmr instruction with the 2-cycle xxlor
instruction on the p7 and p8 architectures.
2023-02-21 Ajit Kumar Agarwal <[email protected]>
gcc/ChangeLog:
* config/rs6000/rs6000.md (*mov<mode>_hardfloat64): Use xxlor instead
of fmr on p7 and p8.
---
gcc/config/rs6000/rs6000.md | 49 ++++++++++++++++++++++---------------
1 file changed, 29 insertions(+), 20 deletions(-)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index dfd6c73ffcb..ef587033367 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8433,26 +8433,35 @@ (define_insn "*mov<mode>_hardfloat64"
"TARGET_POWERPC64 && TARGET_HARD_FLOAT
&& (gpc_reg_operand (operands[0], <MODE>mode)
|| gpc_reg_operand (operands[1], <MODE>mode))"
- "@
- stfd%U0%X0 %1,%0
- lfd%U1%X1 %0,%1
- xxlor %0,%1,%1
- lxsd %0,%1
- stxsd %1,%0
- lxsdx %x0,%y1
- stxsdx %x1,%y0
- xxlor %x0,%x1,%x1
- xxlxor %x0,%x0,%x0
- li %0,0
- std%U0%X0 %1,%0
- ld%U1%X1 %0,%1
- mr %0,%1
- mt%0 %1
- mf%1 %0
- nop
- mfvsrd %0,%x1
- mtvsrd %x0,%1
- #"
+{
+ switch (which_alternative) {
+ case 0 : return "stfd%U0%X0 %1,%0";
+ case 1 : return "lfd%U1%X1 %0,%1";
+ case 2 : if ((TARGET_VSX || TARGET_P8_VECTOR)
+ && !TARGET_P9_VECTOR
+ && !TARGET_POWER10)
+ return "xxlor %0,%1,%1";
+ else
+ return "fmr %0,%1";
+
+ case 3 : return "lxsd %0,%1";
+ case 4 : return "stxsd %1,%0";
+ case 5 : return "lxsdx %x0,%y1";
+ case 6 : return "stxsdx %x1,%y0";
+ case 7 : return "xxlor %x0,%x1,%x1";
+ case 8 : return "xxlxor %x0,%x0,%x0";
+ case 9 : return "li %0,0";
+ case 10 : return "std%U0%X0 %1,%0";
+ case 11 : return "ld%U1%X1 %0,%1";
+ case 12 : return "mr %0,%1";
+ case 13 : return "mt%0 %1";
+ case 14 : return "mf%1 %0";
+ case 15 : return "nop";
+ case 16: return "mfvsrd %0,%x1";
+ case 17 : return "mtvsrd %x0,%1";
+ }
+ return "unreachable";
+}
[(set_attr "type"
"fpstore, fpload, fpsimple, fpload, fpstore,
fpload, fpstore, veclogical, veclogical, integer,
--
2.31.1
On 17/02/23 10:53 pm, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Feb 17, 2023 at 10:28:41PM +0530, Ajit Agarwal wrote:
>> This patch replaces fmr instruction (6 cycles) with xxlor instruction
>> (2 cycles)
>> Bootstrapped and regtested on powerpc64-linux-gnu.
>
> You tested this on a CPU that does have VSX. It is incorrect on other
> (older) CPUs.
>
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -8436,7 +8436,7 @@
>> "@
>> stfd%U0%X0 %1,%0
>> lfd%U1%X1 %0,%1
>> - fmr %0,%1
>> + xxlor %0,%1,%1
>> lxsd %0,%1
>> stxsd %1,%0
>> lxsdx %x0,%y1
>
> This is the *mov<mode>_hardfloat64 pattern. You can add some magic to
> your Git config so that will show in the patch: in .git/config:
>
> [diff "md"]
> xfuncname = "^\\(define.*$"
>
> (As it says in .gitattributes:
> # Make diff on MD files use "(define" as a function marker.
> # Use together with git config diff.md.xfuncname '^\(define.*$'
> # which is run by contrib/gcc-git-customization.sh too.
> *.md diff=md
> )
>
> The third alternative to this insn, the fmr one, has "d" as both input
> and output constraint, and has "*" as isa attribute, so it will be used
> on any CPU that has floating point registers. The eighth alternative
> (the existing xxlor one) has "wa" constraints (via <f64_vsx>) so it
> implicitly requires VSX to be enabled. You need to do something similar
> for what you want, but you also need to still allow fmr.
>
>
> Segher