Hi Mike,
On Wed, Nov 08, 2017 at 08:14:31AM -0500, Michael Meissner wrote:
> PowerPC ISA 3.0 does not have a byte-reverse instruction that operates on the
> GPRs, but it does have vector byte swap half-word, word, double-word
> operations
> in the VSX registers. The enclosed patch enables generation of the byte
> revseral instructions for register-register operations. It still prefers to
> generate the load with byte reverse (L{H,W,D}BRX) or store with byte reverse
> (ST{H,W,D}BRX) instructions over the register sequence.
>
> For 16-bit and 32-bit byte swaps, it typically does the tradational operation
> in GPR registers, but it will generate XXBR{H,W} if the values are in vector
> registers.
You can do the byteswaps with just VMX, too, with some rotate instructions
(vrlh 8 for HI, vrlh 8 followed by vrlw 16 for SI, etc.) For a future
date, I suppose.
> For 64-bit swaps, it no longer generates the 9 instruction sequence in favor
> of
> XXBRD. I did some timing runs on a prototype power9 system, and it was
> slightly faster to do direct move to the vecter unit, XXBRD, and direct move
> back to a GPR than the traditional sequence.
>
> I did bootstraps on little endian Power8 and Power9 systems (with the default
> cpu set to power8 and power9 respectively). There were no regressions. Can I
> check this patch into the trunk?
>
> [gcc]
> 2017-11-08 Michael Meissner <[email protected]>
>
> * config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems,
> enable generating XXBR{H,W} if the value is in a vector
> register.
Join the last two lines? And you only mean XXBRH here.
> (bswapsi2_reg): Likewise.
And XXBRW here.
> (bswapdi2_reg): On ISA 3.0 systems, use XXBRD to do bswap64
> instead of doing the GPR sequence used on previoius machines.
Typo ("previous").
> (bswapdi2_xxbrd): Likewise.
Not "Likewise"; just "New define_insn."
Should this somehow be joined with p9_xxbrd_<mode>? Or maybe you should
generate that, instead.
> (bswapdi2_reg splitters): Use int_reg_operand instead of
> gpc_reg_operand to not match when XXBRD is generated.
Please see if you can merge the define_splits to their corresponding
define_insns (making define_insn_and_split). Shorter, no mismatch
possible between the two anymore, and you get a name for the patterns
too :-)
> @@ -2507,6 +2513,8 @@ (define_expand "bswapdi2"
> emit_insn (gen_bswapdi2_load (dest, src));
> else if (MEM_P (dest))
> emit_insn (gen_bswapdi2_store (dest, src));
> + else if (TARGET_P9_VECTOR)
> + emit_insn (gen_bswapdi2_xxbrd (dest, src));
> else
> emit_insn (gen_bswapdi2_reg (dest, src));
> DONE;
Pity that you cannot easily do similar for HI and SI.
> @@ -2544,7 +2559,8 @@ (define_insn "bswapdi2_reg"
> (clobber (match_scratch:DI 3 "=&r"))]
> "TARGET_POWERPC64 && TARGET_LDBRX"
> "#"
> - [(set_attr "length" "36")])
> + [(set_attr "length" "36")
> + (set_attr "type" "*")])
Explicitly setting "type" (or any other attr) to the default value is
pretty useless; just leave it out.
Segher