As part of improving the vector extract built-ins for the future processor, and also looking at optimizing all of the cases with vector extracts and sign, zero, float extension for GCC 11 (PR target/93230), I noticed that the code generated for some of the tests had regressed from the GCC 8 time frame.
What is happening is in some instances, we want to do a vector extract with a variable element where the vector is in a register: #include <altivec.h> long long foo (vector long long v, unsigned long n) { return vec_extract (v, n); } During the reload pass, the register allocator decides that it should spill the insn to the stack, and then do the vector extract from memory (which is an optimization to prevent loading the vector in case we only want one element). I came to the conclusion that having one insn that did both the variable vector extract on a register and from memory, allowed this bad code to be generated. I split the 3 variable vector extract insn patterns that used input_operand to handle either type, to have one insn that handled a register, and a second insn that handles only memory. Note, there is a 4th place that uses input_operand for variable vector extracts that is not touched by this patch. The insn is trying to combine a variable vector extract with a zero extend operation. That insn pattern never will match due to the way the insn pattern is written. I will be submitting patches to fix this insn separately (PR target/93937), and that patch will split the insn into two separate insns. I have built this code on both a little endian system and a big endian PowerPC system, and I ran make check for C, C++, Fortran, and LTO. There were no regressions after applying the patch for the test suite. Can I check this into the master branch? Because it also fails on the GCC 9 branch, I will be submitting patches for that. Due to the changes between GCC 9 and the current master, the patch in this mail can not be blindly applied, but I have a patch for GCC 9. GCC 8 generates the correct code, so it does not have to be fixed. [gcc] 2020-02-26 Michael Meissner <meiss...@linux.ibm.com> PR target/93932 * config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_<mode>_var_load, VSX_D iterator): New insn, do variable extract from memory. (vsx_extract_v4sf_var): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_v4sf_var_load): New insn, do variable extract from memory. (vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_<mode>_var_load, VSX_EXTRACT_I iterator): New insn, do variable extract from memory. [gcc/testsuite] 2020-02-26 Michael Meissner <meiss...@linux.ibm.com> PR target/93932 * gcc.target/powerpc/fold-vec-extract-longlong.p8.c: Adjust instruction counts. --- /tmp/TMHdwO_vsx.md 2020-02-26 13:25:27.250209645 -0500 +++ gcc/config/rs6000/vsx.md 2020-02-26 13:25:21.357125563 -0500 @@ -3245,14 +3245,14 @@ (define_insn "vsx_vslo_<mode>" "vslo %0,%1,%2" [(set_attr "type" "vecperm")]) -;; Variable V2DI/V2DF extract +;; Variable V2DI/V2DF extract from a register (define_insn_and_split "vsx_extract_<mode>_var" - [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r") - (unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,Q,Q") - (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] + [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v") + (unspec:<VS_scalar> [(match_operand:VSX_D 1 "gpc_reg_operand" "v") + (match_operand:DI 2 "gpc_reg_operand" "r")] UNSPEC_VSX_EXTRACT)) - (clobber (match_scratch:DI 3 "=r,&b,&b")) - (clobber (match_scratch:V2DI 4 "=&v,X,X"))] + (clobber (match_scratch:DI 3 "=r")) + (clobber (match_scratch:V2DI 4 "=&v"))] "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" @@ -3263,6 +3263,23 @@ (define_insn_and_split "vsx_extract_<mod DONE; }) +;; Variable V2DI/V2DF extract from memory +(define_insn_and_split "*vsx_extract_<mode>_var_load" + [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=wa,r") + (unspec:<VS_scalar> [(match_operand:VSX_D 1 "memory_operand" "Q,Q") + (match_operand:DI 2 "gpc_reg_operand" "r,r")] + UNSPEC_VSX_EXTRACT)) + (clobber (match_scratch:DI 3 "=&b,&b"))] + "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 4))] +{ + operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2], + operands[3], <VS_scalar>mode); +} + [(set_attr "type" "fpload,load")]) + ;; Extract a SF element from V4SF (define_insn_and_split "vsx_extract_v4sf" [(set (match_operand:SF 0 "vsx_register_operand" "=wa") @@ -3315,14 +3332,14 @@ (define_insn_and_split "*vsx_extract_v4s (set_attr "length" "8") (set_attr "isa" "*,p7v,p9v,*")]) -;; Variable V4SF extract +;; Variable V4SF extract from a register (define_insn_and_split "vsx_extract_v4sf_var" - [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r") - (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q") - (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] + [(set (match_operand:SF 0 "gpc_reg_operand" "=wa") + (unspec:SF [(match_operand:V4SF 1 "gpc_reg_operand" "v") + (match_operand:DI 2 "gpc_reg_operand" "r")] UNSPEC_VSX_EXTRACT)) - (clobber (match_scratch:DI 3 "=r,&b,&b")) - (clobber (match_scratch:V2DI 4 "=&v,X,X"))] + (clobber (match_scratch:DI 3 "=r")) + (clobber (match_scratch:V2DI 4 "=&v"))] "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" @@ -3333,6 +3350,23 @@ (define_insn_and_split "vsx_extract_v4sf DONE; }) +;; Variable V4SF extract from memory +(define_insn_and_split "*vsx_extract_v4sf_var_load" + [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,?r") + (unspec:SF [(match_operand:V4SF 1 "memory_operand" "Q,Q") + (match_operand:DI 2 "gpc_reg_operand" "r,r")] + UNSPEC_VSX_EXTRACT)) + (clobber (match_scratch:DI 3 "=&b,&b"))] + "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 4))] +{ + operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2], + operands[3], SFmode); +} + [(set_attr "type" "fpload,load")]) + ;; Expand the builtin form of xxpermdi to canonical rtl. (define_expand "vsx_xxpermdi_<mode>" [(match_operand:VSX_L 0 "vsx_register_operand") @@ -3677,15 +3711,15 @@ (define_insn_and_split "*vsx_extract_<mo [(set_attr "type" "load") (set_attr "length" "8")]) -;; Variable V16QI/V8HI/V4SI extract +;; Variable V16QI/V8HI/V4SI extract from a register (define_insn_and_split "vsx_extract_<mode>_var" - [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r") + [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r") (unspec:<VS_scalar> - [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q") - (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] + [(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,v") + (match_operand:DI 2 "gpc_reg_operand" "r,r")] UNSPEC_VSX_EXTRACT)) - (clobber (match_scratch:DI 3 "=r,r,&b")) - (clobber (match_scratch:V2DI 4 "=X,&v,X"))] + (clobber (match_scratch:DI 3 "=r,r")) + (clobber (match_scratch:V2DI 4 "=X,&v"))] "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" @@ -3695,7 +3729,25 @@ (define_insn_and_split "vsx_extract_<mod operands[3], operands[4]); DONE; } - [(set_attr "isa" "p9v,*,*")]) + [(set_attr "isa" "p9v,*")]) + +;; Variable V16QI/V8HI/V4SI extract from memory +(define_insn_and_split "*vsx_extract_<mode>_var_load" + [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r") + (unspec:<VS_scalar> + [(match_operand:VSX_EXTRACT_I 1 "memory_operand" "Q") + (match_operand:DI 2 "gpc_reg_operand" "r")] + UNSPEC_VSX_EXTRACT)) + (clobber (match_scratch:DI 3 "=&b"))] + "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 4))] +{ + operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2], + operands[3], <VS_scalar>mode); +} + [(set_attr "type" "load")]) (define_insn_and_split "*vsx_extract_<mode>_<VS_scalar>mode_var" [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r") --- /tmp/oo6tIZ_fold-vec-extract-longlong.p8.c 2020-02-26 14:59:46.104993989 -0500 +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p8.c 2020-02-26 14:58:19.324752612 -0500 @@ -7,24 +7,23 @@ // Targeting P8LE and P8BE, six tests total. // P8 (LE) constants: mfvsrd -// P8 (LE) variables: addi,xxpermdi,mr,stxvd2x|stxvd4x,rldicl,sldi,ldx,blr -// P8 (BE) constants: mfvsrd -// P8 (BE) Variables: addi,xxpermdi,rldicl,mr,stxvd2x|stxvd4x,sldi,ldx,blr +// P8 (LE) variables: xori, rldic, mtvsrd, xxpermdi, vslo, mfvsrd +// P8 (BE) constants: xxpermdi, mfvsrd +// P8 (BE) Variables: rldic, mtvsrd, xxpermdi, vslo, mfvsrd -/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {\maddi\M} 3 { target lp64 } } } */ +/* results. */ +/* { dg-final { scan-assembler-times {\mxori\M} 3 { target le } } } */ +/* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {\madd\M} 3 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {\mlwz\M} 11 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */ /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target le } } } */ +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 6 { target { be && lp64 } } } } */ /* { dg-final { scan-assembler-times {\mxxpermdi\M} 2 { target { be && ilp32 } } } } */ -/* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target { be && lp64 } } } } */ -/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 3 { target lp64 } } } */ -/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {\mrldicl\M|\mrldic\M|\mrlwinm\M} 3 } } */ -/* { dg-final { scan-assembler-times {\mmfvsrd\M} 3 { target { lp64 } } } } */ -/* { dg-final { scan-assembler-times {\mmfvsrd\M} 0 { target { be && ilp32 } } } } */ -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { lp64 } } } } */ -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { be && ilp32 } } } } */ +/* { dg-final { scan-assembler-times {\mvslo\M} 3 { target lp64 } } } */ #include <altivec.h> -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797