Ah, yes - those test results should have only been changed for little endian.
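(Annotation, for readers following along: one common way to express "changed for little endian only" in the testsuite is to key the scan directives on endianness. This is only a sketch — it assumes the GCC testsuite's `aarch64_little_endian` / `aarch64_big_endian` effective-target keywords and is not necessarily how the submitted patch does it:)

```c
/* New little-endian expectation: the vec_select folds to a subreg, so the
   extract becomes a plain fmov.  Guarded so it is only checked on LE.  */
/* { dg-final { scan-assembler-times {\tfmov\tw[0-9]+, s[0-9]\n} 2 { target aarch64_little_endian } } } */
/* The pre-patch directives would be restored under the complementary
   { target aarch64_big_endian } selector.  */
```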
I've submitted a patch to the list restoring the original expected results
for big endian.

Thanks,
Jonathan

________________________________
From: Christophe Lyon <christophe.lyon....@gmail.com>
Sent: 15 July 2021 10:09
To: Richard Sandiford <richard.sandif...@arm.com>; Jonathan Wright <jonathan.wri...@arm.com>; gcc-patches@gcc.gnu.org; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Subject: Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification

On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

Jonathan Wright <jonathan.wri...@arm.com> writes:
> Hi,
>
> Version 2 of this patch adds more code generation tests to show the
> benefit of this RTL simplification as well as adding a new helper function
> 'rtx_vec_series_p' to reduce code duplication.
>
> Patch tested as version 1 - ok for master?

Sorry for the slow reply.

> Regression tested and bootstrapped on aarch64-none-linux-gnu,
> x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> aarch64_be-none-linux-gnu - no issues.

I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
>                                               - 1,
>                                               0));
>        break;
> +    case VEC_SELECT:
> +      {
> +       rtx trueop0 = XEXP (x, 0);
> +       mode = GET_MODE (trueop0);
> +       rtx trueop1 = XEXP (x, 1);
> +       int nunits;
> +       /* If we select a low-part subreg, return that.  */
> +       if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> +           && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> +         {
> +           int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> +
> +           if (rtx_vec_series_p (trueop1, offset))
> +             {
> +               rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> +               if (new_rtx != NULL_RTX)
> +                 return new_rtx;
> +             }
> +         }
> +      }

Since this occurs three times, I think it would be worth having a new
predicate:

/* Return true if, for all OP of mode OP_MODE:

     (vec_select:RESULT_MODE OP SEL)

   is equivalent to the lowpart RESULT_MODE of OP.  */
bool
vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)

containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
and rtx_vec_series_p tests.  I think the function belongs in
rtlanal.[hc], even though subreg_lowpart_p is in emit-rtl.c.

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1884,15 +1884,16 @@
>  )
>
>  (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
> -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> -        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> +        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
>    ""
>    "@
>     and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
>     ldr<SHORT:size>\t%w0, %1
> -   ldr\t%<SHORT:size>0, %1"
> -  [(set_attr "type" "logic_imm,load_4,f_loads")
> -   (set_attr "arch" "*,*,fp")]
> +   ldr\t%<SHORT:size>0, %1
> +   umov\t%w0, %1.<SHORT:size>[0]"
> +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> +   (set_attr "arch" "*,*,fp,fp")]

FTR (just to show I thought about it): I don't know whether the umov can
really be considered an fp operation rather than a simd operation, but
since we don't support fp without simd, this is already a distinction
without a difference.  So the pattern is IMO OK as-is.
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -224,7 +224,7 @@
>  ;; problems because small constants get converted into adds.
>  (define_insn "*arm_movsi_vfp"
>    [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv")
> -        (match_operand:SI 1 "general_operand"      "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> +        (match_operand:SI 1 "general_operand"      "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
>    "TARGET_ARM && TARGET_HARD_FLOAT
>     && (   s_register_operand (operands[0], SImode)
>         || s_register_operand (operands[1], SImode))"

I'll assume that an Arm maintainer would have spoken up by now if they
didn't want this for some reason.

> diff --git a/gcc/rtl.c b/gcc/rtl.c
> index aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa 100644
> --- a/gcc/rtl.c
> +++ b/gcc/rtl.c
> @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
>      }
>  }
>
> +/* Return true if element-selection indices in VEC are in series.  */
> +
> +bool
> +rtx_vec_series_p (const_rtx vec, int start)

I think rtvec_series_p would be better, for consistency with
rtvec_all_equal_p.  Also, let's generalise it to:

/* Return true if VEC contains a linear series of integers
   { START, START+1, START+2, ... }.  */

bool
rtvec_series_p (rtvec vec, int start)
{
}

> +{
> +  for (int i = 0; i < XVECLEN (vec, 0); i++)
> +    {
> +      if (i + start != INTVAL (XVECEXP (vec, 0, i)))
> +       return false;
> +    }
> +  return true;

With the general definition I think this should be:

  for (int i = 0; i < GET_NUM_ELEM (vec); i++)
    {
      rtx x = RTVEC_ELT (vec, i);
      if (!CONST_INT_P (x) || INTVAL (x) != i + start)
        return false;
    }

Then pass XVEC (sel, 0) to the function, instead of just sel.

OK with those changes, thanks.
Hi,

Some of the updated tests fail on aarch64_be:

gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 5
gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 5
gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 6
gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 6

Can you check?

Thanks,

Christophe

Richard