Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes: > Hi, > The attached patch tries to improve initialization for fixed-length > SVE vector and it's algorithm is described in comments for > aarch64_sve_expand_vector_init() in the patch, with help from Richard > Sandiford. I verified tests added in the patch pass with qemu and am > trying to run bootstrap+test on patch in qemu. > Does the patch look OK ? > > Thanks, > Prathamesh > > 2019-05-27 Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> > Richard Sandiford <richard.sandif...@arm.com>
Although we iterated on ideas for the patch a bit, I didn't write any of it, so the changelog should just have your name. > [...] > @@ -3207,3 +3207,15 @@ > DONE; > } > ) > + > +;; Standard pattern name vec_init<mode><Vel>. > + > +(define_expand "vec_init<mode><Vel>" The rest of the file doesn't have blank lines after the comment. > +/* Subroutine of aarch64_sve_expand_vector_init for handling > + trailing constants. > + This function works as follows: > + (a) Create a new vector consisting of trailing constants. > + (b) Initialize TARGET with the constant vector using emit_move_insn. > + (c) Insert remaining elements in TARGET using insr. > + NELTS is the total number of elements in original vector while > + truncated sentence, guess the rest would have been: NELTS_REQD is the number of elements that are actually significant. or something. > + ??? The heuristic used is to do above only if number of constants > + is at least half the total number of elements. May need fine tuning. */ > + > +static bool > +aarch64_sve_expand_vector_init_handle_trailing_constants > + (rtx target, const rtx_vector_builder &builder, int nelts, int nelts_reqd) > +{ > + machine_mode mode = GET_MODE (target); > + scalar_mode elem_mode = GET_MODE_INNER (mode); > + int n_trailing_constants = 0; > + > + for (int i = nelts_reqd - 1; > + i >= 0 && aarch64_legitimate_constant_p (elem_mode, builder.elt (i)); > + i--) > + n_trailing_constants++; > + > + if (n_trailing_constants >= nelts_reqd / 2) > + { > + rtx_vector_builder v (mode, 1, nelts); > + for (int i = 0; i < nelts; i++) > + v.quick_push (builder.elt (i + nelts_reqd - n_trailing_constants)); > + rtx const_vec = v.build (); > + emit_move_insn (target, const_vec); > + > + for (int i = nelts_reqd - n_trailing_constants - 1; i >= 0; i--) > + emit_insr (target, builder.elt (i)); > + > + return true; > + } > + > + return false; > +} > + > +/* Subroutine of aarch64_sve_expand_vector_init. > + Works as follows: > + (a) Initialize TARGET by broadcasting element NELTS_REQD - 1 of BUILDER. > + (b) Skip trailing elements from BUILDER, which are same as s/are same/are the same/ > + element NELTS_REQD - 1. > + (c) Insert earlier elements in reverse order in TARGET using insr. */ > + > +static void > +aarch64_sve_expand_vector_init_insert_elems (rtx target, > + const rtx_vector_builder &builder, > + int nelts_reqd) > +{ > + machine_mode mode = GET_MODE (target); > + scalar_mode elem_mode = GET_MODE_INNER (mode); > + > + struct expand_operand ops[2]; > + enum insn_code icode = optab_handler (vec_duplicate_optab, mode); > + gcc_assert (icode != CODE_FOR_nothing); > + > + create_output_operand (&ops[0], target, mode); > + create_input_operand (&ops[1], builder.elt (nelts_reqd - 1), elem_mode); > + expand_insn (icode, 2, ops); > + > + int ndups = builder.count_dups (nelts_reqd - 1, -1, -1); > + for (int i = nelts_reqd - ndups - 1; i >= 0; i--) > + emit_insr (target, builder.elt (i)); > +} > + > +/* Subroutine of aarch64_sve_expand_vector_init to handle case > + when all trailing elements of builder are same. > + This works as follows: > + (a) Using expand_insn interface to broadcast last vector element in > TARGET. s/Using/Use/ > + (b) Insert remaining elements in TARGET using insr. > + > + ??? The heuristic used is to do above if number of same trailing elements > + is at least 3/4 of total number of elements, loosely based on > + heuristic from mostly_zeros_p. May need fine-tuning. */ Should be two spaces before "May". > [...] > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_1.c > b/gcc/testsuite/gcc.target/aarch64/sve/init_1.c > new file mode 100644 > index 00000000000..c51876947fb > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_1.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile { target aarch64_asm_sve_ok } } */ > +/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns > -msve-vector-bits=256 --save-temps" } */ These tests shouldn't require -ftree-vectorize. > [...] > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c > b/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c > new file mode 100644 > index 00000000000..d9640e42ddd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c > @@ -0,0 +1,21 @@ > +/* { dg-do run { target aarch64_sve256_hw } } */ > +/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } > */ No need for --save-temps in the run tests. > [...] > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_11.c > b/gcc/testsuite/gcc.target/aarch64/sve/init_11.c > new file mode 100644 > index 00000000000..b90895df436 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/init_11.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile { target aarch64_asm_sve_ok } } */ > +/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns > -msve-vector-bits=256 --save-temps" } */ > + > +/* Case 5.5: Interleaved repeating elements and trailing same elements. */ > + > +#include <stdint.h> > + > +typedef int32_t vnx4si __attribute__((vector_size (32))); > + > +vnx4si foo(int a, int b, int f) > +{ > + return (vnx4si) { a, f, b, f, b, f, b, f }; > +} > + This is missing __attribute__ ((noipa)). Same for some other tests. Thanks, Richard